Deaccession Policies: Practical Methods For Dealing With Data Hoarding Dilemmas
Writing on his Tame Your Assets blog, Ian Matzen published an article last week: Are You a Digital Asset Hoarder or a Digital Curator? The thesis of the article is that defaulting to a ‘never delete’ policy for managing your digital asset collection is not an optimal strategy because of the volume of low-value material that gets retained as a result. Ian advocates having a collections management policy to help guide retention/deletion practices:
“I am arguing for smart asset retention. If you can’t explain why, for example, you are keeping a gazillion images of your CEO in silly costumes, you have a problem. Or why those assets sitting on Pat’s computer desktop, yes, the ones downloaded from Google without rights metadata, have been uploaded into the DAMS, you’ve got some explaining to do. For archivists and librarians alike, collection management is a familiar concept and should be at forefront of every digital asset manager’s mind, whether we “weed” or retain all content.” [Read More]
I think Ian makes some fair points, and I generally concur with them. Several factors, however, may conspire to make this a more complex policy to implement than it first appears. There are also some methods that can help with making rational decisions about retention policies for DAM, which I will discuss later.
To start with, I will make it clear that if I had a binary choice between deleting data and keeping it, I am one of those people who usually sides with retaining it. I come from the school of IT that says you can never have too many backups. While managing large scale data repositories can be a demanding task, bitter experience has taught me that needing to have engineers go through all kinds of obscure recovery processes to try to retrieve deleted data is nearly always a worse problem to deal with (not least because of the regret which accompanies it). This is a more straightforward technical issue though. The problem Ian discusses is more of a qualitative one, i.e. deciding what data is worth holding on to (or not, as the case may be). Later in the article, he offers this advice:
“For collection development to succeed there needs to be a policy, also known as a retention policy, drawn up by your governance committee and then ratified by stakeholders. It should define, as specifically as possible, the asset types that get stored and managed, how long they are kept, and what gets tossed, such as duplicate, ephemeral, or low value assets.” [Read More]
This sounds reasonable. However, thrashing out the finer details of how it will work in reality is often easier said than done. In the case of digital assets, deciding in advance what is worth retaining is not without risks. To take a couple of Ian’s examples, the multiple images of your CEO in silly costumes do seem like reasonable candidates to remove, but one of them might include material in the background which later acquires greater importance (e.g. to prove or disprove that a person or object was present at the time). In Peter Krogh’s book about DAM, there is a photo of New York taken prior to September 2001 which shows an American flag with the Twin Towers behind it. Immediately post-origination, the image might not have been considered particularly significant. As noted by Peter, however, context (i.e. major geopolitical historical events) has considerably increased the power and value of the image in a way that would have been very difficult to predict. In the case of duplicate assets, while the binary essence might be identical, the audit/usage and workflow records may well be different (and contain their own insight which might not be apparent initially). To delete one of them, a judgement has to be made about which is more valuable than the other (and the likely outcome is that most users will just choose at random). It would be fair to say that most DAM users underestimate how much metadata any given digital asset has, even one which has had no cataloguing data applied.
At this point, those who have tendencies towards deletion rather than retention are likely to counter with the observation that you can make a case for retaining anything. They are more right about that than they would imagine. As most Digital Asset Managers are well aware, the whole repository is full of ‘special cases’. The reason you need to manage digital assets in the first place is that they are all unique (even the ones that appear to be perfect copies). The value of a digital asset to a user fluctuates depending on scarcity and context. As such, the business impact of getting a valuation wrong might be negligible, or it could be huge and long-lasting. You just don’t know with complete certainty either way.
There is a wider digital transformation perspective to this, which I do not regularly hear mentioned in discussions about the subject. Most of the organisations that are profiting the most from digital technology currently regard data as an asset to be accumulated. Simply acquiring it is obviously not sufficient by itself, and further metadata needs to be applied to add enough value to make it useful, but if you lack the raw materials to start with then the opportunity is lost. Firms that are data-rich are now increasingly likely to be cash-rich. For that reason, I advocate not removing any data you own if you can avoid it. Data represents stored activity; it cost someone time and money to generate, so by disposing of it, you write down the value of that investment to zero, with no possibility of profiting from it at some future point.
The default position should be to never delete data, unless it meets one of the following strict criteria:
- The IPR is owned by someone else who did not give permission for your organisation to use it (i.e. it breaches copyright and is effectively stolen).
- You have exceeded all available storage capacity and fully exhausted any means to extend it (for financial or other reasons).
- There are either no funds to allow the data to be retained in off-line capacity or the data is sensitive and cannot be held outside the organisation for a significant reason (for example legal compliance).
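The three criteria can be read as a simple checklist. The following Python sketch is only an illustration under stated assumptions: the `AssetSituation` fields and the `deletion_grounds` function are hypothetical names, not taken from any DAM product, and it assumes that off-line storage means storage held outside the organisation. It reports which criteria apply rather than deleting anything, so the final decision stays with a person.

```python
from dataclasses import dataclass


@dataclass
class AssetSituation:
    """Hypothetical facts about one asset and the repository it sits in."""
    rights_cleared: bool          # False if the IPR belongs to someone who gave no permission
    storage_exhausted: bool       # True if capacity is used up and cannot be extended
    offline_funded: bool          # True if there are funds for off-line retention
    may_leave_organisation: bool  # False if the data is too sensitive to hold externally


def deletion_grounds(s: AssetSituation) -> list[str]:
    """Return the criteria that are met; an empty list means 'keep the asset'."""
    grounds = []
    if not s.rights_cleared:
        grounds.append("IPR owned by someone else without permission")
    if s.storage_exhausted:
        grounds.append("storage exhausted and cannot be extended")
    if not s.offline_funded or not s.may_leave_organisation:
        grounds.append("off-line retention not possible")
    return grounds
```

For an asset with cleared rights, spare capacity and a viable off-line option, `deletion_grounds` returns an empty list, which corresponds to the default position of retaining the data.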
While the above are reasonable principles, I have to acknowledge Ian’s guidance to devise a policy for assessing digital asset value to mitigate unmanaged data hoarding. Simply not deleting anything is impractical for most organisations, not least because while storage is cheap (as described in Ian’s article) it is neither infinite nor free. Even if capacity is not yet an issue, you will still have stacks of digital assets which appear to be intrinsically worthless for an indefinite period. They interfere with searches for more valuable material and generally make getting any work done far harder than it would be if some kind of purge were enacted.
The best method I have come across to hedge the risk of losing data which might become valuable later against the operational challenges of retaining what are currently perceived to be low-value digital assets is to use multiple catalogues, in other words to segment them. Earlier in this article, I used the description ‘binary choice’ to refer to data deletion. Ideally, digital asset repositories should make it possible to avoid this becoming a one-time, do-or-die decision about whether digital assets are kept or not. Rather than a hard deletion, consider a deaccession policy which uses a levels-oriented approach: assets move through levels (corresponding to catalogues based on current value), and business-specific criteria or rules determine when that happens. Lower value items which are strong candidates to be removed entirely exist in catalogues which are commensurate with their current role, and users either have to ask to search them or possibly even need to request access. Should storage capacity become an issue, the lowest value collection is where candidates for removal are sought first. This avoids having to make arbitrary decisions about what to retain. If offline storage is available (and options like Amazon Glacier and various similar competitors make that a cheaper and more tenable proposition than it once was) then it might be possible to keep the metadata and possibly some smaller proxies (previews) so there is some memory that the material still exists.
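To make the levels-oriented idea more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the three tier names, the one-year demotion rule and the function names are hypothetical, and a real policy would use whatever business-specific criteria the organisation agrees on.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical catalogue levels, ordered by current value."""
    DORMANT = 1    # access must be requested; first place to look for removal candidates
    REFERENCE = 2  # searched only when the user asks for it
    ACTIVE = 3     # included in default searches


@dataclass
class Asset:
    asset_id: str
    tier: Tier = Tier.ACTIVE
    days_since_last_use: int = 0


def review(asset: Asset, demote_after_days: int = 365) -> Tier:
    """Move an unused asset down one tier. Nothing is ever deleted here."""
    if asset.days_since_last_use >= demote_after_days and asset.tier > Tier.DORMANT:
        asset.tier = Tier(asset.tier - 1)
    return asset.tier


def search(assets: list[Asset], lowest_tier: Tier = Tier.ACTIVE) -> list[Asset]:
    """Default searches see only ACTIVE assets; lower tiers are opt-in."""
    return [a for a in assets if a.tier >= lowest_tier]


def removal_candidates(assets: list[Asset]) -> list[Asset]:
    """Order assets so the lowest tier, least recently used, comes first."""
    return sorted(assets, key=lambda a: (a.tier, -a.days_since_last_use))
```

The point of the design is that demotion is cheap and reversible, while removal is only ever considered from the head of the `removal_candidates` list, and only when capacity forces the issue.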
At present, while these sorts of facilities are offered in DAM solutions (especially transmission to offline storage), they tend not to be as flexibly implemented as they will need to be in the future as the volume of digital assets in circulation continues to increase. I would imagine most DAM users have a system which offers an option for changing the status of an asset to ‘archived’, ‘admin only’, ‘not published’ etc. The terminology varies, but the common factor is that the assets are not deleted, but are not available to general end users either. These are a starting point, but they are unlikely to offer a satisfactory level of descriptive precision in the future, nor the opportunity to segment data as flexibly as will become necessary. To an extent, this is also possible using asset/user permission controls (i.e. by putting assets into groups and then controlling access to those) but even then, the results may still end up finding their way into searches alongside higher value assets for anyone with access. I have seen a few solutions that do provide multiple catalogues of the type that would be suitable; however, they tend to be aimed at preservation use-cases (e.g. Collection Management Systems as used by museums etc). As often happens in the DAM software market, I can see more of these facilities getting replicated in business or marketing-oriented DAMs as their users begin to better appreciate the necessity for them.
I should make it clear that while this approach offers some advantages, it too has drawbacks. For example, while decisions about the value of digital assets are likely to be less fraught and risky than fully deleting them, they will still provoke debate and once an asset has been downgraded, the likelihood of it being reinstated will probably diminish. Similarly, if users don’t have default access to assets which would have otherwise been deleted, then as far as some will be concerned, they might as well have been. Lastly, although these features are relatively straightforward, they do introduce further complexity to the software and the risks of their implementation (and costs) need to be managed, so this method is not a free lunch, by any estimation.
With all that said, I believe the approach described provides the basis of a model which could be used for a more versatile deaccession policy framework. This is going to become an increasingly demanding issue for organisations to address. While there will be a strong instinct to treat removal of data as though it was some kind of exercise akin to cleaning out your loft, in a digital environment where (for the most part) data will remain in the same pristine condition it was in when introduced, you need to think carefully about how to preserve its value without allowing your DAM initiative to turn into a data hoarder’s charter.
The archival profession has a rich set of literature about deaccessioning. Don’t be afraid to use it to manage your collections.
The archival profession might have a lot of literature about deaccessioning. However, most DAM users who make management decisions about the repositories of commercial organisations tend not to be aware of it (and that is currently the largest group of DAM users by both revenue spent on DAM solutions and quantity of users). I take your point, however, and there is an opportunity to learn from these sources for those willing to put in the time and effort to carry out the research.