Legacy Metadata – The Next Big DAM Problem?

Like a lot of people I know in the Digital Asset Management trade currently, my firm has been deluged with client work for a sustained period.  Several times in recent weeks, I’ve been asked why I think that is.  Some vendors advance the theory that it’s because their solutions have become more sophisticated and demand has increased as a result.  A few consultants take the view that their educational efforts are bearing fruit (and not before time, as some of the older hands are keen to point out).

There might be some elements of truth in both those points, but in my mind, the reason is far simpler: there is far more digital content (‘assets’) getting generated than there used to be.  If you capture an image or audio/video these days, it’s almost always going to be using a digital device and then transferred to a digital storage medium; the same applies if you are originating other materials like artwork, documents etc.  The metaphor I have used is that when it snows a lot, people invest in shovels and snow ploughs.  That seems like the simplest (and therefore most plausible) reason why DAM is now in demand and why that demand will remain firm for some time.  Whether the same group of protagonists will still feature in the unfolding plot is a different question, but the underlying theme of this story is now very well defined.

A point we have been at pains to make clear on DAM News (and one I also emphasise to those organisations I work with) is that assets are not just composed of the digital files you download, but also the metadata, and it’s the latter which makes them useful.  The reason our subject is called ‘Digital Asset Management’ rather than ‘Digital File Management’ is because the term ‘asset’ implies that you add value to the binary essence.  Even a basic file name is a very simple form of metadata designed to tell someone (or something) what the file contains so they can get a clue about what they can and cannot do with it.  The wider the scope of metadata you can potentially associate with an asset, the more opportunities there are for extracting value out of it.

The obvious example is search, and most DAM users quickly grasp the point that having relevant metadata helps them find assets more easily.  It would not be accurate to say that metadata is solely about search, however; there are numerous other types of metadata which all offer potential value.  For example, the ordering/usage patterns offer business intelligence that can extend beyond the asset’s original purpose.  Automated cataloguing might become more of a tenable proposition than it has hitherto seemed if you can associate workflow data and linked metadata from elsewhere to help derive cataloguing suggestions.  The more you think about what to do with digital assets, the more metadata gets created.  Re-using my earlier metaphor, it’s possible to envisage digital assets acquiring a snowball-like effect where metadata accrues and subsequently offers an opportunity for deeper insight to be extracted as a result, thus expanding the scope of what organisations can do with their DAM repositories to areas they might not have considered before.  Opportunities and problems, however, are two sides of the same coin.  Having all this metadata is great, but it needs to be managed to prevent it becoming overwhelming.

Some of the projects I am involved in currently are DAM refreshes where the client is now on a third or fourth round of major re-implementation.  In one case it is because the first two attempts were unsuccessful and it wasn’t until around 2006 that they finalised a system that met the needs of the target users.  Another is preservation-oriented and the client has been using one application or another since the early 1990s, with a few major iterations in the intervening period.  In both cases, a complex issue is what to do with the legacy metadata.  You can make an unequivocal case for the value of some elements, for example, captions to images, copyright information, dates of origination etc.  Where it gets more complex is transactional data relating to who downloaded what and when (and the purpose it was used for).  Adding those elements into the migration scope adds cost and complexity.  Furthermore, the method of auditing in the new system is different to the technique employed in the old one, so there are issues like mapping decisions with both technical and political consequences which might delay launch and therefore have their own impact on ROI.
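
As a rough illustration of what a mapping decision looks like in practice, the Python sketch below translates a download-log row from an old system into the shape a new one might expect.  Every field name here (`usr`, `dl_date`, `actor` and so on) is invented for the example and not taken from any particular product; real systems will differ.  Anything the mapping does not cover is kept under an `unmapped` key rather than silently dropped, so the decision about its value can be taken later.

```python
# Hypothetical mapping from legacy audit columns to the new system's field names.
FIELD_MAP = {
    "usr": "actor",
    "dl_date": "event_time",
    "asset": "asset_id",
    "reason": "usage_purpose",
}


def map_audit_row(legacy_row, field_map=FIELD_MAP):
    """Translate one legacy audit row; keep anything unrecognised under 'unmapped'."""
    mapped = {"event_type": "download", "unmapped": {}}
    for key, value in legacy_row.items():
        if key in field_map:
            mapped[field_map[key]] = value
        else:
            # No agreed destination yet, so preserve the value for later review.
            mapped["unmapped"][key] = value
    return mapped


# Example row with one column ("cost_centre") that the mapping does not cover.
legacy_row = {
    "usr": "jsmith",
    "dl_date": "2006-11-03",
    "asset": "IMG_0042",
    "reason": "annual report",
    "cost_centre": "MK-17",
}
new_row = map_audit_row(legacy_row)
```

The unmapped columns are where the political side of the discussion tends to surface, since someone has to decide whether they are worth a field in the new system.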

There is a strong motivation to just dump the data and not bother with it, but to do so could significantly reduce the overall long-term value of the assets contained within the DAM.  Certainly, the basics from the old system will get transferred to partially retain value.  Dispensing with metadata that took many person-years to enter or generate, however, is effectively like giving away jars full of old coins, some of which might be highly valuable, but you won’t know that until you have tipped them out on to the kitchen table and commenced the unenviable task of picking through and inspecting each of them.

I would imagine this issue is currently more significant in preservation than more commercially focussed examples like marketing, but the former has more of an influence on the latter than is generally acknowledged (in respect of Digital Asset Management techniques, at least).  While many marketing managers will want to just get on with rolling out their new DAM solutions in short order, there is an increasing understanding by marketing personnel of subjects like big data, analytics and Business Intelligence.  These areas directly relate to digital assets since they form the base components of marketing campaigns.  Being able to understand from a historical perspective whether or not some assets were associated with successful marketing campaigns is clearly valuable information; put in that context, disposing of legacy metadata from an outgoing system seems a hastier decision than it first appeared.  Similarly, losing the opportunity to more rapidly catalogue assets because the old workflow audit trails are now in the trash could be a costly error.  I know from personal experience that introducing these topics to a project meeting is not always well received, but most marketing people grasp that they need to think about them carefully before making decisions.

A half-way house proposed by some (especially those at the sharp-end of designing, writing and testing migration scripts) is that this data can be ‘parked’ somewhere: extracted into a neutral transfer format like CSV or XML, but not actually imported.  That seems like a reasonable suggestion, but in my experience, if you don’t devise plans to migrate from the off, then it may as well be trashed, since those who have any memory of its existence will eventually leave without that knowledge getting passed on to their successors.  If management pressure to get a system released is great enough, however, this may often be the path of least resistance that gets taken.  As anyone who has dealt with legacy metadata will be painfully aware, re-awakening this kind of material once it has been in suspended animation for a few years is that much harder (and more expensive) because the context and understanding of the data has been lost.  This strategy is expedient, then, but probably not the cheapest option, even though the cost might cross over into a later budget period and so sustain that impression.
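
To make the ‘parking’ option a little more concrete, here is a minimal sketch in Python, assuming the legacy records can be read out of the old system as simple dictionaries.  The function name `park_legacy_metadata`, the field names and the manifest layout are all hypothetical and not drawn from any particular DAM product.  What it tries to show is the parked CSV being written together with a small manifest that records what each field meant and where it came from, since it is that context which tends to go missing.

```python
import csv
import io
import json
from datetime import date


def park_legacy_metadata(records, field_notes, source_system):
    """Return (csv_text, manifest_json) for a batch of legacy records.

    records: list of dicts read from the outgoing system (assumed shape).
    field_notes: dict mapping each field name to a plain-language description.
    """
    # Collect every field that appears in any record, in first-seen order.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    # Write the records as CSV; fields missing from a record become empty cells.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)

    # The manifest carries the context that the CSV alone cannot.
    manifest = {
        "source_system": source_system,
        "parked_on": date.today().isoformat(),
        "record_count": len(records),
        "fields": {name: field_notes.get(name, "UNDOCUMENTED") for name in fieldnames},
    }
    return buffer.getvalue(), json.dumps(manifest, indent=2)


# Example with invented download-log records.
downloads = [
    {"asset_id": "A100", "user": "jsmith", "downloaded": "2009-03-14", "purpose": "brochure"},
    {"asset_id": "A101", "user": "mjones", "downloaded": "2010-07-02"},
]
notes = {
    "asset_id": "Identifier of the asset in the old system",
    "user": "Login name of the person who downloaded the asset",
    "downloaded": "Date of download (ISO format)",
}
csv_text, manifest_json = park_legacy_metadata(downloads, notes, "old-dam")
```

Any field without a description is flagged as `UNDOCUMENTED` in the manifest, which at least makes the gap in understanding visible while the people who could fill it are still around.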

My medium-term prediction is that the accession process for DAM will incrementally become far more automated than it is now as vendors (of both hardware and software) understand where new assets are coming from and where they will end up.  Also, further work will get done to make full ingestion more streamlined and efficient, as it is currently the big cost of Digital Asset Management (and everyone understands that now).  While that problem might slowly become less acute than it is now, legacy metadata and what to do with it will ramp up in importance and acquire a far more strategic role than it has had previously.  As more organisations move from their first or second DAM iteration into later editions, this issue will appear more often and those on the supply side will get asked about what tools and techniques they have to deal with it efficiently.

  • I agree with what you say. In the work we do, handling legacy metadata often forms a large proportion of the project, and it is important to make firm decisions at the point of implementation as I agree that ‘parked’ data will remain for ever thus. We take the approach that the data ingested should be clean (free of hidden characters, line returns and other trip-ups) but that the wider data content enrichment/cleansing is best done in the new DAM. We also look at search and retrieval methods to prioritise newer material (if appropriate) where the search returns are better.

  • You are so on target. As a vendor that has been around for more than 20 years we have had to do migrations from several different systems and we have been asked to export metadata as well as assets for some customers that have sadly switched systems for one reason or another. We have a publishing background and newspaper librarians would never agree to a new DAM system that did not preserve their years of hard and mostly manual labor. Migration can be tricky and slow. Mapping documents are prepared, samples are received and formats converted, then data is staged for review. Only after review is the larger body of assets and metadata ingested into the production system. What a tragedy it would be if years of metadata labor were lost or had to be manually re-keyed.

  • This will be the next big challenge for all digital information. You have to walk a fine line between saving too little and having too much. The snowball metaphor is apt; too much or too little can defeat the purpose. I think a good solution is to regularly review the metadata that is used and to add/weed accordingly. Waiting to edit/update this information leads to increased expenses. For organizations that have large data repositories, a dedicated DAM team is necessary. Users should be consulted regularly. Haste makes waste; metadata should be monitored routinely and changed incrementally to prevent rushed, sloppy updates.

  • The issue of DAM does lie in the quantity of material and the quantity of channels it occupies. I do believe that sometime in the future the growth of devices will be less rapid, but by no means will the digital material slow down. I think an answer lies in employing a strong work force to manage the data, continuously. It is not a re-scope of current DAM practices by an expert, but a constant working of metadata and formats. I think the author makes a great point that if it is not done in the present it will not be done in the future. DAM is not like a typical American’s spending habit (spend it all at once), but rather should be about investing among a diverse set of financial items. DAM will most likely never catch up to account for all digital materials, but we won’t stand a chance unless we take the time and invest the manpower to work at actually managing assets.
