Asset Metadata Cataloguing – Why You Can’t Automate It And Expect To Get ROI From DAM
Edward Smith of Extensis has written an article for CMSWire.com entitled: Who Should Enter Metadata in Digital Asset Management?
“Metadata is important because it helps you and other DAM users find the right files (keywords), understand the content of the files you find (descriptions) and use those files correctly (rights management). That sounds great and all, but who is going to enter all that metadata? The short answer is: hopefully someone else! If you can use metadata that somebody (or something) else provides, you can save some time and headache.” [Read More]
Edward lists various other alternatives to doing it yourself:
- Use data from the file system (creation/modification date etc)
- The DAM system you use itself
- Asset suppliers (e.g. photographers you have hired)
- Device metadata (for example EXIF data)
- Delegating it to your staff (or ‘colleagues’ as Edward tactfully puts it)
- Applications like Photoshop and Lightroom
The article isn’t bad, but I think it skirts round the big issue with DAM systems that no one has really solved yet: the ‘Garbage In – Garbage Out’ dilemma of DAM. For people to want to use a DAM system, they need to be able to find assets within it. In your typical corporate DAM, no one wants to do the cataloguing work because it is often painfully dull, gets in the way of other (arguably more important) tasks, and a major reason many users invest in DAM in the first place is to have the system do a lot of this work for them.
It is true that some of the basics can be useful (especially creation dates and MIME types/file extensions). However, a lot of the machine-oriented methods described just don’t cut it (in my opinion) and are potentially risky if not checked carefully (which reduces their benefit and increases the cost). While it might be ‘free’ metadata, most EXIF data ends up being noise that interferes with search results: how often are your users really likely to want to know what the camera model or exposure time was for an image? For photographers this might be useful, but for more conventional users it is far less so. I would concede that some geo-location metadata has potential value, but it needs to be closely integrated into the DAM rather than presented as something abstract like a set of coordinates.
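One practical way to apply this is to whitelist, at ingest time, only the EXIF fields likely to matter to end users and drop the camera-technical noise before anything reaches the search index. A minimal sketch, assuming the raw EXIF tags have already been extracted into a dictionary (the tag names follow common EXIF conventions, but the whitelist itself is hypothetical and would be tuned per organisation):

```python
# Sketch: whitelist-based EXIF filtering at ingest time.
# Assumes raw EXIF tags are already extracted into a dict; the
# whitelist below is illustrative, not a recommendation.

INDEXABLE_TAGS = {"DateTimeOriginal", "GPSLatitude", "GPSLongitude"}

def filter_exif_for_search(raw_exif: dict) -> dict:
    """Keep only the EXIF fields likely to matter to end users,
    discarding camera-technical noise (model, exposure, etc.)."""
    return {tag: value for tag, value in raw_exif.items()
            if tag in INDEXABLE_TAGS}

raw = {
    "Model": "Canon EOS 5D",
    "ExposureTime": "1/250",
    "DateTimeOriginal": "2013:05:01 10:15:00",
    "GPSLatitude": 51.5072,
}
filtered = filter_exif_for_search(raw)
# Only the date and GPS fields survive into the search index.
```

The same pattern works in reverse as a blacklist, but a whitelist fails safer: a new, unexpected tag stays out of the index until someone decides it is worth searching on.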
Even with more subject-oriented metadata like XMP, I’ve found some hair-raising scenarios. For example, one client I worked with had got their agency to supply Illustrator files of their pack shots. The XMP data contained Illustrator layer information from a competitor product that the agency worked on (without the client being aware of that fact) and the designer had used the original files without removing some of the invisible layers. The competitor product name appeared in metadata right next to their own. Fortunately, this was all internal, and you could argue that the XMP data and the DAM system revealed a fact that might otherwise not have been known about, but there are easier ways to glean this sort of supplier intelligence and it’s certainly not one you want reported as a software ‘bug’ by end users.
Delegating asset cataloguing to staff can produce some peculiar results too. With the corporate clients I have worked with, the most common problem is that staff over-use the batch automation capabilities of the software, so the same description and keywords get applied completely inappropriately to assets that don’t resemble the description at all. The other issue is a training one: staff will tag using keywords that describe what something is in specialist terms (e.g. product/service names) but include no literal indicators of what the asset shows (an issue for images especially).
To do DAM metadata entry right, some of the free/externalised methods can help reduce the work a bit, but it’s slightly disingenuous of Edward to suggest you can push this entirely onto other people. That’s not entirely surprising, as he works for a DAM vendor, and this ‘elephant in the room’ about buying into DAM is one that vendors (both the good and the bad) don’t like to talk about: it can put people off making a purchasing decision if they know that, as well as buying the product and dealing with the roll-out, they’re also likely to pick up a whole pile of unexpected cataloguing work.
To do metadata entry properly requires a combination of both a literal description of what an asset is (i.e. what you can see) and any business/subject-specific terminology. This implies either one person with subject knowledge and picture research skills (e.g. a ‘digital asset manager’) or, if that isn’t available, a two-pass approach where the assets are catalogued by staff first and then worked on by expert keyworders who understand how to catalogue assets in a way that allows end users to find them.
The suggestion that this can be de-humanised and automated using ‘free’ metadata that systems will magic up for you is, in my view, total nonsense. If you want to get decent ROI from your digital assets, factor in some costs for someone to catalogue the material properly (whether in-house or externally). If you do not, your DAM initiative risks becoming just a waste of time and money, as no one will be able to find anything, and finding things is probably the key reason you thought about getting one to begin with.
Bravo Naresh! I wholeheartedly agree with your points. People think I’m crazy, but this is one of the reasons I encourage people *not* to use mandatory fields. If users don’t know proper metadata values, there’s no reason to encourage them to enter garbage just so they can save a record.
I would much rather have a “Metadata Incomplete” flag that a user can set in order to remind that user, that user’s manager and all users of the system, that the record’s metadata should be considered “beta.” Automation then is a perfect way of finding those incomplete records and reminding subject matter experts (SME) that action is needed.
Then, there should be a field accessible only to the SME that acts like a digital signature to verify the record’s value. I think this combination of “needs work/verified” fields goes a long way toward improving the quality of metadata and, in the meantime, lets users know when what they’re looking at might not be accurate or complete. In other words, building trust in the system.
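The two-field workflow described above can be sketched concretely. This is a minimal illustration, not any particular DAM product’s schema; the field names (`metadata_incomplete`, `verified_by`) are hypothetical:

```python
# Sketch of the "needs work / verified" record states: a user-settable
# incomplete flag plus an SME-only sign-off field. Field names are
# illustrative, not taken from any real DAM system.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AssetRecord:
    title: str
    keywords: List[str] = field(default_factory=list)
    metadata_incomplete: bool = True    # "consider this record beta"
    verified_by: Optional[str] = None   # SME digital sign-off

def records_needing_review(records):
    """Automation hook: surface records that are flagged incomplete
    or that no subject matter expert has yet verified."""
    return [r for r in records
            if r.metadata_incomplete or r.verified_by is None]

catalogue = [
    AssetRecord("AGM keynote photo", ["CEO", "AGM"], False, "sme.jones"),
    AssetRecord("Untagged pack shot"),
]
pending = records_needing_review(catalogue)
# Only the unverified, incomplete record is surfaced for follow-up.
```

A scheduled job over `records_needing_review` is the “automation then is a perfect way of finding those incomplete records” step: the machine does the reminding, while humans still do the cataloguing.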
Bad metadata looks a lot like good metadata, so before we start blindly believing and basing business decisions on everything a DAM system can automatically extract, we need a better system for checks and balances.
Naresh,
I’ve got to agree with both you and David on this issue. The only people who seem to go crazy over automated metadata are either techies who haven’t ever had to deal with the consequences of it or the uninformed who are hopeful that there might be some way to avoid the work involved.
I guess some of that semantic web stuff you cover might eventually help out here, but I think people have to come to terms with the fact that metadata cataloguing is like any other kind of literature/writing task – you can’t get machines or monkeys to do it and get serviceable results. I can’t see that changing for quite a while (probably not in my lifetime, in fact). The most effective techniques are to make entering metadata incrementally less painful and to simplify the review process so it’s easier to see what is being put in.
On David’s point, one tactic I’ve seen used is to allow users to save without mandatory fields being completed, but the asset can’t be released for others to look at until that’s done (an archived or not-published flag). It’s a horses-for-courses decision whether to make fields compulsory, and whether to assign a “for review” flag (etc.) but allow anyone to see the asset, or to let them save without releasing it. If you’re in something like financial services or government, I can see a few communications managers being less keen on the former (for regulatory/legal reasons): even if the record is clearly marked as unverified, they’ve effectively released something, if only with a metadata health warning.
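That publish gate boils down to a simple check at release time. A minimal sketch, where the mandatory field list and field names are assumptions for illustration:

```python
# Sketch of the "save draft, gate release" tactic: an asset can be
# saved with mandatory fields missing, but cannot be published until
# they are all filled in. The field list is hypothetical.

MANDATORY_FIELDS = ("title", "description", "rights")

def can_publish(asset: dict) -> bool:
    """True only when every mandatory field has a non-empty value."""
    return all(asset.get(f) for f in MANDATORY_FIELDS)

draft = {"title": "Container ship, Rotterdam", "description": ""}
assert not can_publish(draft)   # saved as a draft, held back

draft["description"] = "Vessel unloading at the port of Rotterdam"
draft["rights"] = "Internal use only"
assert can_publish(draft)       # now releasable
```

Keeping the gate at publication rather than at save time avoids the garbage-to-satisfy-the-form problem David describes, while still preventing half-catalogued assets from circulating.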
Stratifying your assets into different collections based on the expected usage scenario is another way to handle it. I’ve seen this in press media libraries where stuff that needs to be released right away has minimal cataloguing (as it won’t be needed for very long post-release and the end users will be watching out for it anyway) whereas longer-term assets get more detail applied because the library knows end users will be searching for it potentially years afterwards.
I think the main point here is you’ve got to think about metadata cataloguing as being of equal importance as the digital media file itself. An asset is the file + metadata, if you skimp on the latter the asset value is diminished as a result.
As mentioned above, for a DAM system to be effective each asset needs metadata relevant to the terms users might enter when they try to find it. Until there are big leaps forward in image recognition and/or artificial intelligence metadata of this quality will be generated predominantly by humans. Decent DAM systems can at best make it easier for humans to do this.
In our experience almost all organisations trust a few key people to enter metadata and to approve new assets. While this approach has advantages (for example quality control), it does mean that these few people can spend a lot of time on these activities. I agree with Naresh that to simply ‘push this on to other people’ is unlikely to be effective. But what about making better use of the users who actually need to find the assets, for example allowing them to correct or enhance an asset’s metadata if they see fit? Obviously if they can’t find an asset in the first place then they can’t help, but plenty of ‘end-users’ have a deep understanding of an asset’s subject domain and would be keen to improve its metadata, especially if it were easy to do and they were motivated to do so (which is where gamification could play an interesting part).
Wikipedia shows how effective user-generated content can be – I would like to see these principles better applied to DAM, especially in a large organisation with hundreds of thousands of potential users. One problem is that this requires more than just software to do it – it also needs an enlightened corporation willing to trust their users!
I’m still not sure everyone has fully followed what I’m saying here. It’s understandable that technology based solutions are discussed as those contributing either have a software or consulting background, but I don’t think that is the core problem.
Entering cataloguing metadata for a B2B DAM where the subject matter is usually fairly dry tends to be quite a dull job. Either users won’t do it, or it becomes a ‘trainspotter’ task – which is why you end up with a lot of highly subject oriented keywords, but nothing that describes what a non-expert can see.
One of the clients I am working with right now is the container shipping division of a big logistics provider. They have a subject expert who has gone through and entered all kinds of detail about each of the vessels in the images: their tonnage, size, ports served etc. If you do a search for ‘container ship unloading’, however, nothing is found, even though they have about 2,000 photos that should match! The problem isn’t the DAM system – it’s that the cataloguing doesn’t describe what you can see in simple terms that anyone can understand (and might use as search keywords). This has become a big problem for their marketing people: when they want general images that don’t meet a very tight range of criteria, they can’t find anything suitable without a lot of effort.
The opposite (and less common) scenario is where a team of independent picture researchers has been hired and they’ve done a great job of describing each image, but there is no technical detail about the objects/people because they haven’t been briefed properly or had their work checked. So a picture of the CEO speaking at the AGM will be something like ‘middle-aged man speaking from a podium in a conference room full of people’. That doesn’t tell anyone who it is, where the event is etc, and is a problem for corporate comms managers because they can’t locate specific photos for news/press purposes.
For effective cataloguing, you need both the detail and the general concepts covered. It’s a problem that’s a lot more complicated than the traditional stock photography keywording tasks where many of the core ideas about DAM were originally formed (and that might have something to do with it).
Asking end users to contribute and improve the metadata might be effective. But let’s get real here for a minute: how often will busy staff really devote time to doing this on the kind of scale needed? Wikipedia has the benefit of people who are really into their subject; that’s far less likely for a corporate DAM. Wikipedia also has millions of users, whereas most DAM systems have a core group of people who actually submit content and enter metadata (10-20 would probably be typical). If they are keen on the subject, their contributions tend to be the technical micro-details I just described, which sort of helps, but probably not most users with more general needs. Lastly, some material on Wikipedia is not very useful at all and contains factual errors, along with other issues (‘vandalism’, for example) that tend to be less of a problem on DAM systems.
I think Ralph’s example of the press photo library and separating assets out into groups and applying metadata accordingly might be a potentially efficient way to handle this, especially for larger libraries where it’s just impractical to catalogue everything in satisfactory detail. But even that involves someone making critical decisions about which segment an asset goes into, and the risk of something vitally important getting added to the ‘slush pile’ because the cataloguer doesn’t understand the potential value of the asset.
The real issue though is metadata education. It’s my strong belief that ‘metadata literacy’ is going to become as important as general IT literacy in forthcoming years. I think we’ll only start to see a change when end users make the causal connection between poor cataloguing and not being able to find anything in their own personal lives (as happened with IT).
I think that has to be the way forward and the job of DAM vendors should be about educating their customers rather than pretending it’s a non-issue or a problem that can be delegated to someone/something else.