Asset Metadata Cataloguing – Why You Can’t Automate It And Expect To Get ROI From DAM


David Diamond July 31, 2012 at 2:41 pm

Bravo Naresh! I wholeheartedly agree with your points. People think I’m crazy, but this is one of the reasons I encourage people *not* to use mandatory fields. If users don’t know proper metadata values, there’s no reason to encourage them to enter garbage just so they can save a record.

I would much rather have a “Metadata Incomplete” flag that a user can set in order to remind that user, that user’s manager, and all users of the system that the record’s metadata should be considered “beta.” Automation then is a perfect way of finding those incomplete records and reminding subject matter experts (SMEs) that action is needed.

Then, there should be a field accessible only to the SME that acts like a digital signature to verify the record’s value. I think this combination of “needs work/verified” fields goes a long way toward improving the quality of metadata and, in the meantime, letting users know when what they’re looking at might not be accurate or complete. In other words, building trust in the system.

Bad metadata looks a lot like good metadata, so before we start blindly believing and basing business decisions on everything a DAM system can automatically extract, we need a better system for checks and balances.

Ralph Windsor July 31, 2012 at 4:15 pm

I’ve got to agree with both you and David on this issue. The only people who seem to go crazy over automated metadata are either techies who haven’t ever had to deal with the consequences of it or the uninformed who are hopeful that there might be some way to avoid the work involved.

I guess some of that semantic web stuff you cover might eventually help out here, but I think people have to come to terms with the fact that metadata cataloguing is like any other kind of literature/writing task – you can’t get machines or monkeys to do it and get serviceable results. I can’t see that changing for quite a while (probably not in my lifetime, in fact). The most effective techniques are to make entering metadata incrementally less painful and to simplify the review process so it’s easier to see what is being put in.

On David’s point, one tactic I’ve seen used is to allow users to save without mandatory fields being completed, but the asset can’t be released for others to look at until that’s done (an archived or not-published flag). I guess it’s a horses-for-courses decision whether you make fields compulsory, assign a “for review” flag (etc.) but allow anyone to see the asset, or let users save but not release it. If you’re in something like financial services or government, I can see a few communications managers being less keen on the more permissive options (for regulatory/legal reasons): even if the record is clearly marked as unverified, they’ve effectively released something, albeit with a metadata health warning.
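The save-now, release-later gate described above amounts to a simple predicate over the mandatory fields. A rough sketch, with an invented field list purely for illustration:

```python
# Sketch of the "save with gaps, but don't publish" tactic described above.
# The mandatory field names are assumptions for the example.
MANDATORY_FIELDS = ("title", "description", "keywords")

def can_publish(record: dict) -> bool:
    """Allow release only when every mandatory field has a non-empty value."""
    return all(record.get(f) for f in MANDATORY_FIELDS)

draft = {"title": "Container vessel", "description": "", "keywords": []}
# Saving the draft is always allowed; releasing it is not (yet):
print(can_publish(draft))     # False: description and keywords are empty

draft["description"] = "Vessel being unloaded at port"
draft["keywords"] = ["container ship", "unloading", "port"]
print(can_publish(draft))     # True: ready for release
```

The key design choice is that the check guards publication rather than saving, so users are never forced to enter garbage just to keep their work.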

Stratifying your assets into different collections based on the expected usage scenario is another way to handle it. I’ve seen this in press media libraries where stuff that needs to be released right away has minimal cataloguing (as it won’t be needed for very long post-release and the end users will be watching out for it anyway) whereas longer-term assets get more detail applied because the library knows end users will be searching for it potentially years afterwards.

I think the main point here is you’ve got to think about metadata cataloguing as being of equal importance to the digital media file itself. An asset is the file + metadata; if you skimp on the latter, the asset’s value is diminished as a result.

Martin Wilson August 1, 2012 at 2:30 pm

As mentioned above, for a DAM system to be effective, each asset needs metadata relevant to the terms users might enter when they try to find it. Until there are big leaps forward in image recognition and/or artificial intelligence, metadata of this quality will be generated predominantly by humans. Decent DAM systems can at best make it easier for humans to do this.

In our experience, almost all organisations trust a few key people to enter metadata and to approve new assets. While this approach has advantages (for example, quality control), it does mean that these few people can spend a lot of time on these activities. I agree with Naresh that to simply ‘push this on to other people’ is unlikely to be effective. But what about making better use of the users who actually need to find the assets, for example allowing them to correct or enhance an asset’s metadata if they see fit? Obviously if they can’t find an asset in the first place then they can’t help, but plenty of ‘end-users’ have a deep understanding of an asset’s subject domain and would be keen to improve its metadata, especially if it were easy to do and they were motivated to do so (which is where gamification could play an interesting part).

Wikipedia shows how effective user-generated content can be – I would like to see these principles better applied to DAM, especially in a large organisation with hundreds of thousands of potential users. One problem is that this requires more than just software to do it – it also needs an enlightened corporation willing to trust their users!

Naresh Sarwan August 1, 2012 at 4:57 pm

I’m still not sure everyone has fully followed what I’m saying here. It’s understandable that technology-based solutions are discussed, as those contributing have either a software or consulting background, but I don’t think that is the core problem.

Entering cataloguing metadata for a B2B DAM where the subject matter is usually fairly dry tends to be quite a dull job. Either users won’t do it, or it becomes a ‘trainspotter’ task – which is why you end up with a lot of highly subject oriented keywords, but nothing that describes what a non-expert can see.

One of the clients I am working with right now is the container shipping division of a big logistics provider. They have a subject expert who has gone through and entered all kinds of detail about each of the vessels in the images, like their tonnage, size, ports served etc. If you do a search for ‘container ship unloading’, however, nothing is found, even though they have about 2000 photos that should match! The problem isn’t the DAM system – it’s that the cataloguing doesn’t describe what you can see in simple terms that anyone can understand (and might use as search keywords). This has become a big problem for their marketing people: when they want general images that don’t meet a very tight range of criteria, they can’t find anything suitable without a lot of effort.
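The failure mode above can be made concrete with a toy example. The keywords, query, and matching logic below are all invented for illustration – real DAM search is more sophisticated, but the coverage gap is the same:

```python
# Illustration of the search failure described above: an asset catalogued
# only with expert detail won't match a plain-language query.
expert_keywords = {"MV Example", "52000 dwt", "panamax", "Rotterdam", "Singapore"}
general_keywords = expert_keywords | {"container ship", "unloading", "crane", "port"}

def matches(query: str, keywords: set) -> bool:
    """Naive keyword search: every query term must appear in some keyword."""
    terms = query.lower().split()
    return all(any(t in k.lower() for k in keywords) for t in terms)

print(matches("container ship unloading", expert_keywords))   # False: no hit
print(matches("container ship unloading", general_keywords))  # True
```

No amount of expert detail rescues the first search; only adding the plain-language terms a non-expert would actually type does.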

The opposite (and less common) scenario is where a team of independent picture researchers have been hired and they’ve done a great job of describing each image, but there is no technical detail about each of the objects/people because they haven’t been briefed properly or had their work checked. So a picture of the CEO speaking at the AGM will be something like ‘middle aged man speaking from a podium at a conference room full of people’. That doesn’t tell anyone who it is, where the event is etc. and is a problem for corporate comms managers because they can’t locate specific photos for news/press purposes.

For effective cataloguing, you need both the detail and the general concepts covered. It’s a problem that’s a lot more complicated than the traditional stock photography keywording tasks where many of the core ideas about DAM were originally formed (and that might have something to do with it).

Asking end users to contribute and improve the metadata might be effective. But, let’s get real here for a minute: how often will busy staff really devote time to doing this on the kind of scale needed? Wikipedia has the benefit of people who are really into their subject; that’s far less likely for a corporate DAM. Wikipedia also has millions of users, whereas most DAM systems have a core group of people who actually submit content and enter metadata (10-20 would probably be typical). If they are keen on the subject, it tends to be these technical micro-details that I just described, which sort of helps, but probably not most users with more general needs. Lastly, some content on Wikipedia is not very useful at all and contains factual errors, along with issues like ‘vandalism’ that actually tend to be less of a problem on DAM systems.

I think Ralph’s example of the press photo library and separating assets out into groups and applying metadata accordingly might be a potentially efficient way to handle this, especially for larger libraries where it’s just impractical to catalogue everything in satisfactory detail. But even that involves someone making critical decisions about which segment an asset goes into, and the risk of something vitally important getting added to the ‘slush pile’ because the cataloguer doesn’t understand the potential value of the asset.

The real issue though is metadata education. It’s my strong belief that ‘metadata literacy’ is going to become as important as general IT literacy in forthcoming years. I think we’ll only start to see a change when end users make the causal connection between poor cataloguing and not being able to find anything in their own personal lives (as happened with IT).

I think that has to be the way forward and the job of DAM vendors should be about educating their customers rather than pretending it’s a non-issue or a problem that can be delegated to someone/something else.
