An Introduction To DAM Findability Techniques
This special feature has been written by DAM News contributing editor, Ralph Windsor.
Why Findability Is Critical To DAM ROI
The premise of this article is that the quality of metadata you apply to assets dramatically affects the chances of them being found and subsequently used. This has a major impact on the ROI obtainable from your chosen DAM system.
I shall outline some rudimentary methods you can apply to asset cataloguing tasks to improve the findability of your assets and use your time as efficiently as possible. The techniques are not difficult and can be accomplished by anyone with a basic level of literacy (not IT, just reading and writing). No special technology is needed and most of the recommendations should work with any DAM system. I will discuss where a more advanced range of DAM capabilities can be applied also.
The examples described all relate to images as that is the most common type of media that DAM systems are used with. The same principles can apply to other types of asset, but you need to evaluate the relevance of each to your own circumstances.
Say It As You See It – The Importance Of A Literal Description
The key to effective cataloguing is to start with a literal description of what you can see in front of you. Metadata like keywords, categories, folders etc are still important, but a clear, concise and non-contentious description of what someone can see without reference to other external characteristics or subject knowledge is the basis upon which you can add further details.
Consider the image below:
(C) Daydream 2007-2013. This image may not be reproduced without prior permission.
If you had to summarise this in one sentence, what might it be? Your first attempts are likely to be along the lines of:
A bridge
That isn’t bad to start with, but it won’t differentiate this from the hundreds of other pictures of bridges which we also have stored in our fictitious DAM system. We need to expand the scope of the description to provide more detail:
A bridge on a sunny day
This is better: we now have some further defining characteristics which will help narrow down the options for someone searching. Getting further into the subject, two other defining features are the age of the bridge and the location:
A bridge in London over the River Thames built in the Victorian era
We can also provide more specific location detail, the name of the structure, who designed it and some characteristics of this image such as the environmental conditions on the day it was shot:
Blackfriars road bridge over the River Thames, painted red and white. Constructed in the Victorian era (1864) and designed by Joseph Cubitt. Shot from the North Bank (London, EC4) on a sunny, spring day
This gives us a lot more detail and points to differentiate this image from another. The more descriptive terms you have, the greater the chance of the right asset being found by anyone using relevant search queries.
Too Many – Or Too Few?
You might want to add further terms to increase the likelihood of an asset being found, but there is a point where the additional descriptive detail might be less useful. Consider this longer description:
Blackfriars Bridge over the River Thames. Designed in 1860 by Joseph Cubitt. Shot from the North Bank (London, EC4) on a sunny, spring day. Stands adjacent to Blackfriars Railway Bridge built for the London, Chatham and Dover Railway
There are several keywords here which might generate false positive results, depending on what the purpose of the asset collection is. For example, “Chatham” and “Dover” – neither of these are locations in London (or even especially close in the case of Dover). Also, Blackfriars Railway Bridge is not visible in this image. Although there is more detail which will increase the potential number of results, not all of the terms may be relevant and end users will spend more of their time sifting through these to find what they are after.
While those appear to be reasonable recommendations, you need to keep in mind the context and what users might be looking for. Using the above example, while the “Dover” and “Chatham” terms have no direct relevance to the location, if the collection subject was railway or transport oriented then they might become more legitimate. Context and relevancy are critical to getting metadata right for successful DAM implementations.
As will be discussed later, Controlled Vocabularies can be used to clarify a narrative description and also reduce its length.
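The false positive effect can be illustrated with a minimal sketch, assuming a naive search that matches whole words in the description. The `matches` helper, the asset identifiers and the second sample description are hypothetical and not taken from any particular DAM system; the first description is an abridged version of the longer caption above.

```python
import re

# Hypothetical catalogue: asset id -> narrative description.
CATALOGUE = {
    "bridge-001": (
        "Blackfriars Bridge over the River Thames. Shot from the North Bank "
        "(London, EC4) on a sunny, spring day. Stands adjacent to Blackfriars "
        "Railway Bridge built for the London, Chatham and Dover Railway"
    ),
    "cliffs-002": "White cliffs at Dover seen from a ferry on an overcast day",
}


def words(text):
    """Lower-case a text and split it into a set of whole words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(query, catalogue):
    """Return ids of assets whose description contains every query word."""
    wanted = words(query)
    return sorted(
        asset_id
        for asset_id, description in catalogue.items()
        if wanted <= words(description)
    )


print(matches("dover", CATALOGUE))         # the bridge image matches as well
print(matches("dover cliffs", CATALOGUE))  # only the cliffs image matches
```

In a general London collection the first result is noise; in a railway-oriented collection the same match could be exactly what the user wanted.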
Fixing Cataloguing ‘Bugs’
I frequently get asked by clients to help find out why a given DAM system’s search feature allegedly has bugs and what can be done to fix them. In many cases, this might have followed an impassioned debate between the client and their vendor where the former are convinced the software is faulty and the latter provide numerous example searches to prove otherwise. While search index technical issues certainly can and do occur, many times it transpires that it is what is being put into the index rather than what comes out which is faulty. Information scientists summarise this problem as “Garbage In – Garbage Out”.
The biggest single issue is irrelevant terms being applied that relate to the business aspects of an asset rather than the literal description. Typically, the problem manifests itself in one of two ways. Either end users enter a straightforward term that strongly relates to whatever activity the organisation is usually engaged in and it does not produce as many results as they expected, or they use a different term and all sorts of (apparently) unrelated records start appearing, because that is what whoever did the cataloguing told the system to apply to the asset.
What often happens with users who are subject experts but lack experience with image keywording or picture research is that they will apply lots of business or project specific descriptions to an image. What they ‘see’ is the subject matter and the back-story of why this is relevant to the organisation. The wood is missed for the trees and those who are unaware of a given project or product etc cannot decode the link between the description and the image. For a mechanised device like a DAM system, the problem is exacerbated because the software lacks the ability to fully understand the context of the request and can only go on matches for a few fragmented terms.
Sometimes, organisations will realise this is an issue and some professional picture researchers will then get drafted in. They usually have the opposite problem: they can describe a scene in literal terms, but the description lacks any of the specifics to associate it with the organisation itself, like projects, product model numbers etc.
Of the two scenarios, the first is probably the more common and usually has consequences that are more complex and time consuming to resolve. Many of these problems can be prevented with some basic metadata education, like an induction course before upload rights are granted, plus some follow-up quality control by DAM administrators. If the funds are available, it can also make sense to use a double-pass approach where an experienced asset cataloguer applies a description and that then gets reviewed by a subject expert. Usually it is preferable to do it in that order as the researcher will generally find it difficult to know whether it is safe to remove a given term or re-phrase it if they have to edit the caption rather than write it from scratch.
Taking Care With DAM System Batch Metadata Modification Features
Many DAM systems now contain a variety of methods to batch catalogue lots of assets simultaneously and apply metadata across groups of related assets. These can be great tools for saving time, but if care is not taken, they can end up being used as weapons of mass metadata destruction where an identical description is attached to hundreds of images that are all completely different (with the result that no one can find any of them). If you provide access to these features to unsupervised users, it’s a good idea to ensure some training is always given about when and how to use them, as well as follow-up sessions to see what they have done and whether they are using them appropriately.
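As part of that follow-up, a simple check for descriptions shared by suspiciously many assets can help. The sketch below assumes the descriptions can be exported as a mapping of asset id to text; the sample data, the `overused_descriptions` name and the threshold of three are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical mapping of asset id -> description after a batch update.
DESCRIPTIONS = {
    "img-001": "Annual conference 2013",
    "img-002": "Annual conference 2013",
    "img-003": "Annual conference 2013",
    "img-004": "Blackfriars road bridge over the River Thames",
}


def overused_descriptions(descriptions, threshold=3):
    """Return descriptions shared by at least `threshold` assets, with counts."""
    counts = Counter(text.strip().lower() for text in descriptions.values())
    return {text: n for text, n in counts.items() if n >= threshold}


print(overused_descriptions(DESCRIPTIONS))
```

A flagged description is not necessarily wrong (the images may genuinely be near-identical), so the output is a prompt for review rather than an error list.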
DAM System Asset Volume Growth And Search Impact
The asset volume growth curve is different for every unique DAM system and will change depending on how large the user base is and the frequency with which new material is introduced.
In the beginning, there will usually be fewer assets to find, so many searches will end up with no results. Users may constrain their expectations of finding results (based on previous unsuccessful searches) and enter shorter search phrases to increase results as they will usually prefer anything (relevant or otherwise) to nothing.
As volumes increase, those single word queries will yield too many assets. Just as with an excessive number of words for each description, there is a tipping point when getting any results at all isn’t sufficient any longer and they need to be better quality. This is the point when end users will start combining terms together and expect to still see results, but fewer of them and more relevant to what they were looking for.
If you are responsible for your DAM system, you need to be aware of the growth in search result volumes and continuously monitor the effects. Most DAM systems now include some kind of auditing tools which can track what people search for and how many results are retrieved. The DAM system itself might not contain a built-in report, but it should allow you to export the raw data for further analysis in a spreadsheet etc. If that is not an option, some kind of scheduled process for manually reviewing common searches is highly recommended. You might need to do this anyway to reality-check some of the more automated reporting methods.
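If the raw data can be exported, the analysis does not have to happen in a spreadsheet. The following is a rough sketch, assuming a CSV export with `query` and `results` columns; the column names, sample rows and the threshold of 100 results are assumptions for illustration, and a real export will differ between systems.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export: one row per search with the number of results returned.
SAMPLE_EXPORT = """query,results
bridge,412
bridge,415
blackfriars bridge,3
tram,0
Tram,0
river thames sunny,7
"""


def summarise_searches(csv_text, too_many=100):
    """Group searches by normalised query and flag null and oversized results."""
    totals = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["query"].strip().lower()].append(int(row["results"]))

    # Queries that never returned anything.
    null_queries = sorted(q for q, counts in totals.items() if max(counts) == 0)
    # Queries whose average result count exceeds the chosen threshold.
    broad_queries = sorted(
        q for q, counts in totals.items() if sum(counts) / len(counts) > too_many
    )
    return null_queries, broad_queries


null_queries, broad_queries = summarise_searches(SAMPLE_EXPORT)
print("No results:", null_queries)
print("Too many results:", broad_queries)
```

The null list points at missing assets or missing terms, while the broad list points at queries where users will soon need more specific metadata to narrow things down.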
Search/Download Ratio Trends
A further variable worth keeping a close eye on is the search/download ratio. Depending on how much usage control you apply to assets, you might also need to include in that scope the numbers of assets requested as well as directly retrieved. If the percentage of downloads measured against search results is falling, it means users are finding assets but those assets are becoming less useful to them.
That could imply declining metadata quality or increased demand for assets which are not on the system currently. It can also mean people are fed up of seeing the same stuff all the time. While you can automate the collection of feedback using the DAM system, from my own experience, users tend to be reluctant to put much detailed feedback into on-line facilities. You will probably need to get out and talk to people to find out what is really going on. The numbers give you a clue that there might be a problem; the human beings should tell you what it actually is.
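The trend itself is simple arithmetic. Here is a minimal sketch, assuming you can obtain monthly totals of assets returned by searches and assets downloaded; the figures and function names are invented for illustration.

```python
# Hypothetical monthly totals: (month, assets returned by searches, assets downloaded).
MONTHLY_TOTALS = [
    ("2013-01", 4000, 480),
    ("2013-02", 5200, 520),
    ("2013-03", 7000, 560),
]


def download_percentages(totals):
    """Return (month, downloads as a percentage of search results) pairs."""
    return [
        (month, round(100 * downloads / results, 1))
        for month, results, downloads in totals
        if results > 0  # skip months with no search results to avoid dividing by zero
    ]


def is_falling(percentages):
    """True when every period's percentage is lower than the one before it."""
    values = [pct for _, pct in percentages]
    return len(values) > 1 and all(b < a for a, b in zip(values, values[1:]))


percentages = download_percentages(MONTHLY_TOTALS)
print(percentages)
print("Falling:", is_falling(percentages))
```

Note that in this sample the absolute number of downloads rises every month while the percentage falls, which is why the ratio is worth tracking separately from raw download counts.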
Subdividing Asset Catalogues
There is another tactic you can borrow from professional news or sports image libraries. When they receive images from photographers, it is imperative that the images are made available to potential press clients as quickly as possible. To facilitate this, the minimum ‘need to know’ information about an image is applied – usually just the date, event/location and names of the personalities involved. Most of their clientele will be searching for the newest assets that are no more than 24 hours old (and usually far more recent than that).
Those assets that have long-term value (e.g. the best photo from a shoot or something of wider significance) are then moved to a smaller collection of selected items. They have more detailed keywords and descriptions applied, and there might also be links to the larger collection if a prospective user wants to look through further examples of related assets.
In this way, those who might want to use an image for less time-critical purposes can browse and search the ‘select’ catalogue instead. This is a more efficient method for managing the cataloguing workload and ensures that only the best assets that stand a higher than average chance of being re-used later have more time spent on them.
Clearly, this approach will not work for everything nor everyone. If you have a wide body of completely unrelated one-off photos then it will not help reduce the amount of cataloguing effort required. That said, you might still find it more operationally efficient to sub-divide asset catalogues and prioritise some assets over others in terms of time spent, rather than treating them all equally.
Metadata Checklist & Classification Techniques
You will need to tailor this to your own asset collection, but you need to be asking questions like the following ones each time you catalogue:
- What is it?
- What defining characteristics exist that differentiate this scene from others?
- Where is it?
- What is it used for?
- When was it created?
- Are there any significant people associated with this?
- What additional technical or organisation-specific information such as product names, model numbers, reference identification marks etc are present?
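One way to make a checklist like this enforceable is to treat each question as a field and report which ones are still blank. This is only a sketch: the `CatalogueRecord` class and its field names are hypothetical, and the sample values are drawn from the bridge description used earlier.

```python
from dataclasses import dataclass, fields


@dataclass
class CatalogueRecord:
    """One field per checklist question; the field names are illustrative."""
    what: str = ""
    defining_characteristics: str = ""
    where: str = ""
    used_for: str = ""
    created: str = ""
    people: str = ""
    organisation_specifics: str = ""


def unanswered(record):
    """Return the names of checklist fields that are still blank."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


record = CatalogueRecord(
    what="Blackfriars road bridge over the River Thames",
    defining_characteristics="Painted red and white, sunny spring day",
    where="Shot from the North Bank (London, EC4)",
    created="Bridge constructed in the Victorian era",
    people="Designed by Joseph Cubitt",
)
print(unanswered(record))
```

Not every question will apply to every asset, so a blank field is better treated as a prompt for the cataloguer than as a hard validation failure.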
There are also other methods of metadata identification which can help uncover potentially useful metadata. One example is the content lifecycle or timeline approach. This sub-divides metadata into three classifications:
- Historical Metadata
- Current Metadata
- Future Metadata
This was explained by DAM Survival Guide author, David Diamond in this CMSWire article: The Metadata Lifecycle for Digital Content. Also well worth reading is this article by Jonathan Studiman: metadata – use it or lose it.
These should be the starting points for your research into developing your own metadata cataloguing guidelines and the formulation of some basic quality control criteria to accompany them. You will need to derive your own approach to metadata discovery and it might involve a number of different methods. It is important also to be realistic about how much time cataloguing users will have to spend on each image and to avoid aiming for perfection with every asset.
What About Controlled Vocabularies?
Controlled Vocabularies (CVs) are predefined lists of keywords which are used to catalogue assets consistently and provide a more explicit and properly defined set of search terms. In more concrete terms, think of menu options rather than entering arbitrary text into a field. The method of presenting CVs is varied (and much depends on the purpose of them) but the essence is similar across all types of user interface.
CVs are the next stage in enhancing asset metadata and optimising findability. I would argue they aren’t a substitute for narrative – you need them both, but for different purposes. CVs are designed to rationalise the available search terms to make it easier for users to understand what type of subject matter a DAM system covers. They reduce the chances of getting null searches because you are choosing from a select list where there is a much higher chance that one of the terms will have been used.
A further use for CVs is to disambiguate descriptions and avoid the need for cataloguers to verbosely explain differences in the asset description (and avoid potentially skewing the results while doing so). Using the earlier bridge example, if the purpose of the collection was transport related, a candidate CV term might be the type of structure, for example, Road Bridge, Railway Bridge, Pedestrian Bridge etc. The cataloguer does not need to specify the bridge type in the narrative; they can just choose one of the terms. The searcher can filter the results if they are getting too many or have more specific needs. You can combine the advantages of either approach to create search facilities that are
This just scratches the surface of this wide ranging subject. The Controlled Vocabulary site David Riecks operates is the pre-eminent web resource to find out more.
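The bridge type example above can be sketched as a fixed list of terms combined with a free text search. This is an assumption-laden illustration rather than how any given DAM system implements CVs: the `StructureType` enum, the `Asset` class, the `search` function and the second sample asset are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class StructureType(Enum):
    """Hypothetical controlled vocabulary for a transport-related collection."""
    ROAD_BRIDGE = "Road Bridge"
    RAILWAY_BRIDGE = "Railway Bridge"
    PEDESTRIAN_BRIDGE = "Pedestrian Bridge"


@dataclass
class Asset:
    description: str
    structure_type: StructureType


ASSETS = [
    Asset("Blackfriars Bridge over the River Thames on a sunny, spring day",
          StructureType.ROAD_BRIDGE),
    Asset("Blackfriars Railway Bridge over the River Thames at dusk",
          StructureType.RAILWAY_BRIDGE),
]


def search(assets, text, structure_type=None):
    """Match free text against the narrative, then optionally filter by CV term."""
    hits = [a for a in assets if text.lower() in a.description.lower()]
    if structure_type is not None:
        hits = [a for a in hits if a.structure_type is structure_type]
    return hits


print(len(search(ASSETS, "blackfriars")))                             # both assets
print(len(search(ASSETS, "blackfriars", StructureType.ROAD_BRIDGE)))  # road bridge only
```

Because the term comes from a fixed list, a cataloguer cannot enter a variant spelling, and a searcher can narrow the narrative matches without guessing which words were used.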
Hybrid Narrative and Controlled Vocabularies
There are a few hybrid methods which I have seen in some systems and these seem to be gaining traction among DAM developers. On the cataloguing side, preset lists where the terms can be adjusted on an asset-by-asset basis are sometimes available. In this case, users will pick from an existing list and then either modify the preset slightly or enter a new term if they can’t find a suitable choice. For searching, many systems now have thesauri connected to predictive text that suggest terms to users via AJAX prompts.
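The thesaurus-backed suggestion idea can be sketched as a lookup that a search box might call as the user types. The thesaurus contents and the `suggest` function are hypothetical, and the prefix matching used here is just one simple way of producing suggestions.

```python
# Hypothetical thesaurus: preferred term -> alternative terms users might type.
THESAURUS = {
    "Railway Bridge": ["rail bridge", "train bridge"],
    "Road Bridge": ["highway bridge"],
    "Pedestrian Bridge": ["footbridge"],
}


def suggest(prefix, thesaurus, limit=5):
    """Return preferred terms whose own text or any synonym starts with prefix."""
    needle = prefix.strip().lower()
    if not needle:
        return []
    suggestions = [
        preferred
        for preferred, synonyms in thesaurus.items()
        if any(term.lower().startswith(needle) for term in [preferred, *synonyms])
    ]
    return sorted(suggestions)[:limit]


print(suggest("foot", THESAURUS))  # a synonym leads to the preferred term
print(suggest("r", THESAURUS))     # a short prefix returns several candidates
```

The user types whatever word comes to mind, but the term that actually gets searched for is the preferred one that cataloguers have been applying.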
As should be clear, there is no single metadata cataloguing method that will guarantee your assets always get found, however, there are a series of tactics that you can combine and adapt to optimise search quality and increase asset utilisation.
When devising metadata and findability strategies, one factor that needs to be assessed is how much time each asset will take to catalogue. There is a marginal cost effect that comes into play with asset cataloguing tasks where adding a few seconds per asset could potentially result in a significant overall increase in staff time needed (with an associated cost impact also).
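To make the marginal cost effect concrete, here is a small worked calculation. The asset count, extra seconds and hourly rate are illustrative figures only, and the function name is hypothetical.

```python
def extra_cataloguing_cost(assets, extra_seconds_per_asset, hourly_rate):
    """Return (extra staff hours, extra cost) for a per-asset time increase."""
    extra_hours = assets * extra_seconds_per_asset / 3600
    return extra_hours, extra_hours * hourly_rate


# Illustrative figures only: 50,000 assets, 15 extra seconds each, 20 per hour.
hours, cost = extra_cataloguing_cost(50_000, 15, 20.0)
print(f"{hours:.1f} extra hours, costing {cost:.2f}")
```

With these assumed figures, fifteen extra seconds per asset adds up to just over 208 hours of staff time, which is the kind of number that needs weighing against the expected improvement in findability.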
You need to carefully evaluate the ROI obtained from recording additional metadata details to improve findability and also test any assumptions about the actual time required to see if they are accurate. Even if you have a not-for-profit asset collection like culture, heritage or charity etc the task still needs to be approached as though it was a business exercise. What you choose to do with the productivity enhancement obtained is a separate policy decision for your organisation.
One final thought on this subject: there is an article by Andrew Manone on his Damaged Workflow blog, Working With Metadata is Like Voting, where he advises readers to “Tag Early and Tag Often”. This is great advice; first and foremost you need to get on with the cataloguing task, and you can go back and finesse the results later.
Readers who have found this article useful may wish to download this whitepaper: Metadata Management Strategies For Digital Asset Management.
About The Author
Ralph Windsor is Project Director at DAM consultants, Daydream and a contributing editor to DAM News.
Linked In: http://www.linkedin.com/in/daydream