Metadata Operations Management Strategies Part 3 – Determining Quality Standards And Choosing Cataloguing Methods
In the first part of this article series, I looked at why efficient metadata operations management processes are essential. In the second, I outlined some broad methods for segmenting asset catalogues so that the cataloguing strategy can be streamlined and adapted to the level of detail required for each segment of assets.
In this third part and the three follow-up articles, I will consider possible metadata cataloguing methods, both those supplied by human beings and those that are automated or derived. I will start with some scope definitions:
Human supplied metadata refers to cataloguing data that a real person has specifically applied to assets. This is where someone made a conscious decision to apply a given selection of metadata to one or more assets. A typical example would be users entering keywords to apply to an image or choosing a category.
Automated or derived metadata is where the cataloguing information was produced as a by-product of a third-party process. Although a human being might have initiated the strategy, they will not have been in control of what metadata got applied unless they decided to make manual amendments afterwards. Two examples would be the date when an asset file was uploaded or the file type of the media.
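To make the distinction concrete, below is a minimal Python sketch of derived metadata being captured as a by-product of ingesting a file. The function and field names are illustrative assumptions for this article, not any particular DAM platform's API:

```python
# A minimal sketch of derived metadata capture at ingest time.
# Field names ("upload_date", "file_type" etc.) are illustrative only.
import mimetypes
from datetime import datetime, timezone
from pathlib import Path

def derive_metadata(file_path: str) -> dict:
    """Collect metadata as a by-product of ingesting a file.

    No human decides these values; they fall out of the process itself.
    """
    path = Path(file_path)
    mime_type, _ = mimetypes.guess_type(path.name)
    return {
        "file_name": path.name,
        "file_type": mime_type or "application/octet-stream",
        "file_size_bytes": path.stat().st_size,
        "upload_date": datetime.now(timezone.utc).isoformat(),
    }

# Example: derive_metadata("brochure.pdf")
# -> {"file_name": "brochure.pdf", "file_type": "application/pdf", ...}
```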
There is a degree of crossover between the two, and this presents both efficiency opportunities and potential risks to metadata quality. Neither approach is satisfactory alone. Attention is often directed towards automated methods and the risks or opportunities they offer; however, human supplied metadata also introduces dangers (not least of which is inconsistency). To run an efficient metadata cataloguing operation, you need to carefully evaluate and select a combination of both techniques.
Deciding Metadata Quality Standards
With any operations management activity, one of the key considerations is the level of quality required. This is not always the maximum achievable, but should be a target defined in advance.
As explained in the second article about segmenting collections, it is probable that you will need different levels of quality for each section of your collection and therefore multiple quality control standards. These may affect the extent to which you use one cataloguing method over another. The quality control methods themselves also take different approaches, and I will discuss those with reference to the two main types described here.
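As a rough illustration of what segment-specific standards might look like in practice, the sketch below maps catalogue segments to minimum quality targets. The segment names, field counts and review regimes are invented for the example, not recommended values:

```python
# Hypothetical segment-to-quality-standard mapping; all values here
# are invented for illustration, not recommended defaults.
QUALITY_STANDARDS = {
    "active_marketing": {"required_fields": 12, "review": "two-pass"},
    "work_in_progress": {"required_fields": 3,  "review": "none"},
    "archive":          {"required_fields": 5,  "review": "spot-check"},
}

def standard_for(segment: str) -> dict:
    # Fall back to the most modest standard for unknown segments.
    return QUALITY_STANDARDS.get(segment, QUALITY_STANDARDS["work_in_progress"])
```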
As should be clear, there is a quality/cost trade-off similar to that in other business processes, and there are a multitude of refinements and alterations you can apply to mitigate one or more of the negative consequences. I will cover some of those as part of my discussion of each cataloguing method. Although an article like this will give you an introduction, the specifics are highly dependent on the organisation in question, the suppliers you use and many other diverse factors. This will give you some points to think about, but the tactics employed will need to be very case-specific.
Human Supplied Metadata
As per the definition above, human supplied metadata covers classifications and descriptions that a real person decided to apply to an asset record. Some common examples include:
- Hand cataloguing
- Using batch cataloguing
- Using asset supply chain partners
- Outsourcing to a third party
- Crowdsourcing and user feedback
I will discuss the relative merits of each, starting with hand cataloguing in this article.
Hand Cataloguing
This is the conventional way to catalogue assets and although it might now be more commonly carried out on digital media in DAM systems, the basic principles are identical to archival and library management techniques that have been used for centuries. The biggest advantage of this approach is the potential for higher quality, since each asset is considered on a case-by-case basis. The obvious disadvantage is the time required to do the work. Within this method, there are a number of factors that can impact cost and quality, including:
- Level of detail required and suitability of the metadata model
- Data entry methods available
- The people who do the cataloguing
Level of detail required and suitability of the metadata model
This covers both the quantity of fields (or the volume of data entered) and the selection of those fields, along with how well the users involved in cataloguing activities understand them. Cataloguing operations are a marginal time/cost activity: adding one field to the overall total that users have to complete can have measurable effects on aggregate costs when multiplied across tens or hundreds of thousands of assets. For that reason, each field needs to earn its keep and fully justify the extra effort involved on the part of users. Even if you make fields non-required, someone still has to think about whether or not it is a good idea to fill each one in, and that adds to the time taken. Users who do lots of cataloguing may well skip over non-required fields anyway without properly considering whether they should be completed.
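The arithmetic behind this 'earn its keep' point is easy to check. Assuming, purely for illustration, that one extra field adds ten seconds of thinking and typing per asset at a loaded cataloguing cost of £30 per hour:

```python
# Back-of-envelope cost of one extra metadata field, with assumed figures.
SECONDS_PER_FIELD = 10   # assumed extra time per asset for one field
HOURLY_RATE = 30.0       # assumed loaded cost of a cataloguer (GBP/hour)
ASSET_COUNT = 100_000

extra_hours = ASSET_COUNT * SECONDS_PER_FIELD / 3600
extra_cost = extra_hours * HOURLY_RATE
print(f"{extra_hours:.0f} hours, ~£{extra_cost:,.0f}")
# -> 278 hours, ~£8,333
```

Even with these modest assumptions, a single field amounts to months of working time across a large collection, which is why field selection deserves genuine scrutiny.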
Some DAM solutions offer ‘Adaptive’ or ‘Class-Oriented’ metadata schema design capabilities that allow metadata models to vary the range of fields used depending on various criteria (e.g. the type of asset). These can help refine field selection and avoid kitchen-sink cataloguing, where users are presented with numerous irrelevant fields just in case one of them might be required (or the other extreme, where too little detail is collected).
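A simple way to picture an adaptive schema is a lookup from asset class to field set. The class names and fields below are assumptions made for this sketch rather than any specific product's schema model:

```python
# Illustrative adaptive metadata schema: the fields presented vary by
# asset class. Class names and field lists are invented for the example.
BASE_FIELDS = ["title", "description", "keywords"]

CLASS_FIELDS = {
    "photo":    ["photographer", "location", "capture_date"],
    "video":    ["duration", "aspect_ratio", "transcript"],
    "document": ["author", "page_count", "language"],
}

def fields_for(asset_class: str) -> list[str]:
    """Return only the fields relevant to this class of asset,
    avoiding a kitchen-sink form with every possible field."""
    return BASE_FIELDS + CLASS_FIELDS.get(asset_class, [])

# fields_for("photo") -> ["title", "description", "keywords",
#                         "photographer", "location", "capture_date"]
```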
As discussed in the previous article, the level of detail collected might alter based on what segment of the overall catalogue a given asset is assigned to. Some longer-term assets that will remain widely and actively used for a long period might benefit from more detailed cataloguing, whereas archived assets or work in progress might only require the addition of perfunctory detail (with the rest being entirely automated or derived).
Data entry methods available
The data entry method used for cataloguing is another vexed subject that generates irritation among novice and experienced users alike, but for different reasons.
When most users first start to use a DAM for cataloguing, they prefer ‘wizard’ type interfaces that explain the stages and guide them through the process. After they understand the basics, these more verbose interfaces can become more of a hindrance than a help.
Experienced users who have a lot of hand-cataloguing to do will typically prefer a grid or spreadsheet style layout where they can see and modify multiple records simultaneously. There is some crossover here with the batch tools I will discuss later, but even if they are entering quite small numbers of records, those who are fully familiar with the cataloguing process will want to be able to do it quickly via direct modification of individual records. To run efficient metadata cataloguing operations, the cataloguing interface requirements of both types of users need to be accommodated. Some situations might also call for various points in between, so options to flip between basic and advanced cataloguing interfaces can be advantageous too.
The people who do the cataloguing
Last year, I wrote a feature article for DAM News on asset findability in which some basic principles were described for those new to this subject. One of the issues mentioned in that piece was understanding the distinction between a literal description and technical or subject-specific metadata. Where cataloguing of business-oriented DAM repositories gets delegated to individual staff, they will tend to use terminology which will be obvious to their colleagues in the business but hold little meaning when considered in isolation. Keyword searches in DAM systems (which are still the most popular method of finding assets) enforce that distinction and divorce the terminology used from its original context. When I have analysed DAM systems where the client mentions search/findability issues, this problem nearly always comes up. To catalogue assets properly so that searches are as relevant as they need to be, you require both business-specific terms (e.g. product names, projects or industry jargon) and a description of an asset in simple terms that anyone (even with no prior knowledge of the organisation) might associate with it.
An effective way to address this issue is a two-pass method for cataloguing. In the first pass, a skilled picture researcher (or someone with general experience of cataloguing visual assets) analyses an item of media from the perspective of general usage: in other words, what someone with no prior knowledge might expect to see when entering a selection of keywords (see the ‘bridge’ example in the article referred to).
The second pass uses a subject expert who reviews each asset and decides what additional technical or subject-oriented detail needs to be added so that searchers who use those specialist keywords will find suitable material. Most end users initiating searches will probably employ both general and specific search terms, usually something like [adjective] [noun] (e.g. “red K6 phone box”), so this can be an effective way to improve quality.
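A sketch of how the two passes might be combined into a single searchable keyword set follows. The function, sample keywords and matching logic are all hypothetical, and real DAM search implementations will be considerably more sophisticated:

```python
# Merging general (pass one) and specialist (pass two) keywords so a
# search like "red K6 phone box" can match both vocabularies.
# All identifiers and sample terms here are hypothetical.
def merge_keyword_passes(general: set[str], specialist: set[str]) -> set[str]:
    return {kw.lower() for kw in general | specialist}

general_pass = {"red", "phone box", "street", "London"}      # picture researcher
specialist_pass = {"K6", "Giles Gilbert Scott", "cast iron"} # subject expert

keywords = merge_keyword_passes(general_pass, specialist_pass)

query = "red K6 phone box"
hits = [term for term in query.lower().split() if term in keywords]
# "red" and "k6" match here; multi-word terms like "phone box" would
# need phrase handling in a real search engine.
```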
As should be obvious, the biggest problem with this method is that it costs a lot more since both in-house staff and freelance cataloguers are needed. The latter might still need to be trained for highly specialised subjects so they appreciate the context of the metadata and any impact it might have on the general terms they use. As discussed, however, if the catalogues are segmented, this more expensive (but higher quality) approach can be reserved for a select collection of assets that warrant it.
Conclusion
In the next article in this series, I will go through two other human-based methods of cataloguing assets:
- Using batch techniques
- Using asset supply chain partners