In the preceding article in this series (part 3) I reviewed human cataloguing methods, especially where each asset is entered manually one at a time. In this piece, I will consider batch cataloguing and using metadata from supply chain partners.
I include batch cataloguing as a non-automated method because to use it someone has to make a conscious decision and usually there is at least some level of effort involved to apply the strategy. This is in contrast to fully automated or derived metadata that may get created whether DAM users want it to or not.
Any Digital Asset Management strategy where the cataloguing will get carried out by humans must consider batch techniques and make use of them where there is a proven benefit and the risks can be controlled. After searching, batch modifications to asset metadata are probably the next most valuable benefit of DAM solutions from a productivity perspective. In the majority of cases, batch updates will be initiated using features built into a DAM solution. Some examples include:
- Using metadata entry templates
- Using indexed metadata where appropriate
- Mass find/replace/append operations
- Batch importing from external sources
- Combination tactics
I will discuss each of these below.
Metadata entry templates
Many systems allow predefined or default values to be set up. These can sometimes be based on an existing asset, or they may be more generic. The benefit is avoiding re-entering the same metadata for numerous cataloguing tasks. The risk is that the defaults will not be changed and a lot of irrelevant metadata will be included as a result. Ideally, DAM solutions should flag those assets catalogued with templates so quality control effort can be directed towards randomly checking them.
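As a rough illustration of the flagging idea, here is a minimal Python sketch. The `AssetRecord` class, the `catalogue_with_template` function and the `PRODUCT_SHOOT` template are all hypothetical names invented for this example; they are not taken from any particular DAM product, and a real system would hold this information in its own schema.

```python
from dataclasses import dataclass, field


@dataclass
class AssetRecord:
    """Hypothetical asset record; real DAM schemas will differ."""
    filename: str
    metadata: dict = field(default_factory=dict)
    from_template: bool = False                            # flag for quality control sampling
    unchanged_defaults: set = field(default_factory=set)   # template fields nobody edited


def catalogue_with_template(filename, template, overrides=None):
    """Start from the template defaults, then apply the cataloguer's own values."""
    overrides = overrides or {}
    metadata = {**template, **overrides}
    untouched = {key for key in template if key not in overrides}
    return AssetRecord(filename, metadata, True, untouched)


# Hypothetical template for a product shoot.
PRODUCT_SHOOT = {"department": "Marketing", "usage": "Web only", "keywords": "product"}

record = catalogue_with_template(
    "boot_001.jpg", PRODUCT_SHOOT, {"keywords": "product, boot, leather"}
)
print(record.from_template)                # True
print(sorted(record.unchanged_defaults))   # ['department', 'usage']
```

Recording which defaults were left untouched, as well as the simple template flag, gives whoever carries out quality control a narrower set of fields to inspect.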
Using indexed metadata where appropriate
This is not really a batch cataloguing technique, but using this method makes it far easier to make a single change to one value and have this applied as though a batch change were being made.
Indexed metadata means using numerical references rather than free text. Usually this is presented to users as controlled selections of options such as drop-down menus, radio buttons, checkboxes etc. Indexed metadata is usually quicker for users to enter as they only have to select an item rather than type it all in. This is the information science basis for controlled vocabularies (and why they offer numerous benefits for improving asset metadata quality and consistency).
Where possible, narrative free text metadata should be used sparingly and controlled input fields favoured instead (although that is obviously not always feasible). There are cataloguing quality risks with indexed methods which you need to be aware of, one of which is that no suitable option is available and users either choose something generic (e.g. ‘Not Applicable’) or an unsuitable alternative. These can be mitigated by providing ‘other’ choices (and associated free text input), but the most reliable technique is to constantly check the real reasons behind the selections being made and make refinements accordingly. I wrote a guest article for the Picturepark blog earlier this year which might offer some guidance on this topic; see Structured Metadata Analysis Techniques for more.
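The way indexed metadata behaves like a batch change can be shown with a small Python sketch. The vocabulary, the field names (`category_id`, `category_other`) and the sample assets are assumptions made up for illustration only.

```python
from collections import Counter

# Controlled vocabulary: numeric ID -> display label (hypothetical values).
# "Footware" is deliberately misspelt so it can be corrected below.
vocabulary = {1: "Footware", 2: "Outerwear", 3: "Other"}

# Asset records store the numeric ID, never the label text itself.
assets = [
    {"filename": "boot_001.jpg", "category_id": 1},
    {"filename": "boot_002.jpg", "category_id": 1},
    {"filename": "coat_001.jpg", "category_id": 2},
    {"filename": "misc_001.jpg", "category_id": 3, "category_other": "Umbrella"},
]


def label_for(asset):
    """Resolve the label at display time by looking up the stored ID."""
    return vocabulary[asset["category_id"]]


# Correcting the label once in the vocabulary fixes every asset that refers to it.
vocabulary[1] = "Footwear"
labels = [label_for(asset) for asset in assets]
print(labels)   # ['Footwear', 'Footwear', 'Outerwear', 'Other']

# Reviewing what people typed alongside 'Other' hints at options the list is missing.
other_reasons = Counter(
    asset["category_other"] for asset in assets if asset["category_id"] == 3
)
print(other_reasons.most_common())   # [('Umbrella', 1)]
```

No asset record was edited to correct the spelling, which is why the single change has the same effect as a batch update.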
Mass find/replace/append operations
These are usually used post-cataloguing to correct existing issues but depending on how your DAM of choice works, they can be used pre-cataloguing to set up some defaults without establishing templates (or as an alternative if that feature is not available). A typical use is to correct spelling mistakes, but batch append operations may get applied if metadata models are changed after many assets have already been catalogued.
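A minimal Python sketch of these two operations follows. The function names and sample records are hypothetical, and ‘append’ is interpreted here as adding a newly introduced field with a default value, which is only one of several things a given DAM product might mean by the term.

```python
def batch_replace(assets, field_name, old, new):
    """Replace `old` with `new` in one field of every asset; return how many changed."""
    changed = 0
    for asset in assets:
        value = asset.get(field_name, "")
        if old in value:
            asset[field_name] = value.replace(old, new)
            changed += 1
    return changed


def batch_append(assets, field_name, default):
    """Add a field introduced after cataloguing began, leaving existing values alone."""
    added = 0
    for asset in assets:
        if field_name not in asset:
            asset[field_name] = default
            added += 1
    return added


assets = [
    {"filename": "a.jpg", "caption": "Anual report cover"},
    {"filename": "b.jpg", "caption": "Anual report, page 2"},
    {"filename": "c.jpg", "caption": "Staff portrait", "licence": "Rights managed"},
]

fixed = batch_replace(assets, "caption", "Anual", "Annual")
added = batch_append(assets, "licence", "Unknown")
print(fixed, added)   # 2 2
```

Returning the number of records affected gives a quick sanity check against what was expected. Note that this plain substring replace would also alter longer words that contain the search text, so a real operation would usually need a stricter match.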
Batch importing from external sources
This is where metadata generated outside the DAM solution gets loaded and associated with a lot of asset records. Sometimes this can be done automatically from external sources, like product data, or it might be legacy metadata stored in old systems. These topics will be covered in the follow-up article on automated cataloguing.
A further possibility is where an external tool is used to enter metadata. Despite advances in DAM systems, many heavy DAM users eschew their cataloguing interfaces to use a more free-form tool like a spreadsheet. Particularly with web-based systems, interface responsiveness still lags behind the needs of heavy users. Familiarity with spreadsheet interfaces, as well as the range of options they offer to prepare metadata, still makes them the preferred option for many intensive DAM users. As well as batch cataloguing templates, batch importing equivalents are also useful, so an example file can be provided for someone to complete and then load back into the DAM system in the knowledge that it should be compatible.
In addition to spreadsheets, many DAM users make use of photo cataloguing tools like Lightroom. These use embedded metadata rather than externally held data. Nearly all DAM systems worthy of the name will now support at least the reading of common formats like IPTC, XMP etc. When helping clients to devise metadata cataloguing processes, I usually find that at least one person will be provided with a tool like this and they will often be some of the key users with responsibility for asset ingestion.
The risk/rewards trade-off with batch importing usually relates to technical issues with the import process. The risk is that some glitch either with the data provided or unexpected problems importing it will either cause the import to fail or produce undesirable results. The benefit is a dramatic reduction in the time required to catalogue a large quantity of assets.
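One way to reduce the risk side of that trade-off is to validate the whole file before any asset record is altered. The Python sketch below assumes a hypothetical import template with three columns (`filename`, `title`, `keywords`); the function name and the validation rules are illustrative and not drawn from any specific product.

```python
import csv
import io

REQUIRED_COLUMNS = ["filename", "title", "keywords"]   # hypothetical import template


def validate_import(csv_text, known_filenames):
    """Check a metadata spreadsheet export before any asset record is touched.

    Returns (rows, errors); the import should only proceed when errors is empty.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [col for col in REQUIRED_COLUMNS if col not in (reader.fieldnames or [])]
    if missing:
        return [], [f"missing columns: {', '.join(missing)}"]

    rows, errors = [], []
    for line_no, row in enumerate(reader, start=2):   # line 1 is the header row
        if row["filename"] not in known_filenames:
            errors.append(f"line {line_no}: unknown asset {row['filename']!r}")
        elif not (row["title"] or "").strip():
            errors.append(f"line {line_no}: empty title")
        else:
            rows.append(row)
    return rows, errors


sample = (
    "filename,title,keywords\n"
    "boot_001.jpg,Leather boot,boot\n"
    "boot_999.jpg,Unknown item,boot\n"
    "coat_001.jpg,,coat\n"
)
rows, errors = validate_import(sample, {"boot_001.jpg", "coat_001.jpg"})
print(len(rows))   # 1
print(errors)      # ["line 3: unknown asset 'boot_999.jpg'", 'line 4: empty title']
```

Reporting problems by spreadsheet line number means the person who prepared the file can correct it and resubmit, rather than discovering the glitch after a partial import.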
I cannot recall any DAM solution where I have been involved from an early stage where batch importing has not been used for either migration, ingestion or both. For many repositories with substantial asset volume growth, batch imports are an essential tool to help stay on top of metadata cataloguing tasks.
Combination tactics
If you have a large volume of assets that need metadata to be applied, a refinement on the catalogue segmentation technique described in part 1 is to use several different batch metadata techniques and combine them together. As discussed in the last article, every field that is included in an asset’s metadata schema represents time (and therefore money) for someone to complete. Prior to undertaking hand cataloguing of a large volume of assets, someone should always check what potential options are available to batch catalogue assets, at least as a starting point, and whether or not these would impact quality to an unacceptable degree.
Provided the vendor of your DAM solution will not charge you professional services fees for soliciting their advice, it is almost always worth asking them what the best way to fulfil a batch processing requirement is. This is especially true if a route to achieve what you want is not clear to you initially, as they should know their product better than anyone else.
As well as using built-in tools, some direct modification of the database used to hold asset metadata might also be a quicker method. One characteristic of DAM solutions not fully appreciated by many users is that the interface, even if manually operated, is essentially a tool for making aggregated alterations to an underlying database. Although DAM systems get referred to as ‘databases’, the majority of the effort involved in developing them is in creating facilities that enable control over the data they hold. Depending on what you want to do and the design of the software, using the system itself to carry out a given cataloguing task can be less efficient than a more direct approach. One caveat with that suggestion is that the direct modification should nearly always be carried out by the vendor themselves, since you will almost certainly invalidate any warranty or support agreement if you make those changes without them either carrying it out or at least approving it. For SaaS solutions with databases that are shared across many customers, this is also considerably more difficult since the vendor has to factor in the potential impact on all their users. While these types of direct intervention are still feasible with the majority of solutions, newer generations of DAM solutions are starting to use ‘NoSQL’ databases designed for very large repositories which operate in a different manner (as well as far more complex schemas to support a wider range of needs). As such, direct alteration of data inside DAM systems is likely to become increasingly difficult in the future, but it is usually still an option to at least consider for some obscure batch updates.
The point from this section is to focus on precisely what you want to do first and foremost and then invest some time into identifying all the various potential ways you can achieve that objective using batch processes, even if that might involve some more esoteric methods that appear closed off or unsupported initially. Any candidate technique needs to be assessed from the perspective of the cost saved, the risk introduced and the impact on quality that may result.
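To give a sense of what a direct intervention can look like, the Python sketch below uses an in-memory SQLite database. The `asset_metadata` table, its columns and the `direct_update` function are entirely hypothetical; real DAM schemas are vendor-specific and usually far more complex, and the caveat about vendor involvement applies in full.

```python
import sqlite3

# Hypothetical, heavily simplified schema: one row per asset per metadata field.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE asset_metadata (asset_id INTEGER, field TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO asset_metadata VALUES (?, ?, ?)",
    [(1, "region", "UK"), (2, "region", "UK"), (3, "region", "France")],
)
conn.commit()


def direct_update(conn, field, old, new, expected_rows):
    """Apply the change only if it touches exactly the number of rows expected."""
    cursor = conn.execute(
        "UPDATE asset_metadata SET value = ? WHERE field = ? AND value = ?",
        (new, field, old),
    )
    if cursor.rowcount != expected_rows:
        conn.rollback()   # unexpected scope: undo and investigate first
        return False
    conn.commit()
    return True


print(direct_update(conn, "region", "UK", "United Kingdom", expected_rows=5))   # False
print(direct_update(conn, "region", "UK", "United Kingdom", expected_rows=2))   # True
```

Checking the affected row count before committing is a simple guard against a direct update reaching further than intended, which is the main hazard when bypassing the controls the application normally provides.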
Using asset supply chain partners
A further option for optimising cataloguing efficiency is to ask external digital asset suppliers to contribute metadata. This means that a photographer or video producer etc will include descriptive metadata and this can be used as the basis for cataloguing. Usually, embedded metadata techniques will be employed, but sometimes the metadata can be supplied externally in a separate file.
On some DAM implementations I have seen, the entire metadata cataloguing task is delegated to those who originated the asset file. However, these are usually employees of the organisation (or are closely affiliated with it). A typical example would be images or videos captured in the field. More commonly, only basic information like the caption, keywords and copyright data will be included.
The key risk with relying on third parties to provide useful metadata is whether the quality is of a sufficient standard. While it is possible to mandate that suppliers provide metadata that adheres to quality guidelines, if the requirements are onerous, they may request additional fees for the effort involved (or just ignore them). Usually, a photographer or video production firm is hired because of their ability to generate satisfactory examples of the media they work with. If they fail to meet metadata cataloguing requirements, other factors (e.g. the quality of the work or price) might need to supersede those concerns. Ability to catalogue effectively will always have to take a second place to their primary function.
One further issue with using metadata from external suppliers is copyright and attribution. For most corporate assignments these days, the commissioning organisation will acquire copyright. Most professional photographers will automatically enter their own copyright as a default setting into their equipment when cataloguing. Unless someone deliberately checks for that, there is a higher risk that digital assets will have incorrect copyright information (i.e. they will be distributed to users with references that say the originator still owns them). The opposite problem, where the IP owner has their ownership removed on insertion into the DAM (e.g. if the system used overwrites it), can have worse consequences that may result in litigation.
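A deliberate check of this kind can be very simple. The Python sketch below assumes each asset’s embedded metadata has already been read into a dictionary with a `copyright` field, and that the commissioning organisation (the made-up ‘Example Corp’) owns the rights to every asset in the batch; both assumptions, and the function name, are hypothetical.

```python
def copyright_issues(assets, rights_owner):
    """Return filenames whose copyright notice does not mention the expected owner."""
    flagged = []
    for asset in assets:
        notice = asset.get("copyright", "")
        if rights_owner.lower() not in notice.lower():
            flagged.append(asset["filename"])
    return flagged


supplied = [
    {"filename": "event_01.jpg", "copyright": "(c) Example Corp"},
    {"filename": "event_02.jpg", "copyright": "(c) Jane Smith Photography"},
    {"filename": "event_03.jpg"},   # no copyright notice supplied at all
]

print(copyright_issues(supplied, "Example Corp"))   # ['event_02.jpg', 'event_03.jpg']
```

The result is a list for a person to review rather than values to overwrite automatically, since replacing a legitimate owner’s notice is the more serious of the two errors described above.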
Unless you have a specialist DAM that is highly oriented towards external supply (for example, sports or news/press media libraries), metadata provided by asset supply chain partners should be treated with care and never used without someone having at least reviewed it first. It is possible that, along with some automated or batch techniques, the raw data can be utilised, but more likely it will be not much more than reference or source material to provide inspiration to internal personnel carrying out cataloguing. With all that being said, the circumstances of your media library operations might make this technique more useful and, as noted in the second article, depending on the required quality standards, it might be a suitable option for some categories of assets if not all of them.
In the next article in this series, I will move on to the final two options under consideration for human-based cataloguing methods: outsourcing and crowdsourcing.