Artificial Intelligence & Metadata Cataloguing: Advice For Digital Asset Managers
This feature article has been written by DAM News Editor, Ralph Windsor.
Artificial Intelligence (AI) and the possibility that some sophisticated software will magically transform all that dry asset cataloguing work into a task no more of a chore than loading your washing machine is a subject that never seems to go away in Digital Asset Management. It is easy to see why: if there is a point at which new or prospective DAM users become visibly crestfallen, it is the realisation that the software only makes the job of managing your assets easier; it won't do it for you.
Over the last few years, I have had the opportunity to look at a number of technologies which have attempted to address this problem through AI and related techniques (and we have covered a few examples on DAM News in the past). This article is a summation of conclusions I have reached as well as some advice.
Metadata And The Artificial Intelligence Sceptic
My own background is on the software side of DAM rather than as an archivist or librarian, so, in theory, I should be positively disposed towards these innovations. Reviewing AI cataloguing technologies, however, has become a bit like investigating claims of the paranormal: there might just about be something in it, but there is nowhere near enough proof to achieve scientific legitimacy. Some battle-hardened DAM users might be inclined to comment that finding digital assets with certain systems may actually be more difficult than making contact with deceased pets or relatives, but I digress. Having seen so few AI methods deliver on their promise in regard to digital asset cataloguing, I have become fairly sceptical about their effectiveness. It is essential to keep an open mind, however: there is undoubtedly interest, and therefore a commercial opportunity for someone, which might generate sufficient momentum to incrementally enhance the results. Combining these technologies together might also offer greater potential. As with many other issues in DAM, there are no silver bullets, and most of the examples described depend on you having established some best-practice processes for manual cataloguing as the foundation (more about that topic later on).
Types of Automated and Intelligent Metadata Cataloguing
The range of these tools breaks down into a number of categories. This is a list of the broad types:
- Batch comparison using contextual hints
- Pattern recognition
- Machine learning
There is a fair amount of crossover between the above, and many tools utilise hybrid techniques. I will describe each of them below.
Batch comparison using contextual hints
This is where a system takes some context-dependent information about an asset (which the user may not necessarily have had to provide) and compares it against the existing repository of assets to try to infer potential tags.
It is important to note that these systems do not generate narrative text; rather, they use predefined or controlled lists of values and try to choose what appears to be the most appropriate based on some mechanically applied selection rules. Usually there are thousands of possibilities, so a scoring algorithm is applied to rank them. This is all fairly straightforward database engineering, and you can see it applied on online shopping websites that present alternatives based on the behaviour of other users. The options I tend to get offered when I use these sites are a bit hit and miss and often more of a distraction than a help (with some exceptions). While I can ignore unsuitable suggestions on a shopping website, they could become more of an issue on a DAM system where less diligent users accept the defaults without bothering to verify their suitability.
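To make the mechanics concrete, here is a minimal sketch of the kind of co-occurrence scoring logic described above. It is an illustrative assumption, not any vendor's actual algorithm: candidate tags from the controlled vocabulary are ranked by how often they appear on already-catalogued assets that share contextual hints (upload folder, department, etc.) with the new asset. All field names and hint formats are hypothetical.

```python
from collections import Counter

def suggest_tags(hints, catalogued_assets, top_n=5):
    """Rank controlled-vocabulary tags by how often they co-occur with
    the new asset's contextual hints on previously catalogued assets."""
    scores = Counter()
    for asset in catalogued_assets:
        # How many contextual hints does this existing asset share?
        overlap = len(hints & asset["hints"])
        if overlap == 0:
            continue
        for tag in asset["tags"]:
            scores[tag] += overlap  # more shared context => higher score
    return [tag for tag, _ in scores.most_common(top_n)]

# Hypothetical repository: assets catalogued manually, with contextual hints
existing = [
    {"hints": {"folder:events", "dept:pr"}, "tags": ["conference", "people"]},
    {"hints": {"folder:events"}, "tags": ["conference"]},
    {"hints": {"folder:products"}, "tags": ["packshot"]},
]

# A new image uploaded to the "events" folder by the PR team
print(suggest_tags({"folder:events", "dept:pr"}, existing))
# → ['conference', 'people']
```

Note that the suggestions are only as good as the manually catalogued corpus being scored against, which is exactly the dependency discussed below.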
This technique depends on having a decent-sized repository of assets already (or at the very least some source data structured in a usable format). In addition, users need to already be finding suitable material and, ideally, cataloguing it appropriately. This kind of feature rewards the metadata-virtuous who have been following best practices and policing the accuracy and validity of their metadata, but it makes things markedly worse on those DAM systems where cataloguing has become a bit of a free-for-all and gone unchecked for many years. This is an issue that seems to keep coming up in any evaluation of these automated methods.
Pattern recognition
Pattern recognition technologies for cataloguing purposes seem to fall into two distinct classes: domain-specific and generalist methods. The latter are often sold as AI technologies, but the theories behind them are fairly simplistic (although they might be far from straightforward for a software developer to implement).
Domain-specific pattern matching tends to be far more effective because the developers can specialise in a particular area and become far more nuanced and sophisticated at optimising the results. Examples include facial recognition, optical character recognition and speech recognition (although the latter is less reliable than the other two). The key challenges with using these in DAM implementations are mostly about integration, but also about when to apply them. For example, facial recognition can be used to try to automatically generate tags by cross-referencing with a database of people (e.g. an HR system), but usually the DAM needs to be told, when material is provided, that a given asset is of a type the pattern matching technology might be able to assist with. It is possible to just run it over everything uploaded anyway, but this can sometimes produce unexpected results where unsuitable tags are suggested. As you can see, there is still some work required: either to give the system adequate hints about when to attempt recognition, or to correct invalid assertions it has made automatically by itself.
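The facial recognition example above can be sketched roughly as follows. This is a hypothetical illustration, not a real vendor API: the `recognise` function stands in for whatever third-party recognition service is used, and is assumed to return employee IDs with confidence scores, which are then cross-referenced against an HR directory. The confidence threshold is the "work still required" mentioned above: anything below it is left for a human to verify rather than tagged automatically.

```python
def tags_from_faces(image, recognise, hr_directory, threshold=0.9):
    """Suggest name tags by cross-referencing recognised faces against
    an HR directory, keeping only high-confidence matches.

    `recognise` is a stand-in for a third-party facial recognition API,
    assumed to return (employee_id, confidence) pairs for an image."""
    suggestions = []
    for employee_id, confidence in recognise(image):
        if confidence < threshold:
            continue  # too uncertain: leave for a human cataloguer
        person = hr_directory.get(employee_id)
        if person:  # ignore IDs with no HR record (e.g. visitors)
            suggestions.append(person["name"])
    return suggestions

# Hypothetical HR data and a fake recogniser for illustration
hr = {"e42": {"name": "Jane Doe"}, "e77": {"name": "John Smith"}}
fake_recognise = lambda image: [("e42", 0.97), ("e77", 0.55)]

print(tags_from_faces("upload.jpg", fake_recognise, hr))
# → ['Jane Doe']  (the 0.55 match falls below the threshold)
```

Choosing the threshold is itself a trade-off: set it too low and invalid tags slip through unverified; set it too high and the tool saves little manual effort.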
There are some generalist pattern recognition tools which claim to be able to identify objects of a non-specific nature (usually for static media like images only). I have not seen decent results from the examples of these products that I have reviewed. Most of their vendors rationalise this by telling prospective purchasers that the systems need to be trained. How is that done? You guessed it: either by having an existing repository of material that has already been properly catalogued, or by someone going through the time-consuming process of training the system manually. This can generate a kind of chicken-and-egg conundrum, where the machine cataloguing does not work effectively because there is no human-generated source data, and the resulting metadata is poor because the users thought the system would do it all for them.
Machine learning
This is closer to being true artificial intelligence; an example is the research carried out by Google, which used vector representations combined with natural language processing to derive image captions. With metadata cataloguing, the advice usually offered is to favour controlled vocabularies over narrative; however, there are limits to what you can do with those, and for optimal findability you usually need to combine both techniques, as they each offer strengths and weaknesses that complement each other.
I have not seen the Google system (other than the PR examples on their blog); however, I have seen other systems which try to achieve the same goal of unassisted automated caption generation. Usually, when the tools are presented in the manner of a guided software demonstration, with the operator offering images of their own selection, the results appear highly credible and effective. The quality declines far more rapidly when random images are presented that have not been used before, especially when these are either non-representational or all appear similar.
It is arguably expecting too much to have an AI system come up with anything more than generic captions, but it does indicate that these technologies might be good for asset libraries with conventional consumer-oriented subject matter and harder to apply to more specialist material. One potential political minefield I can foresee with all these systems is that they depend on having access to many assets to enhance their accuracy. Presumably, these need to be combined into a central corpus so that each subsequent recognition transaction can access data already collected and generate compound accuracy gains as a result. While that might not pose a problem for public image repositories like Flickr or Picasa, for corporate DAM solutions, some of which might contain sensitive assets, it raises a number of data security issues. Whether those who operate these tools will allow users to opt out of contributing cataloguing data back, and what the costs of doing so might be, are issues I envisage being discussed more widely in the future than they are now.
Tips For Reviewing AI Cataloguing Technologies
I suspect most digital asset managers who have encountered these technologies may well have begun to devise strategies for assessing them already, but for anyone who has not or is looking for some further guidance, the following would be my recommendations:
- It is reasonable to allow the sales representative to demonstrate their system first under optimal conditions, but you need to follow that up by supplying your own set of test images. Ideally, these should be a broad and representative sample with a mix of people, products, scenery, buildings etc. The vendor should not be given these in advance.
- Some artificial intelligence and automated recognition vendors will tell you that their system must be trained first. That is fair, but you need to see the training process with your sample images, and they should be able to do this live with you present. The level of training required needs to be weighed against how much time would be saved by simply doing the cataloguing with human beings.
- What are the integration options for getting the generated metadata out of their solution and into your DAM? Most DAM systems assign asset records an automatically generated ID (usually from a database); how do these link up with whatever identifiers the cataloguing automation tool uses? Compared to the core problem these systems claim to solve, this should be a relatively easy systems integration task, but precisely because of that, it can sometimes get forgotten or the complexity of the issues underestimated. One other area to consider is integration with taxonomies and existing controlled vocabularies if the system uses batch comparison methods.
- Assuming that recognition can be carried out in real time, what is the throughput? How long will it take from the product being given an image to it delivering cataloguing data back to the DAM? Usually processing is quick, a matter of seconds (or less), but this cannot be assumed, especially if the system uses highly complex methods that are computationally expensive.
- Having checked all these points, a larger test is advisable (e.g. 300+ images). The results should be passed to end users, preferably alongside some conventionally catalogued material. A blind test, where asset users are asked to decide whether a machine or a human being catalogued an asset, is worth doing, as well as applying some kind of basic quality scoring to everything. These will provide clues about the overall effectiveness and also points to watch out for.
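The blind test in the last tip can be scored very simply. The sketch below is an illustration under an assumed data format (each response pairs the reviewer's guess with the true origin of the cataloguing): if reviewers' accuracy hovers near 0.5, they cannot reliably tell the machine's output from a human cataloguer's, which is an encouraging sign for the tool; accuracy well above 0.5 means the machine's work is noticeably different.

```python
def blind_test_score(responses):
    """Score a blind test: each response is a (guess, truth) pair where
    both values are "machine" or "human". Returns reviewer accuracy:
    ~0.5 means machine output is indistinguishable from human work."""
    correct = sum(1 for guess, truth in responses if guess == truth)
    return correct / len(responses)

# Illustrative results from a small review panel
responses = [
    ("machine", "machine"), ("human", "machine"),
    ("human", "human"), ("machine", "human"),
]
print(blind_test_score(responses))
# → 0.5 (indistinguishable, in this tiny sample)
```

In practice you would want a much larger sample than four responses before drawing conclusions, and the basic quality scoring mentioned above alongside it, since "indistinguishable" only helps if the human baseline itself is good.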
The above list is far from exhaustive, but it should give you a good idea of whether the quality of the results will meet the minimum criteria necessary to be useful (see the next section also), and of the more nuts-and-bolts problems of getting data flowing reliably to and from wherever it needs to go.
Practical Application Of Automated Metadata Cataloguing
The results when using many of these technologies are interesting and occasionally even surprisingly good, but usually without sufficient consistency to justify taking a risk on them and becoming an unpaid research guinea pig for the automation vendors (however much they promise you won't be). With that said, some are ready for use right now: for example, Adobe have a batch comparison facility called SmartPic, which we discussed on DAM News a few weeks ago, and a good number of DAM vendors integrate third-party facial recognition and speech recognition. The dynamics of the digital asset supply chains in most organisations of any size are such that it is unlikely they will be able to rely solely on manual cataloguing carried out by metadata experts in future, especially for more mundane assets that need to be catalogued, but not at the highest possible quality.
To get the most out of the various possibilities offered, managers need to devise strategies that intersect with each other, and then keep testing and reviewing them. This implies a number of requirements:
- You need well-defined, easy-to-understand and widely observed cataloguing processes that users can be trained to carry out. Quality control checks must be in place to ensure that users really are following the processes. Before any kind of automation tool can be used, this must be in place; it is the foundation on which you will build more leveraged methods.
- You must have a detailed overview and full understanding of your digital asset supply chain already. If you decide to use any of these technologies, what will be the impact on each point of your supply chain, and how will that spread downstream? An implementation plan should identify key risks and plans to mitigate them. Automation is a two-way street: it is possible to exponentially increase the level of damage caused to the quality of your metadata as an unwanted by-product of trying to enhance productivity if you do not manage the risks accordingly.
- Where possible, your asset catalogues should be segmented according to a minimum acceptable level of quality: it will almost certainly be impractical to make everything 'very high quality'. If you have a very large repository of assets that need to be made available quickly, it might be appropriate to make greater use of AI and automated methods. Higher-quality materials should still be catalogued by those with more skill and understanding of the subject matter.
- As described earlier, these technologies have differing strengths and weaknesses; for that reason, you may need to combine several of them, and you need to be able to integrate and orchestrate each one.
- One point not widely appreciated is that if your taxonomy is well designed (and the workflows are fully integrated with it), you can create many of the same benefits as AI and automation with little noticeable effort required from users. As should be clear by now, your taxonomy and metadata model are integral to successful DAM and are like its DNA; as such, they are another core element that has to be in place well before any automation can be considered.
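The segmentation point above can be expressed as a simple routing rule. This sketch is purely illustrative (the collection names, tiers and field names are assumptions, not a standard): assets in high-value collections go to skilled human cataloguers, large time-pressured batches go straight through automation, and everything else gets machine suggestions with a human sign-off.

```python
def route_for_cataloguing(asset, high_value_collections):
    """Pick a cataloguing route based on the minimum quality tier the
    asset's collection demands. Tier names are illustrative only."""
    if asset["collection"] in high_value_collections:
        return "manual"        # skilled cataloguer, highest quality
    if asset.get("bulk_upload"):
        return "automated"     # large batches needed quickly
    return "automated+review"  # machine suggestions, human sign-off

# Hypothetical high-value collections for a corporate DAM
high_value = {"brand", "legal"}

print(route_for_cataloguing({"collection": "brand"}, high_value))
# → manual
print(route_for_cataloguing({"collection": "events", "bulk_upload": True}, high_value))
# → automated
```

The value of writing the rule down explicitly, even this crudely, is that it forces the quality-tier decision to be made deliberately rather than emerging by accident from whichever tool was bought first.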
The potential of AI is undeniably significant; the problem is an underestimation of how difficult realising it will be, combined with the over-exuberance and hubris of those responsible for its implementation. Even so, the concepts will likely acquire a momentum of their own, and there will be staging points where some technologies start to become more useful, but only for certain, well-defined tasks.
The examples earlier in relation to facial recognition and OCR are a case in point: if the scope of the problem can be kept tight and well defined, the results are likely to be better (this is a consistent pattern with technology in general and software in particular). This suggests that Digital Asset Managers might need to utilise a multiplicity of different tools, each capable of solving one piece of the metadata puzzle, integrating each of them and regularly replacing different elements with superior alternatives as they become available.
In all likelihood, those selling these systems will make ever more overblown promises and try to make users forget they are involved in a process which has its roots in industrial practices that have been a theme of modern civilisations for hundreds of years. For those who can set aside the science fiction and form a clear idea of what they want to achieve, however, there might be some potential benefits that at least make it worth keeping a close eye on, and carefully considering any low-risk opportunities they might offer.