Numerous DAM vendors have recently announced integrations with various third-party AI image recognition APIs; this was discussed (and predicted) on DAM News two years ago. The key driver of the trend is that the available products have been packaged into commodity offerings which are generally cheap and fairly easy to assimilate. There isn’t much differentiation in the current DAM market and these kinds of announcements illustrate that point well enough. As a software segment, DAM is something of an imagination-free zone and I have recently heard the description ‘dumb’ applied collectively to DAM vendors by several different people. They might well say that; how accurate their description is, I obviously couldn’t possibly comment.
On the AI subject, we started testing some of the products that a few of the vendors utilise and put the results on DAM News almost a year ago, specifically the Google Vision and Clarifai solutions; comparable offerings are also available from Amazon, Microsoft and a variety of others. There is a more comprehensive test of five products by Gaurav Oberoi (whose other posts are well worth reading, especially his article on writing specs). There are some good articles by Martin Wilson from Asset Bank too, and the topic features in the New Jersey metadata automation webinar which we reviewed late last year. It should be noted that none of these are especially technical in content. It seems to be the more hands-on commentators, who are prepared to actually use the tools in question, who are coming up with the more credible feedback (and who are prepared to share the results of their tests openly, without charging fees or requiring registration etc).
Predictably, a lot of the DAM vendor marketing material about AI image recognition is overblown nonsense that heavily over-promises and is likely to under-deliver for most DAM users (with a few exceptions). Unfortunately, the copy generated by a lot of the consultants isn’t very robust as a critique either and veers between platitudes about how all this is ‘the future’ (whenever that will be) and basic summaries of the pros and cons which you could glean by yourself with the help of Google in a couple of hours. What is lacking is proper testing conducted with some scientific rigour, where you can see the results without having to pay for them or register for whitepapers etc. There is also not much in the way of actionable advice about the scenarios in which this technology will be effective (or not), for whom and in what context. Further, there is little or no in-depth treatment of the impact it will have on searching and metadata. Enough people who claim some expertise in this field (including DAM consultants) will tell you that metadata is key to DAM, and they are right; so some kind of risk management strategy for addressing the potential impact of these tools failing to deliver should be a major consideration before you use them.
I have now reviewed recognition products for clients (in addition to the two public tests described above) and this is a summary of my conclusions:
- The claim that these solutions will take away all the cataloguing effort and pain for you is a fallacy: it is being over-promoted and does not stand up to scrutiny unless you substantially lower your quality expectations.
- Certain types of image, especially generic stock photography, generally produce better results, and you might be able to use AI recognition for those, but still with a number of caveats.
- Some of the results are fudged by using conceptual or descriptive keywords that are of questionable value from a findability perspective (especially for corporate DAM users).
- If your media repository is focussed on a particular object or topic (for example, say your business is concerned with shoes or cars etc) then the results will also not be beneficial because the recognition algorithms will generate keywords that most human digital asset managers would know to specifically exclude. In other words, if your entire library is pictures of shoes or cars, having a system add the keyword ‘shoe’ or ‘car’ is worse than useless because it will return every single asset in search results.
- Some of the results generated by these technologies present a potentially significant brand risk, so they need to be checked as part of a workflow process if there is any possibility that the keywords will be visible (note there is a potential litigation risk also, for the same reasons). That will incur a cost which increases in proportion to the number of assets you process using automated recognition systems.
- You can use these tools to offer suggestions which help human beings speed up the cataloguing process, and that currently offers the best trade-off for getting something from all this.
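To make the over-generic keyword point from the list above concrete: a keyword that appears on nearly every asset in a repository (‘shoe’ in a shoe library) returns everything and therefore finds nothing. A minimal sketch of filtering such keywords by how often they occur — the function name, threshold and sample data are illustrative assumptions, not taken from any particular recognition product:

```python
from collections import Counter

def filter_generic_keywords(asset_keywords, max_frequency=0.8):
    """Drop auto-generated keywords that appear on more than
    max_frequency of all assets -- e.g. 'shoe' in a shoe retailer's
    library -- since they match nearly every asset and add no
    search value.

    asset_keywords: dict mapping asset id -> set of keywords.
    Returns a new dict with the over-generic keywords removed.
    """
    total = len(asset_keywords)
    counts = Counter(kw for kws in asset_keywords.values() for kw in set(kws))
    generic = {kw for kw, n in counts.items() if n / total > max_frequency}
    return {asset: {kw for kw in kws if kw not in generic}
            for asset, kws in asset_keywords.items()}

# Hypothetical shoe retailer's library: 'shoe' is on every asset.
library = {
    "img1": {"shoe", "leather", "brown"},
    "img2": {"shoe", "trainer", "white"},
    "img3": {"shoe", "boot", "black"},
}
filtered = filter_generic_keywords(library)
# 'shoe' is removed everywhere; the distinguishing keywords remain.
```

A human digital asset manager applies this judgement instinctively; the point is that the recognition APIs do not, so some such post-processing (or human review) is needed before their output goes anywhere near a search index.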
On the first and second points of the above list, the one group of DAM users for whom automated recognition will work better is stock media businesses or generalist retailers who sell many different types of product. The issue here is that media operations have generally already invested in cataloguing, and it is business-critical for them (because if their digital assets cannot be found, they generate no revenue from them). If you currently get acceptable results from an offshore image keywording service that uses low-skill, cheap outsourced labour, then AI image recognition offers a possible cost-saving opportunity (I note that in Gaurav Oberoi’s article, he discovered that one product reviewed appears to be using human beings rather than algorithms, and I have read this elsewhere too). I don’t find very many organisations where that is genuinely the case, however; many that attempt the outsourced-keywording approach abandon it because the results are poor or they find their assets are less generic than they first realised, so it is likely to be the same with this.
Using the automated suggestions as a cataloguing aid is a possibility; however, there is a corresponding risk that the humans behave less conscientiously than the machines and simply select every suggestion without properly validating any of them, just to get the job out of the way faster. If you do employ this method, ensure that no ‘select all’ options are provided and build in a human review process (at least using a QA sample).
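One way the QA sample suggested above might be implemented: route a random fraction of auto-tagged assets into a human review queue and track how often reviewers have to correct the suggested keywords, so a decline in recognition quality gets noticed early. A rough sketch only — the sample rate, function names and record structure are assumptions for illustration, not a prescribed workflow:

```python
import random

def build_review_queue(asset_ids, sample_rate=0.1, seed=None):
    """Select a random QA sample of auto-tagged assets for human review.

    sample_rate: fraction of assets to route to reviewers (assumed 10%
    by default; tune to your risk tolerance and review capacity).
    """
    rng = random.Random(seed)
    k = max(1, round(len(asset_ids) * sample_rate))
    return rng.sample(list(asset_ids), k)

def correction_rate(reviewed):
    """Fraction of reviewed assets where the reviewer changed at least
    one auto-generated keyword; a rising rate is an early warning that
    the recognition engine's output needs closer checking."""
    if not reviewed:
        return 0.0
    changed = sum(1 for r in reviewed if r["suggested"] != r["approved"])
    return changed / len(reviewed)

# Illustrative usage with made-up review records:
queue = build_review_queue([f"asset{i}" for i in range(100)],
                           sample_rate=0.05, seed=42)
reviews = [
    {"suggested": {"car", "red"}, "approved": {"car", "red"}},
    {"suggested": {"car", "vehicle"}, "approved": {"car", "saloon"}},
]
rate = correction_rate(reviews)  # 0.5: half the sample needed fixing
```

The review cost scales with the number of assets processed (as noted in the list above), which is exactly why sampling, rather than checking everything, is usually the pragmatic compromise.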
There is an assertion that because these systems are described as using ‘machine learning’ techniques then they will, ipso facto, improve over time. This is an anthropomorphism, i.e. assigning human characteristics to a non-human entity (a computer program, in this case). There is no proof that the machine will ‘learn’ like a real person does, only that the algorithm has some capacity to acquire data and extrapolate a range of decisions from it. The evidence so far (based on the results right now, not what is promised) is that the ‘learning’ capability is not very well developed – in fact it represents a highly reductive view of what learning actually is, one which temporarily glosses over some inconvenient complexities until those involved are forced to confront them (usually at a highly inopportune moment). This kind of effect seems to crop up quite a lot in computer software; it’s the ‘we didn’t think of that’ problem with which anyone who has worked with software developers (or as one) will probably be very familiar.
There is a reasonable argument that the only way these technologies will improve is if they gain access to more digital assets, allowing both the existing algorithms and their data to be enhanced (via machine and human refinement alike). I can accept that point, but not that clients I work with should have to subsidise it, nor act as unpaid AI research test cases to help these firms develop a business model they will almost certainly profit from in the future. I believe Google, Microsoft, Amazon etc are sufficiently well-capitalised already to fund this endeavour on their own account, and they need to promise a lot less and deliver far more than they are doing right now. In an era when data is effectively a capital asset (or, perhaps, a ‘digital asset’?), being asked to hand it over in return for the right to use some flaky and not properly tested software that doesn’t live up to its promise seems like a bad deal to me.
Commercial DAM software is not a research field; it has to deliver tangible benefits from the off. DAM users are asked to spend quite substantial amounts of money on their software (even at the cheaper end of the market). If other components used in DAM solutions had the same reliability record as AI, no one would buy them (and the historical track record of our market in that department is not exactly stellar). The tools currently on offer are half-baked and just don’t work well enough in a production environment. I believe more people are likely to recognise this in due course, and we should all demand a lot more value back for the data which these operations are being allowed to mine (for free) to optimise and fine-tune their products. The DAM vendors and consultants who are engaged in a form of channel marketing for them should require that the AI tools vendors improve their results before they attempt to sell this technology to their own customers, as it is just not up to scratch at present, in my opinion.