Review of MerlinOne’s NOMAD Image Search Technology
Last week, we interviewed David Tenenbaum, CEO of MerlinOne, for DAM News. David has also contributed a feature article, The Future of DAM and AI, in which he discusses some innovations his firm has been working on, specifically with reference to a technology called NOMAD (NoMetAData):
“For the last 30 years we have had to use text terms (metadata attached to an object) to let us search for visual objects, which seems kind of ironic. What if you could search image content directly, with no metadata middleman? That would be disruptive! How could it be possible?”
As described in David’s piece, the technology they have implemented uses a method which they refer to as ‘attention’. As I understand it, the text surrounding an image where it is placed is used to automatically derive context, in lieu of human-entered metadata (or the keywords generated by image recognition algorithms). The text itself is not retained thereafter; it is used only once for analysis purposes.
To say there is ‘no metadata’ is something of a misnomer because there is a huge amount of textual data that is analysed and used to derive the semantic proximity of a given digital asset to the terms the searcher has entered. The ‘metadata’ is essentially a score which can be compared with the search terms to find matching images. As I will explain later in this article, this can sometimes produce some wayward results.
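To illustrate what comparing a score with the search terms might look like, here is a minimal sketch. It assumes that both the query and each asset are represented as numeric vectors and that proximity is measured with cosine similarity; MerlinOne have not published how NOMAD works internally, so the vectors, asset names and function names below are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)

def rank_assets(query_vector, asset_vectors):
    """Score every asset against the query and sort best match first."""
    scored = [
        (asset_id, cosine_similarity(query_vector, vector))
        for asset_id, vector in asset_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hand-made toy vectors (assumption: real vectors would come from a trained model).
assets = {
    "bridge.jpg": [0.9, 0.1, 0.0],
    "family.jpg": [0.1, 0.9, 0.2],
    "office.jpg": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
ranking = rank_assets(query, assets)
```

Note that in a scheme like this every asset receives some score, so nothing is ever excluded outright; weak matches simply sink down the list rather than disappearing.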
According to MerlinOne, this has been developed entirely in-house; it does not use any third-party components to derive the results (this is in contrast to the AI image recognition components that the majority of DAM vendors currently employ). David and his colleague Peter Leabo demonstrated NOMAD searching to me last month and it did seem impressive.
Because a very large volume of data is used, the initial results appear remarkably effective and they are vastly superior to what I have observed from AI image recognition tools that use pattern matching, such as Google Vision, Clarifai and Amazon Rekognition. We ran through a series of searches and the examples presented produced some quite credible and usable results.
With that said, during the demo I did manage to uncover one issue which can best be described as ‘cultural’ in nature. A search for ‘Asian students’ was carried out and this returned a large number of photos of people of East Asian descent. I pointed out to David and Peter that in the UK (and some other parts of Europe, as I understand it) ‘Asian’ tends to refer to people from South Asian countries such as India, Pakistan and Bangladesh.
This is a potential issue with using contextual text data to rank potential matches as they can become skewed by the biases of the source material used, even if a very wide sample of data is incorporated into the index. Theoretically it might be possible to use metadata to tag the sources so if you were a UK-based user, the context would be only text content that was written for a UK audience. There are many more distinctions than region, however. At this point, we get into the same issues that image recognition tool vendors have encountered where different modules are required for particular subjects.
Despite the above criticisms, nothing went wrong when I was shown the NOMAD search and it did deliver usable results for all the queries provided.
Kicking the Tyres
I asked David for access to a demo edition of their DAM with NOMAD enabled and they provided me with an account, including the ability to upload test images.
Noting the ambiguity of the term ‘Asian’ that emerged during MerlinOne’s demo, as discussed earlier, I did a search for ‘Asian families’. This did return quite a number of images that fitted the anticipated results (certainly if I were a US-based searcher). I also noticed, however, that a lot of results were returned. The first 30 or so tended to be quite reflective of the terms supplied, but moving down the list, things got a little less clear-cut. I had images of HRH Prince William, The Duchess of Cambridge and Prince George in one result, as well as a shot of Jade Parfitt and family. I am not sure what caused the latter to appear, but some of the other results featuring the royals appeared to relate to a visit they had made to China.
This is an issue which I gather MerlinOne are aware of and it does demonstrate that using textual context and a confidence-weighting algorithm can still generate false positives as you move down the rankings. In this case, I would have welcomed the opportunity to use more conventional metadata to refine the results. When I looked at the metadata for the image of the royals, it had terms like ‘Duke’, ‘Duchess’, ‘Catherine’, ‘Prince William’ and ‘family’, which were more representative of the image, if not what I had searched for.
If only the top 30-40 results are usable, it is likely that lots of the same images will get downloaded excessively, especially as users will quickly realise that the further down the results you go, the less useful they become. With a number of corporate clients I work with, lack of freshness of visual content has become a problem, in particular with any original assets they own outright themselves (i.e. non-stock images not purchased from third parties).
Where the NOMAD search is particularly ineffective is with specific requests, especially for names of people. I did a search for ‘Ralph Windsor’, for which I would expect to see zero results. Instead, I was shown photos of people like Warren Buffett, Geoffrey Howe (former UK Chancellor) and shots of various US CEOs. With the exception of the second example, this was all very flattering, but not what I would be expecting as a searcher.
Searches for more well-known people, for example ‘George Bush’, generated some positive results; however, there were others that were not at all suitable, for example, a shot of former UK Prime Minister Gordon Brown (and the photo was taken in Scotland, not even the USA). Such results ranked higher than some images of George Bush himself.
I mentioned confidence weighting earlier, and a confidence threshold would be advantageous for some searches. The difficulty here is that while the effort required by the cataloguer is reduced because they theoretically do not need to enter metadata, at least some of the work now moves to the searcher instead, who must filter out false positives via more complex search queries. In many cases, it is preferable to see nothing rather than a whole bunch of irrelevant results.
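A confidence threshold of this kind is simple to express. The following is only a sketch under stated assumptions: it supposes that each result carries a confidence score between 0 and 1, and both the scores and the asset names are invented for illustration rather than taken from the NOMAD demo.

```python
def apply_cutoff(ranked_results, min_score):
    """Keep only results whose confidence meets the threshold, preserving order."""
    return [
        (asset_id, score)
        for asset_id, score in ranked_results
        if score >= min_score
    ]

# Hypothetical ranked results: (asset id, confidence score), best first.
good_query_results = [
    ("asset_a.jpg", 0.93),
    ("asset_b.jpg", 0.71),
    ("asset_c.jpg", 0.42),
    ("asset_d.jpg", 0.18),
]
# Hypothetical results for a query with no genuine match in the repository.
poor_query_results = [
    ("asset_e.jpg", 0.31),
    ("asset_f.jpg", 0.27),
]

kept = apply_cutoff(good_query_results, 0.6)
nothing = apply_cutoff(poor_query_results, 0.6)
```

With a threshold of 0.6, the first query keeps only its two strongest matches and the second returns an empty list, which is the ‘see nothing’ outcome described above. Choosing the threshold is the hard part: set it too high and legitimate matches are hidden, too low and the false positives return.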
In mitigation, a conventional metadata keyword search is also still available. But if you were hoping that NOMAD would free you from the task of ever needing to enter any metadata again, you are likely to be disappointed. NOMAD is a search strategy rather than a replacement for metadata per se.
MerlinOne also agreed to provide me with the ability to upload images via my test account. My expectation was that this might be more problematic since I was using my own images. I have a sample set of photos of scenes in London which I have used before to test AI tools and which (theoretically) should not be that demanding as they are quite generic tourist snaps of the kind which stock libraries tend to offer. In the past, I have been quite underwhelmed with the performance of AI image recognition components. For searches using literal terms (i.e. what an image looks like to a non-subject expert), MerlinOne’s tool was far superior to the current AI alternatives implemented in DAM systems.
I uploaded an image of Blackfriars Bridge, shot in 2007, which I have used in a findability tutorial I wrote a few years ago, with no metadata entered (and nothing to provide any clues in the image EXIF data). The search returned my test asset as the top result when I asked for ‘Blackfriars Bridge’. Interestingly, the others were far less useful, including shots of Westminster Bridge and various other bridges over the Thames and a number of other locations. As with the test searches I described earlier, some of the later results were bizarre and not relevant. For example, I was given shots of tourists on Brooklyn Bridge and a road bridge in Long Island (with not even a river in the shot). I discussed this with MerlinOne and they explained that the results are ranked in order of confidence, so once again, this is a case of being able to apply a confidence cut-off point to limit the results. They did also mention that I was shown a 0.8 release and that they plan both cut-off threshold filtering and integration with human-entered metadata as high-priority items for forthcoming updates in the very near future.
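As a rough illustration of what integration with human-entered metadata could mean in practice, the sketch below narrows a confidence-ranked list using catalogued keywords. This is one possible approach under my own assumptions, not a description of what MerlinOne intend to build; the function name, the `keep_untagged` option, the asset names and the scores are all hypothetical.

```python
def refine_with_keywords(ranked_results, keywords_by_asset, required_terms,
                         keep_untagged=False):
    """Filter ranked results to assets whose keywords include every required term.

    Assets with no catalogued keywords are dropped unless keep_untagged is True.
    The original confidence ordering is preserved.
    """
    required = {term.lower() for term in required_terms}
    refined = []
    for asset_id, score in ranked_results:
        keywords = {k.lower() for k in keywords_by_asset.get(asset_id, [])}
        if not keywords:
            if keep_untagged:
                refined.append((asset_id, score))
            continue
        if required <= keywords:  # every required term is present
            refined.append((asset_id, score))
    return refined

# Hypothetical ranked results for a bridge search, best first.
ranked = [
    ("upload_001.jpg", 0.95),  # uncatalogued test upload
    ("thames_02.jpg", 0.80),
    ("nyc_17.jpg", 0.55),
]
# Hypothetical human-entered keywords.
keywords = {
    "thames_02.jpg": ["London", "Westminster Bridge"],
    "nyc_17.jpg": ["New York", "Brooklyn Bridge"],
}

london_only = refine_with_keywords(ranked, keywords, ["london"])
london_or_untagged = refine_with_keywords(ranked, keywords, ["london"],
                                          keep_untagged=True)
```

The `keep_untagged` option reflects a genuine design question: if uncatalogued assets are excluded, the searcher loses exactly the material that a no-metadata search is supposed to surface, but if they are included, the keyword filter cannot remove their false positives.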
I also tried some less touristy images such as The Palestra building in Southwark and these did work quite well and yielded the expected results when I searched for them. What I did not test was something far more specific without any landmark or geographical reference points. Given the kind of content in corporate DAMs, this would need to be checked quite carefully before assuming that NOMAD is useful for this sort of material also.
As a means to provide a literal search, NOMAD does offer some compelling benefits. At its best, the results are amazingly accurate (especially towards the upper rankings). At its worst, they are essentially useless and offer the kind of material you can get from some multi-word ‘OR’ Boolean searches when using conventional human-entered metadata. The results do bear some comparison with Google Image search (which I gather may use a similar technique). The key difference is the ability to apply this to a DAM user’s own set of images which they are more likely to be able to legally use.
I discussed the MerlinOne NOMAD search with Henrik de Gyor, who is the founder of Another DAM Podcast, which many readers will be familiar with. What we concluded was that while this is quite effective with large repositories of digital assets, especially those with quite generic or non-subject-specific material, it might be rather less so for a more specialised use-case. The significant issue is training and how to filter the textual context so the system can refine results based on a specific subject. The ‘Asian students’ example given above hints at this problem and I suspect if your DAM repository contains numerous images of something like shoes, cars or historical artefacts then the results could become far less useful. Theoretically, there is potential for MerlinOne to train the algorithm for more specialised data sets such as the ones described. I do also acknowledge that it would not be advantageous to do this too early as it could give the impression the search facility was limited to just these subject areas. In the future, they might need to consider setting up a number of demonstration editions, some with generic material and others that are more subject-specific.
MerlinOne have indicated that the target audience for NOMAD searches is marketing users who are carrying out more generalist searches, as well as those looking to sell or license images they own the copyright to (e.g. as Royalty Free stock). I can appreciate this, although it should be noted that even within corporate marketing departments, there can be a lot of subject-specific terminology and nomenclature which might not work so well with NOMAD.
Describing the search as ‘No Metadata’ might help them to handle objections from clients who are reluctant to buy a DAM because of the effort involved in cataloguing digital assets, but this still does not replace the requirement to catalogue assets with high-quality metadata. End users will require both decent metadata and a range of different search tools. Further, they will need to be educated that a ‘Google-like’ interface is not necessarily the optimum way to locate every digital asset they might ever require. The use-case appears similar, but can be quite different: when using Google, users tend to want anything that will tell them what they need to know, whereas when using DAMs, they frequently need something highly specific, which often might be just a single result.
Setting those criticisms aside, in terms of wider DAM software industry implications, this does represent some genuine innovation of the type we see very rarely these days. I am impressed that MerlinOne have developed this in-house, rather than relying on a third-party component and then passing it off as innovation. In addition, using the context of an image to derive its potential value to a searcher represents a definite paradigm shift in terms of thinking about search and how it gets used in Digital Asset Management solutions. This is something that I suspect other vendors will eventually replicate; however, it will be far less trivial to accomplish than simply connecting the DAM to a component someone else has built. Those vendors whose management have less experience of DAM might find this sort of task more demanding than they anticipate.
Thank you Ralph for the thorough and even handed review! We are hard at work at the two enhancements you mention: a cutoff value to remove irrelevant Nomad search results, and combining traditional metadata search results with Nomad results in a thoughtful way. We are just scratching the surface of the potential of this approach! Fun stuff!