The Perils And Politics Of Automated Metadata Generation
An article which has been brought to my attention by a few different people recently is Image Content Recognition: A Stillbirth? by Paul Melcher, writing on kaptur.co. The item discusses some controversy surrounding Flickr’s recent introduction of their auto-tagging feature (which they rolled out without asking permission and then had to withdraw). In common with a lot of these automated methods, they have stuck to the safer approach of selecting keywords which their algorithm deems relevant (rather than the higher-risk and less reliable caption generation approach). Even so, they have contrived to upset a vocal section of their users, which has further highlighted the shortcomings of the methods employed:
“Flickr auto tagging tool comes from both a very selfish and unselfish motivation. It wants to help users who have little time/desire to tag their images to be on the same level as those who spend countless hours adding keywords. It also wants to make the 11 billion images it stores to be fully indexed and searchable in preparation for its announced licensing tool. However, what it didn’t expect is that many users do not want help. In fact, they feel it is disrespectful of their undeclared right to control their tags. To add insult to injury, the auto tagging feature is carelessly inaccurate, or, at least, seems to be. One of the biggest misconception about auto tagging today is that it is capable to perfectly recognize content 100% of the time. It is not. Nor are human beings by the way. And while accurate at 90 to 95%, those 5% mistakes (or false positives) are the only result people notice. The impression is, even if the system makes 1 mistake for every 1,000 images, that it just doesn’t work, because people notice errors and not successes. As of today, Flickr is considering making the feature as an opt-in option only.” [Read More]
I am not sure this is an ‘unselfish’ motivation. Flickr are owned by Yahoo, who have stockholders to answer to, so I doubt anything that incurs a significant expense gets done unless it either generates profit or enhances the value of their various subsidiary businesses. Even if the personnel involved do not like the idea, they have to deliver returns or they get fired (and replaced by others who are prepared to take a more expedient approach).
One of the characteristics of data (which, by proxy, includes metadata) is that, unlike a conventional commodity, its marginal utility increases the more of it you have. Technology conglomerates like Yahoo, Google, Amazon, Facebook etc. are all well aware of this, and the basis of their business models is now almost entirely data acquisition in order to generate competitive advantage (i.e. buy data cheap, compound the value and sell it).
Hitherto, if you were responsible for making strategic decisions at one of the aforementioned enterprises, the focus would be on leveraging the time and effort of your users by providing some free application like a search engine, discussion board, photo album, email client or subsidised bookshop. The cost to implement and support these data collection facilities is significantly less than the value of what they generate, but the market is a competitive one where only the data-wealthy can produce the returns required to retain their pre-eminent position. The next potential source of leverage is artificial intelligence: taking the human effort already captured and recycling it in an automated and very large-scale manner. This is the motivation behind Flickr offering to auto-tag images.
According to the article, Flickr and others who are trying similar tactics have run into the same kind of problems, where the accuracy of the results has been called into question. I note Paul’s observation that human beings are not accurate either. He is, of course, correct about that, and my expectation is that this factor contributes to the inaccuracies (in addition to the software having flaws, which is virtually inevitable for as long as human beings are responsible for writing it). This is one of the misunderstood issues of using automated methods for generating metadata: not only do you get garbage in/garbage out, but with automated methods the garbage becomes more like some noxious polluting gas that rapidly spreads into all kinds of unexpected and unwanted places. It isn’t always easy to predict how this will manifest itself. For example, Paul mentions that the automated metadata can be used for searching purposes but hidden, or ‘ghost keywording’ as he refers to it. The danger here is that faulty tags still produce rogue results that either have no relationship with the associated asset, or worse, have one which is undesirable. This is already a problem with metadata that was created unintentionally or as a by-product of the asset origination process. For example, while embedded metadata is without doubt essential for DAM, it can’t just be ingested without some verification process being in place. One perennial issue that always seems to crop up is where an originator (e.g. a designer or author) uses an existing document as a source or template for another and forgets to change its properties, so residual metadata (e.g. the name of a competitor or a different product) generates false positives in searches. For DAM to be effective, you need complete visibility of all the metadata held with an asset record, even if you can’t necessarily modify all of it.
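To make the point concrete, below is a minimal sketch (in Python; the field names, watch list and asset record are all illustrative assumptions rather than any particular DAM product’s API) of the kind of ingest-time verification gate described above. Rather than silently indexing whatever embedded metadata arrives with a file, values matching a watch list of known residual-metadata risks are held back for human review:

```python
# Minimal sketch of an ingest-time verification gate for embedded metadata.
# The field names, watch list and review queue are hypothetical examples.

REVIEW_TERMS = {"acme corp", "old product x"}  # e.g. competitor or retired product names

def verify_embedded_metadata(embedded: dict) -> tuple[dict, dict]:
    """Split embedded metadata into fields safe to index immediately
    and fields that need human verification first."""
    approved, needs_review = {}, {}
    for field, value in embedded.items():
        text = str(value).lower()
        if any(term in text for term in REVIEW_TERMS):
            needs_review[field] = value   # hold back: residual metadata risk
        else:
            approved[field] = value       # searchable straight away
    return approved, needs_review

# A document whose author reused an old file as a template and
# forgot to change its properties:
asset = {"Title": "Spring brochure", "Author": "J. Smith",
         "Subject": "Layout copied from Acme Corp pitch deck"}

indexable, held = verify_embedded_metadata(asset)
print("Index now:", indexable)
print("Hold for review:", held)
```

The essential design point is that every embedded field remains visible on the asset record, but nothing reaches the search index without passing through the gate, so residual metadata cannot quietly generate false positives.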
There is undoubtedly interest in the area of automated metadata generation, but I cannot see it being a problem which can be regarded as ‘solved’ for some time to come, mainly because the methods used by those responsible for implementing it are still fundamentally industrial in nature. They all depend on brute force and scaled-up capacity, but the decision making at each stage is too simplistic to detect the nuances and variations which have a dramatic impact on the meaning and context of descriptive metadata. The model used by many responsible for implementing metadata automation seems to be one that 19th century factory owners would have been familiar with. In fact, the same can be said of crowdsourcing, where the basic idea is to use virtual methods to ship in cheaper labour and take a calculated risk that the quality of the result will be within the thresholds of what is deemed acceptable. The difference relates to the subjectivity of the exercise and the over-rationalisation of the problem domain by those trying to implement solutions for it.
Using nine year-olds to clean lint from looms in the 1860s was a task that quickly became quite straightforward to mechanise because it followed predictable patterns which engineers worked out they could successfully model. Although tagging and keywording assets like photos can be repetitive across batches where there are similarities, and does involve a process which might look superficially the same, the thought process required to carry out the work obliges the cataloguer to treat each asset as unique. Indeed, many of the biggest issues with human-catalogued assets result from excessive automation, where someone will batch catalogue a large series of assets with the same metadata even though they depict scenes that are not related to each other. This is the essence of the problem: software architects and engineers tend to see data and assets as commodities, but the users of them do not; they need a specific asset for which there might be no alternative (where even a close match is unsatisfactory). To achieve that, they need someone (or something) else to have made the same critical distinctions on their behalf beforehand, otherwise either the results are wrong or they end up sifting through large volumes of material and losing any advantage the technology promised to deliver.
As was discussed in the workflow article a few weeks ago, for complex, decision-oriented tasks where those responsible have to make continuous adjustments to their thought processes to maintain an acceptable level of quality, it is more effective for IT systems to behave like an assistant or administrative aide, so that focus can be applied to the more complex considerations which the technology is likely to be unsuitable for.
In the case of the Flickr example discussed, I think they have missed an opportunity both to provide users with tools to assist cataloguing and to generate themselves a more valuable data asset. The alternative approach would have been to present the automated keywords as suggestions which users can then select, with options to use them all if relevant (including an option to just accept the automated suggestions for anyone who could live with the resulting reduction in quality). This would provide them with quality-oriented data about each keyword selection that had traceability back to the algorithm which proposed it (and therefore an opportunity to enhance the results). Going further up the sophistication curve would be to allow users to refine the metadata model and the range of assets used as source material. For example, if you knew you were going to catalogue a hundred pictures of the same bridge, a subset of existing assets could be given a higher weighting as source materials for automated suggestions, increasing the likelihood of them being relevant. I will acknowledge that these kinds of facilities are harder to code, test and maintain, but they would have avoided the PR disaster that appears to have ensued and also significantly increased the quality and value of the data that Flickr could have collected.
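As a rough illustration of what that might look like, the sketch below (Python; every name here is a hypothetical assumption, not anything Flickr actually exposes) records each suggested keyword with provenance back to the model that proposed it and whether the user accepted it, and lets a nominated set of reference keywords boost the weighting of matching suggestions:

```python
from dataclasses import dataclass, field

@dataclass
class Suggestion:
    keyword: str
    model: str              # which algorithm/version proposed it (traceability)
    confidence: float       # the model's own score
    accepted: bool = False  # set when the user confirms the keyword

@dataclass
class CatalogueSession:
    # Keywords drawn from reference assets the user has nominated as
    # representative of this batch, e.g. a hundred pictures of the same bridge.
    reference_keywords: set = field(default_factory=set)
    reference_boost: float = 2.0
    suggestions: list = field(default_factory=list)

    def propose(self, keyword, model, confidence):
        # Suggestions matching the nominated reference set are weighted up,
        # so they surface first in the selection UI.
        if keyword in self.reference_keywords:
            confidence = min(1.0, confidence * self.reference_boost)
        self.suggestions.append(Suggestion(keyword, model, confidence))

    def accept(self, keyword):
        for s in self.suggestions:
            if s.keyword == keyword:
                s.accepted = True  # feedback traceable to the proposing model

session = CatalogueSession(reference_keywords={"bridge", "suspension bridge"})
session.propose("bridge", model="tagger-v2", confidence=0.45)
session.propose("whale", model="tagger-v2", confidence=0.60)  # a false positive
session.accept("bridge")

for s in sorted(session.suggestions, key=lambda s: s.confidence, reverse=True):
    print(s)
```

Every acceptance or rejection then becomes a labelled training signal tied back to the hypothetical ‘tagger-v2’ model, which is precisely the kind of quality-oriented data the opt-out rollout forfeited.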
While I can appreciate the ambition of many of those who are pursuing artificial intelligence-oriented solutions to complex problems like asset metadata cataloguing, I think they collectively (even if not always individually) exhibit a level of hubris or arrogance that stems from an inadequate understanding of the nature of the problem. It is my opinion that they need to learn to walk before they try to run, or they can expect to fall over on quite a few more occasions. Although, as described in the article discussed, there is a lot of venture capital being thrown at this problem, investors’ funds are not unconstrained, and if the results continue to promise but not deliver (especially when it comes to real-world use outside the developer’s lab) then I can see them moving on to other, more fruitful opportunities.