Combining AI With Digital Asset Supply Chain Management Techniques


This article was written by Ralph Windsor, DAM News editor.


Numerous DAM vendors have now implemented one or more of the commodity AI image recognition tools available. The marketing messages broadcast by most of them (‘never manually tag images again’ etc) are all equally lacking in both imagination and plausibility. The results from most of the vendor-neutral tests I have read show that while these tools can achieve some success when generic stock photo images are used as subjects, they fail (sometimes spectacularly badly) when given something slightly more obscure, specialist or ambiguous. Frequently these are exactly the kind of digital assets that many users need to manage.

I contend that it is now time for DAM vendors to get practical, make more efficient use of what already exists and extract more value out of it. This is what supply chains in other industries are designed to achieve (and digital asset ones are no different from any other in this respect). In this article I plan to dissect the problems faced in using AI tools and present a digital asset supply chain framework for building some superior alternatives.

The Image Recognition Paradox

The image cataloguing problem is a lot harder than most people realise (at least until they have to do it themselves). As a result, the task often gets delegated to those least equipped to carry it out successfully (like interns or junior employees). Many DAM vendors tend to keep quiet about how much effort is going to be involved in properly cataloguing digital assets when marketing systems to clients because this effectively devalues their contribution to the process. If most of the ROI is generated by the users rather than the system, a reasonable question to ask is: why are we spending our money on a DAM system? This also explains why vendors are eager to promote AI tools which offer a superficially effective solution, because that brings the agenda back to being a technical/product one rather than one of adoption (and hard work) by the end users themselves.

The biggest issue with most faulty metadata (however it was entered, and by whomever) is a lack of context. Metadata is frequently defined as ‘data about data’, but most people who have some experience of managing digital assets acknowledge that this isn’t a very good definition. Instead, I favour the explanation that metadata is ‘contextual data’ which helps you to understand the relevance of a given asset to whatever you might need to use it for.

AI tools lack contextual awareness and understanding, i.e. the material has no meaning for them, whereas the prospective user of a digital asset has the context of a project or requirement they are engaged in to focus on.  This is the reason why these context-unaware methods work better with generic stock-photo digital assets, because they are more universal and there is a higher likelihood that they will be generically applicable to a wider cross-section of asset users.  For example, most people in the world know what beaches, apples, men, women, shoes etc look like, but an incrementally smaller number can give you some more specialised item of detail about them, such as the location of the beach, the variety of apple, the exact age of the person or the brand of shoe.  The more specific or niche you need to get, the less reliable the automated results become.  If you add into the mix some faults in the recognition algorithms then it is easy to see why these systems fail to deliver the expected results that DAM users hope for.

The reason why AI visual recognition tools appeal to the engineering teams responsible for developing DAM systems is an obvious one (and it offers a clue to the lack of thought that has gone into their use). They offer one key advantage: every item of metadata generated is, ipso facto, linked to the source material. For effective automated digital asset cataloguing, however, that is their one and only tangible benefit. This is the paradox of automated AI cataloguing: all the data comes from the image (which theoretically improves relevance), but because it has no context, validating the suggestions and relevance-scoring them is quite hard to achieve.

What if we could add some contextual hints to help improve the results? I believe this is where those who develop DAM solutions should be looking next.

The Digital Asset Supply Chain & AI Text Analysis

Now more than ever, most stakeholders understand that DAM systems are cogs inside a far bigger wheel.  Digital Asset Supply Chains represent an acknowledgement of this fact; they are the process of adding value to a collection of binary data by different processes.  That value is most commonly stored and represented by metadata (or extrinsic value).

AI text recognition and the extraction of concepts and keywords from documents (e.g. automated summaries) is much further ahead than visual recognition because the problem domain is a more restricted one (and therefore simpler to implement). The Digital Asset Supply Chain combined with some AI text analysis (rather than just visual recognition) could, therefore, offer some alternative methods to gain the contextual awareness necessary to improve cataloguing metadata. As I will discuss later, all of these methods could also be combined with AI image recognition and other machine learning methods to deliver improved automated tagging that is far in advance of the results being demonstrated at present.

Upstream & Downstream Metadata

The Digital Asset Supply Chain encompasses not only the lifecycle of a digital asset itself once a record is created inside the DAM system, but also the period before the asset came into existence. In more conventional supply chain management terminology, these might be referred to as ‘upstream’ and ‘downstream’. In the context of this discussion, upstream refers to the point before the digital asset was generated; downstream is what happened to it afterwards (i.e. onwards from when it was uploaded and given a unique identifier).

Upstream metadata needs to be explicitly linked or associated with individual digital assets because it exists before the digital asset does. Downstream metadata is theoretically easier to link to specific assets because it is created afterwards and is, therefore, already associated, but only if there is a degree of interoperability (and traceability) between all the downstream nodes that the asset passes through. The major opportunity for gaining greater contextual hints to derive metadata for digital assets is likely to be upstream, but this does not mean downstream sources should be ignored, especially as assets are re-used for different projects and other purposes. Below I consider a mix of both as possible metadata sources.

Project Metadata (Upstream)

Many digital assets are generated because of some project or initiative like a marketing campaign, news/editorial feature, exhibition, publication, sports event etc.  Nearly all of these have a series of keywords or concepts that apply to everything that needs to be used or produced in association with them.  If the digital asset can be linked to a given project (and that fact can be flagged at the point of ingest) then all those keywords, tags or categories can be presented to the user as potential metadata.

A spin-off benefit of defining some common metadata for each of these initiatives is the ability to ensure it is brand-compliant (or uses approved terminology for non-marketing scenarios). In the same way that template systems are used to produce branded communications materials, this theory can also be applied to metadata to reinforce messages and concepts. If the method used to link the metadata (e.g. tags/keywords) is dynamic, these can be centrally altered and updated afterwards if changes need to be made. As some readers may have already understood, this is less ‘artificial intelligence’ and more basic automation (which frequently may be a preferable method to more complex alternatives).
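To make the project-linking idea concrete, here is a minimal sketch in Python. The `Project` and `Asset` classes, the `SUMMER20` code and the function name are all hypothetical and do not come from any particular DAM product. The asset stores only a project code, so the suggested keywords are read from the central project record each time they are requested.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Project:
    # Hypothetical central record holding the approved keywords for an initiative.
    code: str
    keywords: Set[str] = field(default_factory=set)


@dataclass
class Asset:
    asset_id: str
    project_code: Optional[str] = None  # assumed to be flagged at the point of ingest
    keywords: Set[str] = field(default_factory=set)


def suggest_project_keywords(asset: Asset, projects: Dict[str, Project]) -> List[str]:
    """Return project keywords the asset does not yet carry, for a person to review."""
    project = projects.get(asset.project_code) if asset.project_code else None
    if project is None:
        return []
    return sorted(project.keywords - asset.keywords)


projects = {"SUMMER20": Project("SUMMER20", {"summer", "beach", "campaign"})}
asset = Asset("A-001", project_code="SUMMER20", keywords={"beach"})

print(suggest_project_keywords(asset, projects))

# The link is dynamic: a central change to the project record shows up
# in later suggestions without touching the asset.
projects["SUMMER20"].keywords.add("staycation")
print(suggest_project_keywords(asset, projects))
```

Nothing here is clever, which is rather the point: the function only subtracts one set from another, and the output is a list of suggestions rather than metadata that has been applied.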

Briefing Materials (Upstream)

Before the accession of a digital asset to a DAM system, there is usually some prior discussion and planning among stakeholders as to why it is required in the first place.  The products of this process are briefing materials, copy or even strategic objectives.  These can exist in many forms, such as: emails, audio recordings of meetings (which can be transcribed automatically), PowerPoint presentations, documents etc.  All of this can be reduced to text which can potentially be mined for possible metadata to contextualise digital assets.  Using AI text analysis tools, it is feasible (and arguably more effective) to derive credible metadata which can be further rationalised into controlled vocabularies, where appropriate.

If end-users can be given tools to tag or mark-up communications materials with codes that relate to projects or other initiatives then it is possible for a DAM solution to subsequently access the list of concepts or keywords that text analysis tools have already processed.  This activity can take place independently of the DAM, which might just be one tool of many that uses this metadata.  Some AI text analysis components claim to be able to do this without recourse to doing any kind of preliminary treatment.  As described in the last section, however, the more this problem can be simplified and the less it depends on anything ‘clever’ then the more reliable and consistent the results are likely to be.
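The mark-up approach can be sketched as follows. This is an illustration under stated assumptions: the `[project:CODE]` tag is an invented convention, the stopword list is a toy one, and a plain word-frequency count stands in for whatever AI text analysis component would really be used.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List

# Hypothetical mark-up convention that end-users add to briefing materials.
PROJECT_TAG = re.compile(r"\[project:([A-Za-z0-9_-]+)\]")

# Toy stopword list; a real text analysis tool would handle this far better.
STOPWORDS = {"the", "and", "for", "with", "our", "needs", "are", "this", "that"}


def terms_by_project(documents: List[str], top_n: int = 5) -> Dict[str, List[str]]:
    """Pool candidate terms from every document carrying the same project code."""
    pooled = defaultdict(Counter)
    for text in documents:
        body = PROJECT_TAG.sub(" ", text)
        words = [
            w for w in re.findall(r"[a-z]+", body.lower())
            if len(w) > 2 and w not in STOPWORDS
        ]
        for code in set(PROJECT_TAG.findall(text)):
            pooled[code].update(words)
    # Most frequent first, with ties broken alphabetically so output is stable.
    return {
        code: [t for t, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        for code, c in pooled.items()
    }


docs = [
    "[project:SUMMER20] Brief: beach photography for the summer campaign. Beach shots at sunrise.",
    "[project:SUMMER20] Meeting notes: summer campaign needs family beach images.",
]
print(terms_by_project(docs, top_n=3))
```

Because the result is keyed by project code, this processing can run independently of the DAM, which then only needs to look up the code flagged on an asset. The returned terms are candidates to be rationalised into a controlled vocabulary, not finished metadata.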

Workflow Metadata (Downstream)

A number of DAM vendors either offer dedicated workflow features or use some kind of project planning tool, whether for work-in-progress digital assets or for campaigns and projects. These usually include discussions and approvals documentation which can be correlated with other sources and also mined by AI text analysis tools. The metadata from these sources can be compared with what is currently stored, and suggestions proposed that are dynamic and based on what people say about an asset post-ingestion (as opposed to beforehand, as is usually the case).

Asset Usage Metadata (Downstream)

Where assets are restricted and users are required to apply for permission to download assets (or say what they will be used for) metadata is generated which can be used as a primitive automated feedback loop.  If there are metadata concepts and keywords consistently being used in requests to use assets, but these are not present in the asset’s metadata, this suggests an opportunity to improve the relevance.
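A minimal sketch of that feedback loop, assuming download requests are available as free text per asset. The function name, the threshold and the stopword list are illustrative choices, not features of any real system.

```python
import re
from collections import Counter
from typing import Iterable, List, Set

# Toy stopword list for illustration only.
STOPWORDS = {"the", "for", "this", "need", "and", "our", "with", "image"}


def missing_request_terms(
    asset_keywords: Set[str],
    request_texts: Iterable[str],
    min_requests: int = 2,
) -> List[str]:
    """Terms used in at least `min_requests` requests but absent from the asset's metadata."""
    known = {k.lower() for k in asset_keywords}
    seen_in = Counter()
    for text in request_texts:
        # Count each term once per request so a single verbose request cannot dominate.
        seen_in.update(set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS)
    return sorted(t for t, n in seen_in.items() if n >= min_requests and t not in known)


requests = [
    "Need this for the coastal tourism brochure",
    "Coastal walks leaflet for tourism board",
    "Beach image for website",
]
print(missing_request_terms({"beach", "sunset"}, requests))
```

With the sample data, ‘coastal’ and ‘tourism’ each appear in two requests and in none of the existing keywords, so they are reported as candidates for review.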

Lightboxes/Collections (Downstream)

Most DAM systems include some kind of collections or ‘lightbox’ functionality where users can assemble arbitrary selections of assets. In recent years, these have become quite a lot more sophisticated and I have seen DAM solutions where these tools form the basis of a brand guidelines feature (or similar features for different use-cases). As with asset usage, where users are entering text in the notes sections of a lightbox or collections feature, this too can be analysed. A further higher-level feature is to try to analyse what assets get collected together by users. If a reasonable number of users are storing two assets in the same collection, but the metadata used to describe them has no or few common features and they are being found via separate searches, this suggests a potential opportunity to cross-fertilise metadata from one asset to another.
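The co-collection analysis could be sketched along these lines. The thresholds, the use of Jaccard overlap as the measure of ‘few common features’ and all names are assumptions for illustration, and the check that the assets were found via separate searches is not modelled.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

Candidate = Tuple[str, str, List[str], List[str]]


def cross_fertilisation_candidates(
    collections: List[List[str]],
    metadata: Dict[str, Set[str]],
    min_shared_collections: int = 2,
    max_overlap: float = 0.2,
) -> List[Candidate]:
    """Find asset pairs that are often collected together but share little metadata."""
    pair_counts = Counter()
    for asset_ids in collections:
        for a, b in combinations(sorted(set(asset_ids)), 2):
            pair_counts[(a, b)] += 1

    candidates = []
    for (a, b), n in pair_counts.items():
        if n < min_shared_collections:
            continue
        ka, kb = metadata.get(a, set()), metadata.get(b, set())
        union = ka | kb
        overlap = len(ka & kb) / len(union) if union else 0.0  # Jaccard similarity
        if overlap <= max_overlap:
            # (asset a, asset b, keywords to suggest for a, keywords to suggest for b)
            candidates.append((a, b, sorted(kb - ka), sorted(ka - kb)))
    return sorted(candidates)


collections = [["A1", "A2", "A3"], ["A1", "A2"], ["A2", "A3"]]
metadata = {
    "A1": {"beach", "sunset"},
    "A2": {"coast", "holiday"},
    "A3": {"coast", "holiday", "family"},
}
print(cross_fertilisation_candidates(collections, metadata))
```

In the sample data, A1 and A2 share two collections and no keywords, so each is offered the other's terms. A2 and A3 are also collected together twice, but their metadata already overlaps heavily, so they are ignored.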

Data Mining DAM Audit Trails (Downstream)

Most fully featured enterprise DAM solutions have an audit trail which logs everything users have done on the DAM system. This offers a source of potential data which AI tools can analyse quantitatively. A very simplistic example is just counting every instance of a zero-result search and using that to gain awareness of missing metadata. Unlike image recognition and text analysis, most of this will probably need to be custom-developed (or at least built using some of the machine learning toolkits that Cloud providers like Google, Amazon etc offer). For that reason, it is likely to be harder for DAM vendor engineering teams; however, those that can afford the investment are likely to be at a considerable advantage over those who cannot.
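The zero-result example is simple enough to sketch directly. The shape of the audit trail entries (dictionaries with `action`, `query` and `result_count` keys) is an assumption; real audit trails will differ from one system to the next.

```python
from collections import Counter
from typing import Dict, List, Tuple


def zero_result_terms(audit_log: List[Dict], top_n: int = 10) -> List[Tuple[str, int]]:
    """Count searches that returned nothing, most frequent query first."""
    counts = Counter(
        entry["query"].strip().lower()
        for entry in audit_log
        if entry.get("action") == "search" and entry.get("result_count") == 0
    )
    # Ties are broken alphabetically so the report is stable between runs.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


audit_log = [
    {"action": "search", "query": "Lighthouse", "result_count": 0},
    {"action": "search", "query": "lighthouse ", "result_count": 0},
    {"action": "search", "query": "beach", "result_count": 12},
    {"action": "download", "asset_id": "A1"},
    {"action": "search", "query": "pier", "result_count": 0},
]
print(zero_result_terms(audit_log))
```

A report like this only shows where metadata may be missing. Deciding which assets should carry the missing term still needs one of the other sources described above, or a person.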

Automated Suggestions vs Automated Insertion

It is important to acknowledge that however sophisticated you try to get with contextual hints and other more refined methods using AI text analysis etc, essentially what is being played here is a confidence trick on the user where the software pretends to ‘understand’ us. I have read AI literature where the proponents wish to claim that it is a ‘different form of thinking’ etc, but I don’t accept those arguments. AI toolkits do not ‘think’ as I do (or as any other human being does), therefore what they do is not ‘thinking’, but more like automation with a range of outcomes that are too numerous to run tests against.

From a risk management perspective, a human being should always be responsible for what metadata gets applied to digital assets. Whoever has this role might not appreciate the burden of this responsibility, but fundamentally, computers are never at fault; the only debate is whether it is the people who used them or the people who programmed them who are responsible. As such, AI and other automated metadata suggestions should be just that, and someone needs to review them before they get applied.
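One way to enforce the distinction in software is to hold suggestions apart from applied keywords until a named person has reviewed them. The sketch below assumes hypothetical `Suggestion` and `AssetRecord` structures and is not drawn from any existing DAM.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Suggestion:
    keyword: str
    source: str                # e.g. "image-recognition" or "briefing-text"
    status: str = "pending"    # "pending", "approved" or "rejected"
    reviewed_by: str = ""


@dataclass
class AssetRecord:
    asset_id: str
    keywords: Set[str] = field(default_factory=set)
    suggestions: List[Suggestion] = field(default_factory=list)


def review(asset: AssetRecord, keyword: str, reviewer: str, approve: bool) -> Suggestion:
    """Record a human decision; only an approval adds the keyword to the asset."""
    for suggestion in asset.suggestions:
        if suggestion.keyword == keyword and suggestion.status == "pending":
            suggestion.status = "approved" if approve else "rejected"
            suggestion.reviewed_by = reviewer
            if approve:
                asset.keywords.add(keyword)
            return suggestion
    raise KeyError(f"no pending suggestion {keyword!r} for asset {asset.asset_id}")


record = AssetRecord("A-001", keywords={"beach"})
record.suggestions.append(Suggestion("sunset", "image-recognition"))
record.suggestions.append(Suggestion("volcano", "image-recognition"))

review(record, "sunset", reviewer="jsmith", approve=True)
review(record, "volcano", reviewer="jsmith", approve=False)
print(sorted(record.keywords))
```

Storing the reviewer against each decision also keeps the responsibility traceable to a person rather than to the tool.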

Using A Secondary Metadata Corpus Generated By AI Tools

There is a halfway house between automated suggestions and insertion which could offer some opportunities to reduce the burden of effort of checking the suggestions (but in a more risk-managed manner).

Assuming I have recalled it properly, this is a technique I have read Roger Howard has used with some AI image recognition components (and was described on David Riecks’ Controlled Vocabulary list group some months ago).  The AI toolkit is used to generate a list of keywords, but they exist in a separate corpus that is not initially searched.  If the user cannot find any results, they are offered the option of searching the secondary corpus to look for matches instead.  This isolates the primary index and avoids risks of ‘infection’ with faulty metadata from the AI.  The same idea can be applied to some of the more marginal techniques presented earlier in this article.
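The following is a rough sketch of how a secondary corpus might behave, written from the description above rather than from the implementation it refers to. The index structure, the `SearchResult` type and the opt-in flag are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

Index = Dict[str, Set[str]]  # keyword -> asset ids


@dataclass
class SearchResult:
    asset_ids: List[str]
    source: str                    # "primary", "secondary" or "none"
    offer_secondary: bool = False  # True when only the AI corpus has matches


def search(term: str, primary: Index, secondary: Index, use_secondary: bool = False) -> SearchResult:
    """Search human-reviewed metadata first; consult the AI corpus only on request."""
    key = term.strip().lower()
    hits = primary.get(key, set())
    if hits:
        return SearchResult(sorted(hits), "primary")
    ai_hits = secondary.get(key, set())
    if use_secondary and ai_hits:
        return SearchResult(sorted(ai_hits), "secondary")
    # Nothing to show yet; tell the caller whether the secondary corpus is worth offering.
    return SearchResult([], "none", offer_secondary=bool(ai_hits) and not use_secondary)


primary = {"beach": {"A1"}}
secondary = {"beach": {"A1", "A2"}, "lighthouse": {"A3"}}

print(search("lighthouse", primary, secondary))
print(search("lighthouse", primary, secondary, use_secondary=True))
```

The primary index is never written to by the AI toolkit, so faulty suggestions cannot leak into ordinary searches. The user sees AI-derived matches only after opting in, and the `source` field lets the interface label them as such.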

Combining AI Tools

I believe some vendors like Orange Logic and AssetBank (and there might well be others) have used multiple AI image recognition toolkits and also attempted to combine them with data their systems collect to implement a Venn-diagram style approach where they look for common terms across all of them. This is reasonable, but the method is still too basic to offer any major productivity benefit. An additional problem with this technique is that it can end up producing such a small range of common terms as to barely be worth using (other than for very large volumes of low quality material for which any kind of human cataloguing expense would be uneconomic). With that said, this indicates the basis of some methods for getting more value from AI tools and it does point towards a more advanced way to utilise them.
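A Venn-diagram style combination can be sketched as a vote across toolkits. This is an illustration only and not a description of how the vendors named above implement it. The `min_votes` parameter is an added assumption, included to show how the strict intersection could be relaxed when it yields too few terms.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set


def consensus_tags(tag_sets: Iterable[Set[str]], min_votes: Optional[int] = None) -> List[str]:
    """Keep tags proposed by at least `min_votes` toolkits (default: all of them)."""
    tag_sets = list(tag_sets)
    if not tag_sets:
        return []
    needed = len(tag_sets) if min_votes is None else min_votes
    votes = Counter()
    for tags in tag_sets:
        # Normalise case and count each toolkit at most once per tag.
        votes.update({t.lower() for t in tags})
    return sorted(t for t, n in votes.items() if n >= needed)


toolkit_a = {"beach", "sea", "sky", "person"}
toolkit_b = {"Beach", "ocean", "sky"}
toolkit_c = {"beach", "sand", "person"}

print(consensus_tags([toolkit_a, toolkit_b, toolkit_c]))               # strict intersection
print(consensus_tags([toolkit_a, toolkit_b, toolkit_c], min_votes=2))  # relaxed majority
```

With the sample tags, the strict intersection leaves only ‘beach’, which illustrates the small-range problem. Requiring two votes out of three recovers ‘person’ and ‘sky’ as well, at the cost of weaker agreement.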

Interoperability As A Lower Cost and Lower Risk Alternative To AI

One point that some readers might be contemplating having read the previous suggestions is that a lot of this might not even require a great deal of ‘clever’ technology (whether AI or otherwise). A reasonable proportion of the benefits can be obtained just by being more organised and having systems far better integrated than is usually the case currently in many enterprises. This fact highlights why digital asset supply chains are perhaps the real asset that organisations need to develop and optimise. The individual software components play their part, but it doesn’t matter how good they are as actors: if the play is rubbish, you won’t want to sit through a whole performance.

Machine Learning Specifically Designed For Digital Asset Management

The current DAM visual recognition components lack a proper user feedback loop that would facilitate a real learning capability. Most DAM vendors have plugged their apps into them and left it at that. They are not really using AI to any significant degree at the core of their applications (even though they might claim otherwise in their press releases and marketing). As described when we reviewed some of these products last year, the engineering work needed to effect this integration is fairly easy and I was able to set them up myself very quickly (and it is now many years since I would have been able to find gainful employment as a software developer). The value of them to DAM users, therefore, is likely to be correspondingly quite low.

DAM vendors who are seeking to gain some kind of competitive advantage with their use of AI (and more specifically with solving the relevant metadata challenge) need to provide these missing contextual elements themselves by leveraging the digital asset supply chains which they service on behalf of their clients. It is not sufficient to just treat these tools like a black box, hope they work and blame a third party when they don’t. The ‘machine learning’ must take place further up the stack in the DAM application layer itself. The methods I have described are more complex than just connecting to a commodity image recognition toolkit, but they are still not especially hard to implement, especially for those who have invested in a well-designed systems architecture that supports efficient and flexible metadata (and the workflows which accompany it).

I have spoken to some vendors recently who make a great play of their ability to leave data in place and use federated techniques to derive catalogues of digital assets. If that is the case, they need to be putting that expertise to use to improve their AI implementation to make it specifically useful to Enterprise DAM, not using something more designed to tag wedding images or suggest descriptions for people’s private Flickr photo collections. The prize is quite a considerable one since it implies getting far closer to being able to solve the issue of automated relevant metadata cataloguing than has been achieved hitherto, which is probably one of the holy grails of DAM (and a largely unspoken one for reasons discussed earlier). I don’t believe this will ever be fully attained; however, there is an opportunity to get a lot closer to it than is the case now.

Under usual circumstances, my firm does not tend to work directly with vendors (unless we are directed to liaise with one by a client).  In this instance, however, the investment of time and effort needs to come from the sell-side of DAM first.  Further, some fairly in-depth technical and systems architecture expertise is required to get off the starting line.  As I have discussed elsewhere, this means taking some risks and having to do more than just plugging in an off-the-shelf module in order to tick the box marked ‘AI’ on RFPs etc.  If any vendors are interested in pursuing some of the ideas discussed in this piece, please feel free to get in touch with me and we will investigate the options.
