Improving DAM Interoperability In 2017

This feature article was contributed by Tim Strehle of Digital Collections with input from Margaret Warren of ImageSnippets and is part of our Improving DAM In 2017 series.

DAM products have improved a lot in the last few years: they are now cloud-enabled and more user-friendly, they support video, and they let us share content better than ever.

Integrating the DAM with other systems has also gotten easier because most systems have APIs by now. But point-to-point integrations are still the rule: each integration between two systems requires a software developer to write code tailored to that specific pair.

Let’s look at the importance of interoperability for Digital Asset Management, what interoperability means in practice, and how we can improve it.

DAM Systems As Content Hubs

Business processes involving creative content and other digital assets are crossing IT system boundaries all the time. To help the organization derive value from its digital assets, the DAM needs to enable automated, effortless data flows between systems.

Last year, Theresa Regli predicted that 2016 would “see the term ‘content hub’ emerging”. She was right: many DAM vendors are now using it to market their products (examples: ADAM, Stylelabs, Widen, WoodWing). It’s a term that makes sense: DAM systems have evolved from almost-dead-end “image archives” into central services that gather digital assets from many sources, make them findable, and then route the assets to other people through various connected systems. By definition, interoperability is a content hub’s most important feature.

Jeff Lawrence wrote about a conversation with Bynder CEO Chris Hall: “My understanding […] is that traditional DAM was an archive, but today DAM is an important component of a larger ecosystem of connected tools that must have the ability to work together. Without an integrated solution, a siloed DAM is essentially a lost opportunity for the business.”

Interoperability Is A Two-Way Street

DAM interoperability is a two-way street: in addition to DAM-specific data being routed to other systems, or between DAM systems, “foreign” data – product data, customer data, Web analytics data – is also commonly held within the DAM to provide context for its assets. (This is one of the points discussed in Ralph Windsor’s DAM And The Politics Of Metadata Integration.) We need to consider this in our discussions.

Web-Connected By Default

Another quick detour before we dive into the specifics of DAM interoperability:

Our understanding of information system integration has evolved in the last decade. The idea of the Web as a globally interconnected space has come true: employees and collaborators work from anywhere, and software is moving into the cloud. An increasing number of digital assets originate online – on mobile, connected devices – and are managed exclusively for the purpose of sharing them online. Intranet-only, on-premises setups are becoming the exception, while the norm is a Web-connected DAM system with the full potential to connect digital assets with any number of people and systems.

This has implications for the technology our systems are using to communicate: Copying files between network shares doesn’t work on the Web, and FTP is not a good choice either.

Use Cases And Operations

Now let’s look at some real-life DAM interoperability use cases.

The most common ones involve content stored in the DAM system being handed over to a publishing system: to a Web CMS or social media sites (Facebook, Twitter, YouTube) for Web publishing, or to InDesign, editorial systems or catalog production software for print production.

Then there are scenarios where the DAM and other internal systems need to integrate data for search, reporting or other business processes: The DAM system may need product data from a PIM (Product Information Management) system to allow searching assets by product. A museum’s CMS (Collection Management System) or a company’s CRM (Customer Relationship Management) system may want to display images stored in the DAM.

Often, digital assets are automatically uploaded into the DAM system: Other systems use the DAM for archiving their content, or external content suppliers (like news and photo agencies) feed assets into the DAM for production usage.

In other cases, the DAM system connects to external services to outsource or enhance its functionality, calling a cloud service to transcode video files or generate preview images, or integrating image recognition “AI” products.

In each case, systems communicate with each other, performing one or several operations. While these operations can be complex and highly specific, typical patterns recur. Let’s assign names to them to help us talk about interoperability in a specific and systematic way (a short code sketch follows the list):

  • “Referencing an item”: A system stores a reference to an item maintained within another system. Example: A DAM system stores a product ID (which comes from a PIM system) along with an asset.
  • “Linking to an item”: A system’s UI displays a link to an item in another system’s UI. Example: A DAM system links to a Facebook post where a DAM-hosted image is used.
  • “Embedding a file”: A system displays a file by embedding a URL from another system which points to a media file, possibly with parameters for a tailor-made, dynamically generated file variant (see IIIF).
  • “Listing items”: A system retrieves a list of items from another system, along with a subset of item metadata. Example: A Web CMS fetches an RSS feed from a DAM, with asset metadata and thumbnail preview URLs.
  • “Searching items”: Like “listing items”, but with the ability to pass search terms, sort order etc. as parameters to request a tailor-made, dynamically generated list.
  • “Reading item data”: A system retrieves a single item’s data (including metadata and file references) from another system.
  • “Updating an item”: A system appends or replaces item data in another system. Example: A newspaper production system updates photo usage data within a DAM system.
  • “Creating an item”: A system creates a new item within another system. Example: A news agency system creates an asset within a DAM system by FTPing it a JPEG image file with embedded metadata.
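
To make these patterns concrete, here is a minimal sketch of a few of them in Python, written against a hypothetical HTTP/JSON API – the base URL, endpoint paths and field names are invented for illustration, since every real DAM API looks different:

    import requests

    DAM_API = "https://dam.example.com/api"  # hypothetical base URL

    def search_items(query):
        """'Searching items': request a tailor-made, dynamically generated list."""
        response = requests.get(DAM_API + "/assets", params={"q": query, "sort": "date"})
        response.raise_for_status()
        return response.json()["items"]

    def read_item(asset_id):
        """'Reading item data': fetch one asset's metadata and file references."""
        response = requests.get(DAM_API + "/assets/" + asset_id)
        response.raise_for_status()
        return response.json()

    def update_item(asset_id, fields):
        """'Updating an item': append or replace item data in another system."""
        response = requests.patch(DAM_API + "/assets/" + asset_id, json=fields)
        response.raise_for_status()

    # "Referencing an item": the DAM stores a product ID maintained in the PIM.
    update_item("asset-42", {"product_id": "PIM-12345"})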

The Software Developer’s Perspective

References, links, and file embeds might be possible without having to write integration code – see David Diamond’s Integrating DAM with CMS without an Integration. But in many cases, you’ll need help from a software developer to connect your systems.

The scenarios and operations described above should be possible to implement, assuming the intended process and data flow are specified, and the systems involved either provide an API (Application Programming Interface) out of the box, or can be extended to communicate with other systems.

But making everything work together smoothly is easier said than done. The developer has to figure out a lot of details. Here are some of the technical aspects to consider when connecting software systems:

  • Which direction is it going to be: Will software on the side of the DAM system actively push data to another, “passive” system, or will the DAM system remain passive, with the other side pulling data from it? Communication could also take place in both directions, or a third party (like Zapier) could play the active part.
  • Is there a live, synchronous connection between both systems using an HTTP API, or will data be transferred by means of files copied back and forth (possibly via FTP)?
  • Which syntax (XML, JSON) and data format (SOAP, RDF, RSS, NewsML, XHTML, JSON-LD) is used for data exchange?
  • Which data structures and field names (also known as the “schema”) are used? For example, does the API or file format have the notion of an “image”, and is the image caption in a field named “CAPTION”, “Headline”, “title”, or “h1”?
  • How are assets and records identified – is there an “ID” or URL field, and is the identifier persistent and globally unique?

In fact, there are dozens more questions that need to be answered before the software developer can craft a connection between any two systems.
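
To illustrate the schema question alone, here is the kind of mapping layer a developer typically ends up writing – the two vendor payloads below are made up, but they show how the same image caption hides behind different field names:

    # Two hypothetical API responses describing the same photo:
    vendor_a = {"CAPTION": "Lighthouse at dusk", "ID": "a-123"}
    vendor_b = {"Headline": "Lighthouse at dusk", "identifier": "b-456"}

    # Per-vendor mappings of internal field names to vendor field names:
    FIELD_MAPS = {
        "vendor_a": {"caption": "CAPTION", "id": "ID"},
        "vendor_b": {"caption": "Headline", "id": "identifier"},
    }

    def normalize(record, vendor):
        """Translate a vendor-specific record into the internal schema."""
        mapping = FIELD_MAPS[vendor]
        return {field: record[source] for field, source in mapping.items()}

    print(normalize(vendor_a, "vendor_a"))  # {'caption': 'Lighthouse at dusk', 'id': 'a-123'}
    print(normalize(vendor_b, "vendor_b"))  # {'caption': 'Lighthouse at dusk', 'id': 'b-456'}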

The Cambrian Explosion Of DAM Integrations

The sad thing is that when the implementation is done, the developer will have to start almost from scratch when asked to integrate the same DAM with another system. Integrations often aren’t reusable. That’s because all DAM systems, and most systems the DAM needs to work with, have different answers to the questions listed above.

So each and every DAM vendor re-implements integrations with the same systems. Let’s say that a mainstream DAM product should work with Sitecore, WordPress, Clarifai, SharePoint, Salesforce, and YouTube. That’s just six different integrations. But there’s lots of DAM vendors out there. If just ten DAM vendors each connect their products to these six systems, that’s sixty custom-implemented integrations already!

(For example, check out these 9 Sitecore integrations of DAM products from ADAM, AssetBank, Canto, CELUM, DigiZuite, IntelligenceBank, Picturepark, Webdam, and Widen.)

Coding each integration from scratch is a terrible waste of time, money, and developer motivation. What we need is, in the words of Mike Amundsen, (generic) interoperability, not (point-to-point) integration.

The Role Of Current DAM Standards

That’s not to say there aren’t any standards in DAM: The DAM Directory lists quite a few of them. Why is interoperability still so hard?

One problem is that these are mostly metadata standards, focusing on the representation of a single digital asset’s metadata. API operations – how to search the DAM and get a list of assets back, and how to retrieve that metadata over the Internet – are out of scope for standards like IPTC Photo Metadata. Most DAM vendors support “IPTC metadata”, and still, each DAM API looks and behaves differently.

The traditional focus on metadata embedded within image files isn’t a good match for many typical use cases (though it’s still amazing to receive an image file which carries lots of useful metadata). Consider a DAM/WCMS integration: From within the Web CMS, the user searches the DAM, sees a list of assets with a thumbnail image and a few metadata fields, and then picks one from the list, which is then served directly from the DAM system. None of these operations require, or benefit from, transferring (potentially large) image files with embedded metadata.
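
A sketch of that flow from the Web CMS side, reusing the hypothetical search endpoint from earlier – note that only URLs and a few metadata fields travel between the systems, never the image files themselves:

    import requests

    DAM_API = "https://dam.example.com/api"  # hypothetical

    # 1. Search the DAM from within the Web CMS.
    results = requests.get(DAM_API + "/assets", params={"q": "lighthouse"}).json()

    # 2. Build a pick list from thumbnail URLs and a few metadata fields.
    for item in results["items"]:
        print(item["title"], item["thumbnail_url"])

    # 3. The editor picks one; the CMS stores a reference and embeds the
    #    rendition URL, so the image is served directly by the DAM.
    chosen = results["items"][0]
    html = '<img src="%s" alt="%s">' % (chosen["rendition_url"], chosen["title"])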

Current DAM standards also don’t cover all of the Ten Core Characteristics of a DAM – for example, workflows, collections and custom metadata fields aren’t well-standardized yet.

CMIS4DAM

In 2014, the OASIS standards body initiated the development of a new DAM standard called CMIS4DAM, responding to “the ongoing DAM interoperability crisis” (see Ralph Windsor’s Introduction to CMIS4DAM). Andreas Mockenhaupt described it as “something along the lines of a universal API, allowing those DAM systems that are compliant to easily gain access to the metadata that they need without performing integration work.” But judging from Ray Gauss II’s call for help a year ago, and the silence on the mailing list, work on CMIS4DAM seems to have ceased. (Disclaimer: The DAM vendor I work for considered joining the committee, but the high cost of an OASIS membership stopped us from participating.)

Considering CMIS4DAM’s lack of success so far, and the recent closing down of the DAM Foundation, it seems most vendors are not deeming it important to work together for the greater good of the DAM ecosystem. Our products’ interoperability shortcomings simply mirror this fact (see Conway’s law). To quote Ralph Windsor’s harsh critique from 2014: “The DAM industry is guilty of self-obsessed and narcissistic behavior or (at best) an apathetic and fatalistic attitude that assumes interoperability is someone else’s problem which might never get solved anyway.”

Semantic Web Technology As A Possible Solution

It’s not just that better DAM standards are unlikely to arrive soon: even the best DAM-specific standard would address only half the problem, because interoperability is a two-way street and “foreign” data needs to be exchanged as well. When connecting a DAM to a PIM system, would it help to have rivaling DAM interoperability and PIM interoperability standards?

Generic, standardized mechanisms for exchanging structured data, more helpful than “let’s use (any) XML” but less rigorous than an industry-specific standard, could help us bridge diverse systems. That’s what Semantic Web technologies were meant to do: replicate the human-readable Web’s success for structured, machine-readable data by providing a generic language for structured data (RDF), using URIs/URLs as identifiers and links, and making data access as simple as visiting a URL the way a Web browser does. For more details, see the DAM Guru webinar on DAM and the Semantic Web by Margaret Warren, Demian Hess and myself.
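
In code, accessing such data really can be as simple as visiting a URL. Here is a minimal sketch using the Python rdflib library; any URL that serves one of the standard RDF formats would do:

    from rdflib import Graph

    g = Graph()
    # Fetch and parse RDF data from the Web (rdflib negotiates the format).
    g.parse("https://www.w3.org/People/Berners-Lee/card")

    # Every statement is a simple (subject, predicate, object) triple.
    for subject, predicate, obj in g:
        print(subject, predicate, obj)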

The term “Linked Data”, which is often used in the Semantic Web context, highlights the special mindset required for Web-scale interoperability: You don’t start with integrating systems. Instead, you invest in data quality and interconnectedness, and publish that data on the Web – not through a custom-built API, but in standard formats. To illustrate the Linked Data approach, think of all the software that lets you add hyperlinks when writing text: You can link Web pages to Wiki pages to Google Docs documents to JIRA issue tracker tickets without “integrating” any systems because the links live in the HTML data. That’s possible with structured data, too.

While the initial vision of a Semantic Web full of autonomous software agents, doing our shopping and booking our flights, has not (yet?) been fully realized, the core standards and technology are stable and usable. They would provide clear answers to some of the developer’s questions we looked at earlier: Assets are identified by URLs, data is accessed using HTTP connections, and RDF offers several standardized formats. And we could even connect asset descriptions in the DAM to public datasets like DBpedia using the same technology. To get an impression of how DAM functions can be mixed with Semantic Web concepts, take a look at my co-contributor’s ImageSnippets product.
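
Connecting an asset description to DBpedia might look like the following sketch – the asset URI is invented, and schema.org’s “about” property is just one plausible choice for the link:

    from rdflib import Graph, Namespace, URIRef

    SCHEMA = Namespace("https://schema.org/")

    g = Graph()
    asset = URIRef("https://dam.example.com/assets/42")  # hypothetical asset URI
    # State that the asset is about Berlin, using a public DBpedia identifier.
    g.add((asset, SCHEMA.about, URIRef("http://dbpedia.org/resource/Berlin")))

    print(g.serialize(format="turtle"))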

The Schema.org Vocabulary

Choosing Semantic Web technology doesn’t answer the question of which data structures and field names we could standardize on – there are lots of Linked Data-compatible vocabularies, including IPTC and XMP (Adobe’s XMP is built on top of RDF). But let’s look at a particular one:

The Schema.org vocabulary is a Semantic Web success story. The large Web search engine vendors (Google, Bing, Yahoo, Yandex) are increasingly interested in indexing not just HTML text, but also structured data, so they can better search for and list structured information like product offers and recipes. These vendors agreed on a shared vocabulary (which is constantly being extended, in an open, W3C-assisted community process), and on standardized ways to embed structured data in Web pages so the search engine crawlers could find it. The data model is fully conformant to RDF, and the Linked Data formats RDFa and JSON-LD can be used for embedding. (It’s pretty cool that the SEO guys who add Schema.org data to their Web sites unknowingly contribute to building the Semantic Web.)
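
For a taste of what that embedding looks like, here is a sketch of Python code generating a schema.org ImageObject as JSON-LD, ready to be placed in an HTML page (the URLs are placeholders):

    import json

    image = {
        "@context": "https://schema.org",
        "@type": "ImageObject",
        "name": "Lighthouse at dusk",
        "contentUrl": "https://dam.example.com/files/42.jpg",
        "thumbnailUrl": "https://dam.example.com/thumbs/42.jpg",
        "license": "https://creativecommons.org/licenses/by/4.0/",
    }

    # Embed the structured data in a Web page for search engine crawlers.
    print('<script type="application/ld+json">%s</script>' % json.dumps(image))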

Schema.org data on Web sites is often a dumbed-down version of the original data. While the vocabulary is pretty extensive, it doesn’t cover all the complexities of each industry’s data model. But it doesn’t have to: It just needs to be good enough for search and result display purposes. Which, coming back to our original topic, is exactly what we need for many DAM interoperability use cases (referencing, linking, reading, listing, searching items). A DAM is a search engine, after all (plus a few extra features, of course). In my opinion, this congruence makes Schema.org (and RDF or JSON-LD) a good starting point for Linked Data-based DAM interop.
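
A DAM search result expressed with the same vocabulary might be as simple as this sketch – an ItemList of ImageObjects, which covers roughly what a connected system needs for the “listing” and “searching” operations (again, all values are placeholders):

    import json

    search_result = {
        "@context": "https://schema.org",
        "@type": "ItemList",
        "numberOfItems": 2,
        "itemListElement": [
            {"@type": "ImageObject", "name": "Lighthouse at dusk",
             "contentUrl": "https://dam.example.com/files/42.jpg"},
            {"@type": "ImageObject", "name": "Harbor at dawn",
             "contentUrl": "https://dam.example.com/files/43.jpg"},
        ],
    }

    # Serialize the search result as JSON-LD for the consuming system.
    print(json.dumps(search_result, indent=2))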

The Schema.org vocabulary could be an answer to the question of which data model and field names to use. We could work with ready-made standards and technology, be part of a wider ecosystem and contribute to a living, well-known vocabulary (which even has an extension mechanism for, say, DAM-specific stuff like file variants and renditions).

If you want to know more, take a look at my Schema.org DAM experiments for photo metadata and search results.

Let’s Work This Out Together

I might be wrong about the suitability of Semantic Web technology for DAM interoperability. Maybe we should reboot the CMIS4DAM efforts, or use some other approach not mentioned here. Ralph Windsor’s 2013 article on The Building Blocks Of Digital Asset Management Interoperability provides a good overview of the options.

But I’m sure we can improve on the rather miserable state of DAM interoperability if we join forces. We’re all going to benefit. How would you like to contribute? Let’s talk; I’m looking forward to your comments!

 

6 Comments

  • Very good points, Tim, and I adore your enthusiasm on this! In my slightly more pessimistic opinion, technical protocols such as CMIS4DAM are not a viable solution, simply because they won’t be built or broadly adopted.

    Possible reasons: First, DAM vendors are struggling to stay relevant in a market that is in competitive consolidation, if not dissolution into what I believe is a backend (headless) CMS or MOM/MRM category, with their resources allocated to “survival”. Second, even proprietary integration frameworks are getting more mature, with simpler API methods and a growing array of SDKs. Together with a booming category of “integration proxies” (Zapier, MS Flow, Oracle ODI-CS, Scribe, IFTTT, or even Dropbox when speaking of file-centric DAMS), it doesn’t take that much developer power anymore to get an integration done. Third, while many integrations are very generic and could benefit from standardization, a “custom” piece will often remain, and it no longer comes at marginal cost.

    That custom piece is often not just about technology, though, but more about the data model. What you described as a starting point would actually be a breakthrough to me: standardization on schema.org (data models, field names and field values/terms – together with the schema.org extensions), offered as a built-in feature set by all DAM/CMS vendors, so that integrators and customers can map their custom metadata schemas to such a common standard.

  • Johannes Schmidt

    Tim, you are referring to schema.org’s extension mechanism (https://schema.org/docs/extension.html). There is another, very elegant way to extend the schema.org vocabulary to e.g. cover “DAM-specific stuff”: JSON-LD supports multiple contexts, i.e. multiple extensions: http://json-ld.org/spec/ED/json-ld-syntax/20120122/#external-contexts
    This even enables authors to declare their own custom contexts to meet very specific needs (however, I encourage authors to exhaust the existing vocabulary first).
    Also have a look at http://json-ld.org/spec/ED/json-ld-syntax/20120122/#prefixes.
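
    For example, a document could combine the schema.org context with a custom one for DAM-specific fields – written here as a Python dict; the “dam” context URL and the rendition property are made up:

        asset = {
            "@context": [
                "https://schema.org",
                {"dam": "https://vocab.example.org/dam#"},  # hypothetical custom context
            ],
            "@type": "ImageObject",
            "name": "Lighthouse at dusk",
            "dam:rendition": "https://dam.example.com/renditions/42/web.jpg",
        }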

    Cheers,
    Johannes

  • Ramon, Johannes, thanks a lot for the comments!

    I agree that a hypothetical new DAM interop standard has little chance of getting adopted. I’m hoping for something so simple (think RSS, or Open Graph) and valuable (if some first mover open sources something useful that builds on it, like a WordPress or Zapier integration) that it’s attractive even to a vendor in “survival mode”. Naive or doable? I’m not sure, to be honest.

    But schema.org for DAM is indeed very interesting. My first experiment from December 2015 looked like this: https://www.strehle.de/tim/weblog/archives/2015/12/04/1577 What next steps would you suggest? Collecting examples? Finding out what’s missing in schema.org, and how to extend it (thanks for the pointers, Johannes)? Mapping existing DAM metadata standards to schema.org? It would be great to think this through with you and whoever is interested and has the time.

    Thanks,
    Tim

  • I don’t want to over-promise because we’re very busy on the new product, Tim, but let me try to drum some folks together during Q2 for sharing ideas. I’d also like you (and other interested parties) to give us some feedback on our new PCP once we can tell more.

    Again, thanks for your initiative – people like you will make the difference!

  • I think our company has unique insight into this, as we have inherited DAM connectivity to the Adobe Creative Cloud from Adobe. We have now connected Adobe InDesign to 20 DAMs, and the list keeps growing. Our Silicon Connector could be something of a hub, maybe, to some minimal extent. But few DAMs seem to want this.

    We have found that not only are DAMs generally non-interoperable, but that this is often the case by design, motivated by some combination of competitive strategy and/or a lack of resources. We’ve seen almost no interest in interop or standards from DAM vendors. We support two DAMs that did support CMIS (Alfresco and DALIM), and our development was made much easier because of this. CMIS was not bad; it was simply not adopted by enough vendors to make a dent.

    The Semantic Web is also a failure, probably a worse one as the stakes were higher. Anybody who thinks Schema.org is some sort of success does not know the history or vision of semantic markup from the days of Charles Goldfarb, Tim Berners-Lee, Jon Bosak, etc. Out in the wild, the implementation of semantic markup by Google is overly hard-coded and revenue-driven: you can have any sort of semantics you like, as long as it’s a Recipe or a Movie Review.

    I honestly don’t have much hope. The only real “interop” that we see is the low-level: everyone has some sort of API and some sort of Web Services interface. You can talk between the systems, which is beautiful. XML/JSON succeeded at a low level only. The higher-level constructs are beyond the reach of us poor mortals who are scared of each other and driven by short-term financial incentive.

  • Johannes Schmidt

    Max,

    > Anybody who thinks Schema.org is some sort of success does not know the history or vision of semantic markup
    I’ve been an active member of the Topic Maps community and led some community projects. Topic Maps has a great vision and a long tradition (coming all the way from SGML). However, Topic Maps is dead. There was no adoption, especially in business. What about RDF? I don’t see traction…

    Schema.org – in contrast – has adoption. So, what is more valuable (for whom, why)? A full-blown semantic metadata standard without adoption, or a lightweight vocabulary with quite some adoption that provides a basic hook into the semantic web (e.g. by providing URIs to identify terms)? I’m a traveller in both worlds and each has its merits. Voting for schema.org is more or less painful, but it’s still better than being dead.

    Room for a long discussion….

    Cheers,
    Johannes
