Do you own your DAM Data?

This feature article was kindly contributed by Peter Krogh, Chief Product Officer at Mediagraph and author of The DAM Book.

Twenty years ago, I wrote in the first version of The DAM Book that it’s essential to have a “pre-nup” with your DAM system. In addition to the files themselves, there is a tremendous amount of value in the metadata. You will want to take it all with you if and when you move to other software.

In the 20 years since my first book, portability of data has improved. But it is not a solved problem. In my current position as a DAM product manager, I have gotten plenty of experience with data migration. And I see much room for improvement in the industry.

Sometimes the gap is caused by metadata in one program that has no equivalent in other programs.
Sometimes the issue appears to be incomplete engineering in either metadata export or API capabilities.
We are also seeing intentional roadblocks from DAM vendors to prevent customers from moving on. The most egregious of these is refusal to let customers use the API for exit migration.

The last two items outlined above are the most concerning. They indicate that some DAM applications are taking a “walled garden” approach to lock in customers. Building a DAM that does not have proper data export should not be acceptable. And refusal to provide an API for exfiltration is even worse, as it is a blatant violation of trust.

In order to protect your information, you need to make it portable. That means understanding how portability can be accomplished. It also means asking questions in the contracting process, testing, and verifying that the answers are accurate. A good consultant should be able to do this for you. But if you are selecting the DAM without that kind of help, you’ll need to dig into it yourself.

In this blog post, I will outline the various methods of data export, and discuss the advantages and disadvantages of each. You can use this knowledge to verify that a valid “pre-nup” does exist, and avoid unpleasant surprises in the future.

How is data made portable?

Metadata can take the classic form of who, what, when, where and why. It also includes usage and collection metadata such as number of downloads, date of original upload, name of person uploading, an item’s relationship to other items in the collection, and other transactional records.

Let’s start by outlining the primary methods for moving your files and metadata from one system to another. I’ll start by addressing the files themselves. After that, we will look at the three methods for making metadata and other data portable.

File download

It should be easy for you to download all the files in your DAM. There are only a few things to watch out for here, based on the structure of your DAM.

Folderless applications – we are seeing a trend in the industry for “folderless” DAMs. These DAMs do not use traditional folders to organize files at all. Instead, they depend entirely on metadata search to find files. When you download files from these DAMs, they may come in with no subfolders. This may be true even if the files were originally submitted for upload in a folder structure. It may be difficult or impossible to recreate the original folder structure, should you desire to do so.
Duplicate files – Many DAMs feature containers that look a lot like folders but have one critical difference. In a true folder structure, a file exists in one and only one container. These DAMs instead use “collections”, “galleries”, or “albums”, and a file may belong to more than one of them. When you download from these DAMs, you may end up with a new copy of the file for each collection it belongs to. This can create a big mess of duplicates. Cleaning it up means removing the extra copies while preserving the record of every container each file belongs to. (Note that the curation of your files into collections or albums is one of the highest values you can add to your media collection, so it’s important to preserve.)

Each of the issues above can create a real headache at the time of migration. If you want to recreate the folder, collection, or album-based organization, you may be facing a lot of extra time and expense in the process. In some cases, it may simply not be possible to recreate.
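To make the duplicate problem concrete, here is a minimal Python sketch of one way to clean up a download. It assumes that each top-level folder in the download represents one collection or album, which will not be true of every DAM. The function names are my own illustrations, not part of any product. It groups identical files by content hash, keeps the first copy, and records every collection each file appeared in, so that membership can be rebuilt later.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def index_download(root: Path) -> dict:
    """Group identical files and remember every collection they came from.

    Assumption: each top-level folder under `root` is one exported
    collection. Files sitting directly in `root` get no collection.
    """
    index = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        entry = index.setdefault(
            file_digest(path),
            {"keep": path, "collections": set(), "duplicates": []},
        )
        if entry["keep"] != path:
            # Same bytes as a file we already kept: a removable copy.
            entry["duplicates"].append(path)
        parts = path.relative_to(root).parts
        if len(parts) > 1:
            entry["collections"].add(parts[0])
    return index
```

Nothing is deleted here. The index tells you which copy to keep, which copies are redundant, and which collections to reassign in the new system. Review it before removing anything.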

As part of a file download, you will probably also need to do some additional work to bring in as much metadata as you feel is useful. In the following sections, let’s look at the ways you can migrate some or all of your metadata.

Embedded Metadata

Metadata can be embedded in many different file types. The most complete and mature metadata embedding was designed by the IPTC organization for image files. It includes a large set of fields that can describe many content and ownership details. Adobe helped to extend IPTC using the XMP notation format, which allows a nearly limitless amount of metadata to be embedded in an image or PDF file. Other document formats, like Microsoft Office files, also support embedded metadata, but generally limit it to a handful of fields like title, author, subject and keywords.

When downloading from your DAM, the files should contain any updates to metadata that were made to the file. (Some DAMs will allow you to choose original metadata or current metadata.) It can then be simple to add the files to the new DAM and have all the metadata extracted.

There are a few issues to look out for if you are depending on embedded metadata to migrate your information. Checking for them should be part of your testing during the initial selection process, and again whenever you consider a migration.

Does the new DAM support the fields my metadata has been written to? While many DAMs say “we support IPTC”, few DAMs support all IPTC fields. You’ll want to make sure the ones you are using for migration are supported.
What exactly does “support” mean? Full support should include reading, writing, displaying, indexing and searching the fields. Some fields may be only partially supported: a field may be read-only, or its data may be displayed but not searchable. The highest level of support will often show you the data for all files in your account in a filterable fashion as shown in the figure.
What about the information that does not fit into standard metadata fields? Many DAMs offer custom metadata, and in some cases, this data can be embedded into the files. Whether this custom metadata can be read by another system is a separate question. Non-standard fields are, after all, not standard.
There is other information that does not really fit into the files themselves. For instance, information about collections, galleries and albums can be hard to embed, especially if you want to also preserve information about the collection itself. This can include who made the collection, a description of the collection, and a file’s position in the sequence of the collection.

The good news is that most of the questions above can be answered with some simple testing. You can use sample files like the ones provided by the IPTC. You should also test with files downloaded from your current DAM. Load them into any system you are considering as a replacement.
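If you want a quick look at what is actually embedded before you load a file anywhere, the following Python sketch shows one rough way to do it. It scans a file for an XMP packet and lists the property names found in it. It is a crude check built on assumptions: it only finds a UTF-8 packet stored in one piece, and it ignores legacy IPTC and Exif blocks entirely. A dedicated tool such as ExifTool is far more thorough. The function names are my own.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XMP_OPEN = b"<x:xmpmeta"
XMP_CLOSE = b"</x:xmpmeta>"


def extract_xmp(data: bytes) -> Optional[bytes]:
    """Return the raw XMP packet found in the file bytes, or None."""
    start = data.find(XMP_OPEN)
    if start == -1:
        return None
    end = data.find(XMP_CLOSE, start)
    if end == -1:
        return None
    return data[start:end + len(XMP_CLOSE)]


def xmp_property_names(data: bytes) -> set:
    """List property names in the packet as '{namespace}name' strings."""
    packet = extract_xmp(data)
    if packet is None:
        return set()
    root = ET.fromstring(packet)
    names = set()
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # XMP properties may be written as child elements...
        for child in desc:
            names.add(child.tag)
        # ...or as attributes of rdf:Description (skip rdf:about etc.).
        for attr in desc.attrib:
            if not attr.startswith(f"{{{RDF_NS}}}"):
                names.add(attr)
    return names
```

Run `xmp_property_names(Path("sample.jpg").read_bytes())` on a file downloaded from your current DAM, then compare the result against the fields the candidate DAM displays after import. Any name in the file that does not show up in the new system is a field to ask the vendor about.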

Data Export

When data can’t be embedded for any reason, the next process to look at is some type of data export from the current system. Most DAMs should have the capability to export metadata in one of the common formats – CSV or Excel documents, JSON or XML. This export, in turn, should be importable into the new DAM and be associated with the files through data merging.

Example CSV Format

Example XML Format

Example JSON Format

Data export can be the best method to make certain information portable.

CSV and Excel are probably the most useful types of data export. This is because so many people know how to read and manipulate the data with Excel, Apple Numbers or other programs. You can easily check on the completeness and accuracy of the exported data. As with embedded metadata above, the level of support for data export varies from one DAM to another.

Check for completeness – The most common issue with data export is the lack of comprehensive information. The transactional information (who uploaded, who downloaded, and when) may be missing. It’s also common for metadata about the metadata to be missing (e.g. who created this keyword, what is the description of this collection, etc.).
You’ll want to look around the existing DAM and see how you use this type of information. For instance, you may frequently want to sort files by most recent upload. If the upload date is missing in the data export, you won’t be able to do this for the exported files.
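A completeness check like this is easy to automate once you have a CSV export in hand. The Python sketch below is one possible approach, not a standard tool. The column names (`filename`, `upload_date` and so on) are placeholders I made up; substitute whatever your DAM actually exports. It reports columns that are missing entirely, columns that exist but are empty in every row, and mismatches between the exported rows and the files you downloaded.

```python
import csv
import io


def check_export(csv_text, required_columns, downloaded_files,
                 filename_column="filename"):
    """Compare a CSV metadata export against what you expect to find.

    `required_columns` and `downloaded_files` are sets of strings.
    `filename_column` is an assumed name for the column that links
    each row to a file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    present = set(reader.fieldnames or [])
    rows = list(reader)
    exported = {row[filename_column] for row in rows
                if row.get(filename_column)}
    # A column can be present in the header yet carry no data at all.
    empty = {
        col for col in required_columns & present
        if all(not (row.get(col) or "").strip() for row in rows)
    }
    return {
        "missing_columns": required_columns - present,
        "empty_columns": empty,
        "files_without_metadata": downloaded_files - exported,
        "metadata_without_files": exported - downloaded_files,
    }


report = check_export(
    "filename,title,upload_date\na.jpg,Sunset,\nb.jpg,Harbor,\n",
    required_columns={"filename", "title", "upload_date", "uploaded_by"},
    downloaded_files={"a.jpg", "c.jpg"},
)
for problem, values in report.items():
    print(problem, sorted(values))
```

In this made-up sample, the export has no `uploaded_by` column at all, and `upload_date` is present but blank in every row. Both gaps would break a sort by most recent upload in the new system.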

API Export

The king of DAM exit strategies is the API export. Application Programming Interfaces (APIs) are the internal plumbing of your DAM. In modern, truly “API first” applications, every bit of information can be accessed through the API. Using the API, files and metadata can be transferred directly from one application to another. It may be possible to replicate the entire collection structure of the old DAM to the new one, including all metadata, and even the metadata about metadata. And it is usually a lot easier for the client.
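In broad strokes, an API migration is a loop: ask the old system for a page of records, write them to the new system, and repeat until nothing is left. The Python sketch below shows that loop shape only. It does not describe any real DAM's API. The page format (`items`, `next_cursor`) and the two callables are hypothetical stand-ins for whatever the source and destination systems actually provide.

```python
from typing import Callable, Iterator, Optional


def export_all(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every record from a cursor-paginated source.

    `fetch_page(cursor)` is a hypothetical wrapper around the source
    DAM's API. It is assumed to return a dict shaped like
    {"items": [...], "next_cursor": "..." or None}.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:
            break


def migrate(fetch_page: Callable[[Optional[str]], dict],
            write_asset: Callable[[dict], None]) -> int:
    """Copy every record to the destination and return the count."""
    count = 0
    for asset in export_all(fetch_page):
        write_asset(asset)  # hypothetical call into the new DAM
        count += 1
    return count


# Stand-in for a real API: two pages of made-up records.
FAKE_PAGES = {
    None: {"items": [{"id": 1}, {"id": 2}], "next_cursor": "p2"},
    "p2": {"items": [{"id": 3}], "next_cursor": None},
}
received = []
total = migrate(lambda cursor: FAKE_PAGES[cursor], received.append)
print(total, received)
```

The real work is in what each record contains. If the API does not expose collection membership, upload history, or metadata about the metadata, no amount of looping will recover it.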

You can think of full API export as the ironclad pre-nup for all your files and all the work you have done to them. But many applications do not support full API export. This is due to a number of factors.

Older, non-API-first applications – Older software architecture may have been built without the use of an API. In these cases, when the API is added later, it may only be able to access a small subset of data. Perhaps it was built for a particular integration that only needed to display the files in another program.
Incomplete API engineering – Even some modern DAMs that claim to be API-first may have very large holes in their API capabilities. We have seen this a lot recently. Data that is almost certainly being delivered by the API internally is simply not available to external parties. This could be due to a lack of resources to finish the API documentation. But in many cases that is implausible given the size and revenue of the company. It could also be a roadblock left in place intentionally to prevent users from taking their data with them. I don’t like this interpretation, but sometimes that’s the best explanation.
No place to put the data – It’s also very common for the data in one DAM to have no equivalent in the new DAM. Mediagraph has a robust hierarchical taxonomy that allows for intuitive content discovery. However, most DAMs don’t have this capability. So while the data can be exported, in most DAMs there is simply no place to put the information.

Intentional API Blocking for export

The most worrisome trend we are seeing is that some companies refuse to allow their API to be used for extraction of files and data. API access requires the use of an API “key”. Some companies, like Mediagraph, make access to the API key entirely self-service in the user’s profile. Other companies only provide the key upon request. In one case, API keys which were freely provided for migration a few months ago are no longer available. Clients are now forced to migrate using the expensive, time-consuming and incomplete methods outlined above.

What can you do to protect your data?

As outlined above, the threats to data portability are multi-faceted. The true state of your data portability only shows up with investigation and testing. Unfortunately, it’s likely that the issues are only discovered when you end the relationship with the current DAM. Fewer than one in 10 of our prospective clients drill into this at the time of purchasing. When they do, we can demonstrate that we provide all the methods outlined above on a self-serve basis.

The DAM industry needs to do better. And that will probably only happen with client pressure. I’d like to see more clients require migration tools at the time of account inception. And when companies make a change that harms that migration capability, clients should band together and demand a change.

Epilogue

A number of years ago I worked for a company that does heritage-based storytelling and corporate archive management. They were losing a very prominent client – a world renowned brand was taking their material back, because management had finally understood the value of the material and had allocated funds for in-house archives. Some people at the company considered this a failure, but my boss at the time chose to herald it as a success. “We took the at-risk archives, and we preserved and curated it. And now it goes back to them because the company can see the value. This is the very definition of a successful service to the client.”

In all of my writing and consulting and product development, I have never lost sight of the importance of the pre-nup. We will only be temporary custodians of the material. It’s up to us to navigate the technical and business hurdles that can get between a client and their rightful material.

About Peter Krogh

You can connect with Peter via his LinkedIn profile.
