Document Assets In Digital Asset Management Solutions
This feature article has been written by DAM News Editor, Ralph Windsor.
The majority of assets held in DAM systems are usually images or videos, since that was historically the type of data they were purchased to help enterprises manage. Most solutions will allow you to upload more or less any type of file, however, and also include dedicated features to support documents, such as indexing the text content and generating thumbnails or other representations of them.
The range of document support in many DAM systems tends to be oriented towards marketing use-cases, like PDFs, InDesign files etc, but more conventional Office formats like PowerPoint or Word files are also relatively common in most systems these days. In the case of production/workflow DAM solutions used to store and coordinate work-in-progress, while the documents might never actually be used, they still need to be searchable, especially where narrative copy or transcripts need to be accessed as component assets. Further, their role in DAM appears to be increasing (relative to other asset types) and it is far greater now than it was 10-15 years ago when DAM systems got referred to using descriptions like ‘The Photo Library’.
Representation Of Documents And Relationships With Assets
As alluded to earlier, the role of document assets in digital asset repositories is varied. Even though images will usually still be the most popular type of asset, marketing-oriented DAM systems in particular are likely to contain quite a significant number of document assets. This seems to be the case because the DAM is already used to hold lots of other marketing-oriented content and so it becomes the obvious choice.
The different methods by which documents get treated in DAM systems are not necessarily quite as straightforward as making each document a separate asset record (although as I will discuss later, that certainly can be the case). In some systems, it is possible to associate a subsidiary file with an asset. For instance, a logo asset might also carry a document with instructions on how to use it. This goes by various names: generic terms like ‘attachment’ or, in some brand-oriented DAMs, ‘guidelines’ (there are various others you may hear). Essentially, the document is a file only, and any metadata not present in the text of the document is derived from the master asset record. The ‘attachment’ example hints at the nature of the relationship. The asset record can be considered equivalent to an email and the attachment file accompanies it (and requires the original message to provide both the context and interface controls to access it).
The other way to approach the problem of delivering documents is to set up dedicated asset records for them. If explicit relationships between document assets and other types are needed, the existing linking facilities which the system provides can be employed. In this scenario, the document can have independent metadata and potentially be associated with many different records.
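To make the distinction concrete, here is a minimal Python sketch of the two approaches. The class and field names (Asset, Attachment, linked_ids) are invented purely for illustration and do not correspond to any particular DAM product's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Attachment:
    """A subsidiary file: it has no metadata of its own beyond a filename."""
    filename: str


@dataclass
class Asset:
    """An asset record with its own metadata, attachments and links."""
    asset_id: str
    title: str
    metadata: dict = field(default_factory=dict)
    attachments: list = field(default_factory=list)  # files owned by this record
    linked_ids: set = field(default_factory=set)     # IDs of related asset records


def link(a: Asset, b: Asset) -> None:
    """Create a two-way relationship between two independent asset records."""
    a.linked_ids.add(b.asset_id)
    b.linked_ids.add(a.asset_id)


# Approach 1: the guidelines travel with the logo as an attachment.
logo = Asset("A1", "Company logo", {"keywords": ["logo", "brand"]})
logo.attachments.append(Attachment("logo-guidelines.pdf"))

# Approach 2: the guidelines are an asset in their own right,
# with independent metadata, and can be linked to many records.
guidelines = Asset("A2", "Logo usage guidelines", {"keywords": ["guidelines"]})
logo_mono = Asset("A3", "Company logo (mono)")
link(guidelines, logo)
link(guidelines, logo_mono)
```

In the first approach the file can only be reached through its parent record. In the second, one guidelines record serves both logos, so a revised document only has to be replaced once.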
The best method to use is a subject for some debate. Associated files are usually fairly quick and easy to set up: whoever is cataloguing the record merely has to upload a file. The downside is that this process might need to be done numerous times if the documents are shared across many assets (including changing all of them later if the document gets modified). By contrast, documents held as individual assets are not tied to a single master record, so they are easier to modify and manipulate.
In general, the independent asset approach is probably preferable (certainly from a longer-term perspective) because it becomes far easier to manipulate the document asset independently from its master asset records, and there is scope for custom metadata relationships between assets that use the existing metadata management features of the DAM. Associated data like attached documents are a hybrid entity and therefore they can add complexity to future migration exercises. Although you might not be interested in migrating assets to other DAM systems now, you can virtually guarantee that at some point in the future you (or your successors) certainly will be.
With that said, if documents tend to be associated with single assets and setting them up as dedicated assets involves a lot more cataloguing work, it might not always be the optimum technique. One approach I have seen in some systems is to be able to promote or demote associated files as assets, i.e. break them away so they convert into linked equivalents. This offers the best of both worlds, but these features are heading towards the more esoteric end of the DAM functionality spectrum and therefore more likely to incur customisation and professional services fees from your vendor if they do not already exist.
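As a rough illustration of what a ‘promote’ operation might involve, the Python sketch below detaches an associated file and creates a linked asset record from it. This is an assumption about how such a feature could work, not a description of any vendor's implementation; the Asset class, its fields and the promote function are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    asset_id: str
    title: str
    metadata: dict = field(default_factory=dict)
    attachments: list = field(default_factory=list)  # filenames of associated files
    linked_ids: set = field(default_factory=set)     # IDs of related asset records
    filename: str = ""                               # primary file of this record


def promote(master: Asset, filename: str, new_id: str) -> Asset:
    """Detach an associated file and turn it into a linked asset record."""
    master.attachments.remove(filename)  # raises ValueError if not attached
    promoted = Asset(
        asset_id=new_id,
        title=filename,
        # Start from a copy of the master's metadata, since the attachment
        # previously inherited it; the new record can then be edited freely.
        metadata=dict(master.metadata),
        filename=filename,
    )
    # Preserve the original relationship as an explicit two-way link.
    promoted.linked_ids.add(master.asset_id)
    master.linked_ids.add(new_id)
    return promoted


logo = Asset("A1", "Company logo", {"brand": "Acme"}, ["guidelines.pdf"])
doc = promote(logo, "guidelines.pdf", "A2")
```

A ‘demote’ operation would be the reverse: remove the link, discard the independent metadata and re-attach the file to a single master record, which is why it only makes sense for documents that relate to one asset.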
What these discussions point towards is having a more in-depth understanding of how the semantic relationships between your assets will be both implemented and maintained over the longer term. Document assets (especially marketing oriented ones) tend to be composites of several different assets, for example a brochure will contain multiple photos, graphics and possibly raw text. In the case of InDesign, the file format itself might store many of these, but depending on that exclusively as the sole repository of metadata about relationships between files could become restrictive (and risks losing that information if the file becomes corrupted or lost). As well as taxonomies and other metadata models, it is almost inevitable that you will need to create further metadata schemas in the future that no one thought of when the DAM system was first commissioned. It is reasonable to use some arbitrary methods to extend these to start with so assets are made available as quickly as possible, but at some stage you will need to consider how (and where) to fit them into a wider metadata strategy. All of which segues neatly into the next point.
Findability And Metadata
Like so many other DAM-related considerations, the role of documents in findability and metadata discussions is something of a double-edged sword with trade-offs on either side. The benefit of document assets is they usually already contain some text which can be indexed. Theoretically this reduces the cataloguing effort required because it is possible to make use of the text that is contained within the document itself. The snag with this assumption is that documents may also contain other less relevant text phrases which can skew the index and produce some unexpected (and usually undesirable) results.
I recently worked with a client who uploaded all their PDF brochures as assets into their DAM. On the back cover of every single document, they listed all their worldwide offices because they were eager to demonstrate their global reach to prospective customers. The DAM system they were using was configured to index documents and treat the text in the same way as other metadata which had been entered specifically for metadata cataloguing purposes. The result was that keyword searches for countries returned all of these document assets, whether they were relevant or not. This considerably impaired findability because the users were having to filter the results using more advanced search criteria (or plough through pages of them – as most users resorted to).
There are some methods to tweak relevance criteria with search technologies that can help with this and eventually you would expect these problems to diminish as text search components become more discerning. From what I have seen right now, however, the best approach with DAM solutions that will contain many asset types (i.e. more than just documents) is to disable the full text index that includes document text by default and offer the user the option to switch it back on again if they know they need to go down to that level of detail. I would always recommend keeping the indexed text and having it on-line (i.e. potentially usable on-demand) rather than just archived, even if you do not plan to allow it to be searchable (via search options or any other means). As has been discussed before on DAM News, all asset metadata has a potential present or future value, even if you haven’t worked out what it is yet.
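The opt-in behaviour described above can be sketched in a few lines of Python. This is a deliberately simplified model with made-up names (IndexedAsset, search, include_document_text) and naive substring matching; a real DAM would delegate this to its search engine, but the principle of keeping document text indexed while excluding it from default searches is the same.

```python
from dataclasses import dataclass


@dataclass
class IndexedAsset:
    asset_id: str
    keywords: set            # explicitly catalogued metadata
    document_text: str = ""  # text extracted from the file: kept, but optional


def search(assets, term, include_document_text=False):
    """Match catalogued keywords; only consult document text when asked to."""
    term = term.lower()
    hits = []
    for asset in assets:
        in_metadata = term in {k.lower() for k in asset.keywords}
        in_text = include_document_text and term in asset.document_text.lower()
        if in_metadata or in_text:
            hits.append(asset.asset_id)
    return hits


assets = [
    # A brochure whose back cover lists worldwide offices.
    IndexedAsset("brochure-uk", {"brochure", "uk"}, "Offices: UK, France, Japan"),
    IndexedAsset("photo-japan", {"japan", "photo"}),
]

default_results = search(assets, "japan")
full_text_results = search(assets, "japan", include_document_text=True)
```

With these sample records, the default search for ‘japan’ returns only the photo, while switching the option on also brings back the brochure because of its office listing.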
In contrast to the scenario described earlier, I have another client where the majority of their assets are highly stylised documents like resumes of practitioners with photos etc, lengthy proposal documents and some regulatory and commercial materials that they have to include with all of the tenders they pitch for. These constitute 90% of the material they hold in their DAM and other items like logos or promotional videos or templates are less important from a search perspective. To solve that issue, the ‘display’ assets with a visual focus are grouped into collections or packs, most of which can be retrieved from the home page of the DAM via pre-defined links their marketing personnel have already set up. This way, the issue of the brand assets getting lost in thousands of search results is diminished. Users also have an option to disable full text search if they wish (i.e. it will only search the explicitly catalogued metadata if specifically asked to).
I have seen various hybrid methods employed in other solutions where assets are grouped into different collections and search options set accordingly. To the end user, these appear like separate repositories although internally they all use the same core facility. This is a relatively efficient and flexible way to deal with this problem, although it will increase the complexity (and probably the cost) of establishing a DAM that will handle all of your different user requirements in respect of document assets.
As should be obvious from the above, it is difficult to derive hard and fast rules about how to treat documents in DAM systems without carefully examining all of the use cases (in simpler terms, why people will want to use your DAM). If you plan to introduce a lot of document assets to your DAM solution, it’s worth doing some tests with larger sample sizes on a staging server and just observing the results before you press on and import hundreds of thousands of them.
This short overview should give you a few pointers to think about when considering document assets. While it is common to generalise about the treatment of assets of different types (and vendors are always eager to highlight the range of types they support) the characteristics of each have some implications for the decisions you take about how you implement your DAM and maintain its relevance and usefulness.