Making It Big In DAM
This feature article by DAM News editor Ralph Windsor first appeared on DAM Coalition in July 2013.
Scaling Enterprise Digital Asset Management Implementations
In this article, I am going to analyse the various pressures on DAM implementations that can create a need for scalable systems and attempt to offer some practical advice that you can apply to your own circumstances.
Measures Of Scalability In DAM
One of the problems of dealing with scalability in DAM is identifying all the different measures of it in advance so you can devise strategies for dealing with them. There are more than you might think and they are not just technical, but cross over into related fields such as findability, metadata/taxonomy or usability.
Below is a non-exhaustive list of scalability pressure points that I will cover in this article:
- Proxies & Derivative Asset Generation
- Authentication & Security
- Taxonomy & Metadata
- Asset Growth & Findability
At the end of this piece, I will list some other factors and general advice on locating potential scalability issues that you may encounter.
The first part of the ingestion process is getting the digital files into the DAM system – or at the very least associating a digital file to an asset record. This process usually happens sequentially and occurs directly after upload.
In DAM systems that are not designed with scalability in mind, files will get uploaded to the system and assets created in a single operation. At lower levels, this works fine. Where it becomes problematic is if you have either tens of thousands of simultaneous users and need to receive their files and process them separately or if you are drawing them in bulk from some external source and delays occur while the system tries to process all the candidate files in sequential order. To implement scalable DAM systems, you need to be able to separate out the different asset import processes involved so they can happen independently of each other.
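The separation described above can be sketched as a two-stage pipeline in which receiving a file and creating an asset from it are independent steps. This is a minimal in-memory illustration only; the `IngestPipeline` class, its method names and the record fields are hypothetical, and a real system would use a durable queue rather than a Python `deque`.

```python
from collections import deque


class IngestPipeline:
    """Hypothetical sketch: receiving a file and creating an asset are separate stages."""

    def __init__(self):
        self.pending = deque()  # uploads received but not yet processed
        self.assets = {}        # asset_id -> asset record

    def receive(self, filename, size_bytes):
        """Stage 1: accept the upload and return straight away."""
        self.pending.append({"filename": filename, "size": size_bytes})
        return len(self.pending)

    def process_next(self, limit=10):
        """Stage 2: create asset records for up to `limit` queued uploads."""
        created = []
        while self.pending and len(created) < limit:
            upload = self.pending.popleft()
            asset_id = len(self.assets) + 1
            self.assets[asset_id] = {**upload, "status": "imported"}
            created.append(asset_id)
        return created
```

Because stage 2 works through the backlog in batches of its own choosing, a burst of uploads lengthens the queue rather than blocking the people who are uploading.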
One of the common themes with scalability, which I shall discuss below in the architecture section, is this ability to separate out the constituent processes within a DAM system so you can selectively upgrade capacity or possibly customise and adapt them.
The second major stage of the ingestion process is cataloguing assets. Much depends on your process for applying metadata. If this is manual or carried out in the DAM system itself then you will need to pay special attention to the batch tools available to carry that out and a range of other features such as importing data from external sources.
As with uploading, most conventional DAM systems will handle this without special scalability provisions; it is when you get beyond the typical use cases that this can become an issue. Below are some batch metadata modification features that can help with large-scale cataloguing tasks:
- Find and replace within free text fields*
- Batch modify groups of assets (specific fields or ranges thereof)
- Batch catalogue using an existing asset (or one of the set)
- Macros or scripts to carry out repeated transactions on assets
* Metadata in Controlled Vocabularies etc should be easy to update centrally – this is true of the vast majority of modern DAM systems now.
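As an illustration of the first item in the list, a batch find and replace over a free-text field might look something like the sketch below. The `Asset` class and the `batch_find_replace` function are hypothetical names used for this example and do not correspond to any particular vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    asset_id: int
    metadata: dict = field(default_factory=dict)


def batch_find_replace(assets, field_name, find, replace):
    """Replace `find` with `replace` in one free-text field.

    Returns the IDs of the assets that were changed so the batch can be
    reviewed afterwards.
    """
    changed = []
    for asset in assets:
        value = asset.metadata.get(field_name)
        # Skip assets where the field is absent or not free text.
        if isinstance(value, str) and find in value:
            asset.metadata[field_name] = value.replace(find, replace)
            changed.append(asset.asset_id)
    return changed
```

Returning the list of changed IDs is a deliberate choice here: it gives cataloguers something to spot check before the change is treated as final.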
The ability to revert bulk changes made is another consideration. Enterprise DAM systems should offer version control for everything (both files and metadata) and revert options to go backwards to a given point.
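One way to picture the revert capability is a store that records the previous value of every field touched by a bulk change, so that a batch (and anything applied after it) can be undone. The sketch below is an assumption about how such a feature could work, not a description of any specific product; `MetadataStore`, `bulk_update` and `revert_to` are hypothetical names.

```python
class MetadataStore:
    """Hypothetical sketch of bulk metadata updates with an undo history."""

    def __init__(self):
        self.records = {}  # asset_id -> dict of metadata fields
        self.batches = []  # one undo list per bulk change, oldest first

    def bulk_update(self, asset_ids, field_name, value):
        """Set one field on many assets and remember what was there before."""
        undo = []
        for asset_id in asset_ids:
            record = self.records.setdefault(asset_id, {})
            undo.append((asset_id, field_name, record.get(field_name), field_name in record))
            record[field_name] = value
        self.batches.append(undo)
        return len(self.batches) - 1  # index identifying this batch

    def revert_to(self, batch_index):
        """Undo the given batch and every later one, newest first."""
        while len(self.batches) > batch_index:
            for asset_id, field_name, old_value, existed in reversed(self.batches.pop()):
                if existed:
                    self.records[asset_id][field_name] = old_value
                else:
                    self.records[asset_id].pop(field_name, None)
```

Undoing later batches first matters because a later change may have overwritten the same field; reverting out of order would restore the wrong value.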
It is possible to make use of embedded metadata to externalise the cataloguing work and avoid bottlenecks during peak periods, but you need to ensure your asset suppliers are briefed to use it and that your DAM system can extract it at high volumes. Each incremental processing step you add will increase the duration required to get files imported and that can have knock-on effects. One further point with relying on externally sourced metadata is that someone still needs to carry out spot QA checks to make sure it is non-contentious and safe to use.
I have never seen great results from fully automated metadata cataloguing but if you have a repository running into millions of assets that need to be available in short time frames then you might need to trade off the lower quality results with the ability to get through it all. Sub-dividing assets into different catalogues can help here (with higher quality manual methods reserved for the assets that will benefit from it). Integration (discussed below) is likely to come into play with large scale cataloguing also as that will often be a source of at least a proportion of the metadata.
To scale with minimal impact on existing users, you need to be able to introduce additional storage locations without the necessity to transfer all existing assets to a new consolidated facility. So you should be able to just bolt on new storage and the DAM will start using it in addition to what you already have. At key points you may well need to consolidate everything to reduce the management overhead of having numerous storage locations, but that should be an optional choice which you can plan for at your discretion, not something forced on you by the limitations of the vendor’s product.
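The 'bolt on' behaviour can be illustrated with a small sketch in which new storage locations are registered alongside existing ones and assets already stored stay where they are. The `StoragePool` class, the location names and the first-fit placement rule are all assumptions made for the example.

```python
class StoragePool:
    """Hypothetical sketch: a pool of storage locations that can grow over time."""

    def __init__(self):
        self.locations = {}  # name -> {"capacity": bytes, "used": bytes}
        self.index = {}      # asset_id -> name of the location holding it

    def add_location(self, name, capacity_bytes):
        """Register extra storage without touching assets already stored."""
        self.locations[name] = {"capacity": capacity_bytes, "used": 0}

    def store(self, asset_id, size_bytes):
        """Place the asset in the first location with enough free space."""
        for name, location in self.locations.items():
            if location["capacity"] - location["used"] >= size_bytes:
                location["used"] += size_bytes
                self.index[asset_id] = name
                return name
        raise RuntimeError("no storage location has enough free space")

    def locate(self, asset_id):
        return self.index[asset_id]
```

Consolidation then becomes a separate, optional operation on the index rather than a precondition for adding capacity.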
As well as the assets themselves, there are also the proxy files (previews). With video especially even these can have quite large file sizes. More advanced storage scalability options which a number of DAM systems have include features to scale out to use Cloud or other external storage providers. These ‘cloud bursting’ techniques can be useful ways to scale up capacity, but the flip side is you also need to manage their availability and possibly add redundancy so that the assets can always be accessed even if one provider fails.
For some production oriented DAM systems it is easier to keep the file in its original location where it was created (especially if it will continue to be worked on in-situ) but if you do that you also need to monitor whether the referenced file has been moved and send out alerts to administrators so they can chase this up. Some vendors employ ‘shadow’ files where DAM system copies are made to reduce the impact of that problem. That can work, but this needs to be monitored to check they are operating as expected and also the frequency with which they are made (and if the system can even access the file at certain times when it is being used). In addition, as should be obvious, this also increases overall storage utilisation as you have to have at least two copies.
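The monitoring of referenced files can be as simple as a periodic check that each path still resolves to a file. The following is a minimal sketch under that assumption; `find_missing_references` is a hypothetical helper, and how administrators are alerted is left out.

```python
from pathlib import Path


def find_missing_references(references):
    """Return the asset IDs whose referenced file can no longer be found.

    `references` maps asset_id -> path of the file kept in its original
    location. A missing path means the file was moved, renamed or deleted.
    """
    return sorted(
        asset_id
        for asset_id, path in references.items()
        if not Path(path).is_file()
    )
```

The returned IDs would feed whatever alerting mechanism the organisation already uses, so administrators can chase up files that have moved.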
Proxies & Derivative Asset Generation
Proxy files are lower resolution copies of the original asset and are used to provide a representation of assets that users can work with prior to download. An example is the thumbnails you often see in DAM system search results.
Proxy and derivative asset generation operations are relatively intensive compared to other DAM system activity. At low levels, the effects will barely be noticeable. When you are uploading many assets, however, this can introduce significant bottlenecks.
There are a number of scalability techniques for dealing with this. One is to separate the proxy/derivative generation process from the rest of the application. This allows users to carry on working with the asset (including downloading it) even though the proxy might not be available. An asynchronous rather than sequential process is preferable for proxy generation to avoid holding staff up while proxies are rendered. Another complementary method is to have a separate processing server facility which is dedicated to doing nothing more than rendering proxies and derivative assets.
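A rough sketch of asynchronous proxy generation is shown below, using a thread pool from Python's standard library as a stand-in for a dedicated rendering facility. `ProxyService` and `render_proxy` are hypothetical, and the 'rendering' is faked by building a file name; a real system would call an image or video conversion tool at that point.

```python
from concurrent.futures import ThreadPoolExecutor


def render_proxy(filename):
    """Stand-in for an expensive render; assumed for illustration only."""
    return f"{filename}.thumb.jpg"


class ProxyService:
    """Hypothetical sketch: proxies are rendered in the background."""

    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}  # asset_id -> Future for the proxy render

    def request(self, asset_id, filename):
        """Queue a render and return immediately so the user is not held up."""
        self._jobs[asset_id] = self._pool.submit(render_proxy, filename)

    def proxy_for(self, asset_id):
        """Return the proxy name if it is ready, otherwise None."""
        job = self._jobs.get(asset_id)
        if job is not None and job.done():
            return job.result()
        return None

    def drain(self):
        """Block until all queued renders have finished."""
        self._pool.shutdown(wait=True)
```

The point of `proxy_for` returning `None` is that the rest of the application can carry on, for example by showing a placeholder thumbnail, while the render is still in progress.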
A common method for delivering scalable applications is to use a DAM system that employs what is called a ‘Service Oriented Architecture’ (SOA). This means dividing all the core functions of the DAM into separate but interconnected services that can be called independently. The benefit is that you can scale up one part of the DAM system without impacting the rest of it. Not every vendor who offers this architecture will describe their system using SOA terms and there are also some who claim to have it for PR reasons but who may not have implemented it as fully as they should have. For those reasons, it is important not to take their SOA claims at face value, but to have the vendor describe how their scalable architecture works in practice. If you don’t know anything about this subject but you think your DAM requirements will be large-scale, it is advisable to draft in someone who does.
Authentication & Security
The extent to which authentication is a scalability problem depends on usage context. If the DAM system is for a large enterprise with numerous global offices, authentication might become more of an issue.
In large companies, there might be multiple authentication systems in operation simultaneously across different regions. There is a reasonable possibility that they will not all be integrated with each other. If the enterprise has been acquiring new subsidiaries recently then this is especially commonplace. Those companies with offices in highly regulated jurisdictions where internet traffic is closely controlled (e.g. China) will also face this challenge.
In many DAM initiatives, the heavier users can often be external suppliers such as agencies who require images to generate artwork and may upload assets too. They need access that is not dependent on them having workstation logins like regular staff users (so potentially a native authentication method built into the DAM itself). For on-premise DAM solutions this can present a problem and the IT department’s usual preferred approach is to get these users to go through VPNs or whatever in-house staff use when they need to access the corporate network from home etc. While that might be simpler for them to manage, it will pose more access problems for your external users, who will need to install special software on their PCs (and remember that most agencies are typically staffed by Mac users). The result is that you will almost certainly see reduced usage of your DAM system and further productivity impairment or higher risk security practices like agencies asking staff to email assets or use unauthorised web-based file distribution services.
There are various other techniques which are emerging to address this, such as protocols like SAML (Security Assertion Markup Language) where it is possible to use a hybrid of externally hosted systems mixed with internal authentication. To ensure scalability, your vendor’s product needs to be compliant with these.
Scaling up taxonomies and controlled vocabularies can have various impacts on DAM systems which need to be considered. Some of these are related to the integration topic (see next section). Others are more related to growth in the range of values.
If the taxonomy for a DAM system is designed from scratch, it is usually slimmer and less bloated than if it has been migrated from a legacy or third party system. Even so, where many users have the ability to add new terms, the taxonomy can grow quite rapidly.
Enterprise DAM systems often need to use taxonomies from multiple sources. These might overlap each other with values from some being partially represented by another. Theoretically, it is possible to rationalise these sources and then import them. That is often impractical, however, because the source taxonomy is in a constant state of flux and will be continuously re-imported.
Many DAM systems will contain some features to import taxonomies from third party sources and you can usually automate this. However, applying further rules to them post-import (and re-applying them with each load) can turn into a bespoke implementation exercise. The built-in facilities of the product might need to be combined with some specific rules to ensure the taxonomy stays sensible.
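To make the idea of re-applying rules on every load concrete, the sketch below runs each imported term through an ordered list of rule functions and drops duplicates and rejected terms. The rules themselves (a synonym mapping and a prefix-based rejection) are invented for illustration, as are the names `import_taxonomy`, `normalise` and `RULES`.

```python
def normalise(term):
    """Collapse whitespace and lower-case the term."""
    return " ".join(term.split()).lower()


def import_taxonomy(source_terms, rules):
    """Apply every rule to every term; drop rejected terms (None) and duplicates."""
    result = []
    seen = set()
    for term in source_terms:
        for rule in rules:
            term = rule(term)
            if term is None:  # a rule rejected this term
                break
        if term is not None and term not in seen:
            seen.add(term)
            result.append(term)
    return result


# Hypothetical rule set, re-applied in this order on every import.
RULES = [
    normalise,
    lambda t: {"colour": "color"}.get(t, t),        # assumed synonym mapping
    lambda t: None if t.startswith("tmp-") else t,  # assumed rejection rule
]
```

Keeping the rules as data separate from the import routine means the same clean-up runs automatically each time the source taxonomy is reloaded, which is where the bespoke effort tends to go.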
Once the business has its assets in order and users are starting to be able to search and find assets more easily, there will be demand to leverage that as much as possible and put it to use in conjunction with a variety of other solutions. The DAM will also consume data and media files from a variety of different sources to expand the range of value that it can offer to further groups of end users.
You are unlikely to be able to predict up-front all of the other systems that the DAM will need to be integrated with over its lifetime. However, there should be some likely candidates which can be identified in advance, followed by standards/compliance for unspecified future integration needs.
As the DAM sector matures, interoperability is becoming a much-discussed topic as managers begin to appreciate the potential ROI obtainable from integrated DAM solutions. That productivity gain, however, can come at a price in terms of complexity and management overhead. A lot of integration work is, of necessity, bespoke, because the various systems that need to talk to the DAM are likely to have never been integrated and originate from separate vertical software markets.
There are various interoperability standards in existence right now which could start to make this task easier, for example CMIS (Content Management Interoperability Services). Not all vendors support it, especially among pure-play DAM providers. In ECM, CMIS is more prevalent, but the DAM capabilities of ECM suites tend to be less well developed. As the DAM software market accelerates in terms of functional sophistication it could be more difficult for generalist ECM solutions with DAM components to continue to keep up.
It is not possible to offer a generic recommendation about whether it is better to use a CMIS-compliant ECM and try to improve its DAM related functionality or choose a specialist DAM system and build custom integration features around it. The implementation order is probably more significant. If you have an existing DAM (or have found a product that can answer more of your needs) then the second route might be preferable but a lot depends on the specific circumstances of each implementation.
Exporting digital assets at an industrial scale will require system features to be combined together, including reporting to generate the required data, the generation of derivative assets (e.g. if you are batch converting media for use somewhere else) and assembling all the files from a variety of storage locations. If photos or videos are to be exported, another important consideration is ensuring they have embedded metadata contained within them. So-called ‘orphan works’ legislation is now law in the UK and probably will be in Europe and the USA soon too. If your media does not contain copyright data, it can be possible for third parties to use it without permission unless there is another clear and visible indication that it belongs to you. On that same subject, many organisations distribute key assets to social media channels and will usually want them watermarked to make unauthorised use harder.
Typically, export-related tasks will not be one-off jobs, but have to be scheduled or callable via some other process without engineers needing to come in and set it all up manually. Therefore, scripting and/or API features to allow the DAM system to be automated are fairly essential to do this flexibly and at larger volumes.
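As a sketch of what a scriptable export step could look like, the function below builds a manifest for a given channel, holds back any asset that lacks a copyright notice and flags social media exports for watermarking. The field names (`channels`, `copyright`, `file`) and the function `build_export_manifest` are assumptions for the example; a scheduler or another system would call something like this through the DAM's scripting or API layer.

```python
def build_export_manifest(assets, channel):
    """Select assets cleared for `channel`.

    Assets without a copyright notice are held back rather than exported,
    and anything going to the (assumed) "social" channel is flagged for
    watermarking. Returns (manifest, held_back_ids).
    """
    manifest, held_back = [], []
    for asset in assets:
        if channel not in asset.get("channels", []):
            continue  # not cleared for this destination
        if not asset.get("copyright"):
            held_back.append(asset["id"])
            continue
        manifest.append({
            "id": asset["id"],
            "file": asset["file"],
            "copyright": asset["copyright"],
            "watermark": channel == "social",
        })
    return manifest, held_back
```

Because the function takes plain data and returns plain data, it can be run on a schedule or triggered by another process without anyone setting the job up by hand.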
Asset Growth & Findability
The nature of DAM systems and the issues users encounter with them change depending on the number of assets contained. To start with, the problem will be inadequate volume and complaints about not enough search results being returned. As the system gathers momentum, a tipping point will be reached and the problem becomes an excess of results.
The impact on findability with high volumes of assets often does not get adequately tested. Sometimes migration work is carried out after acceptance testing (which should never be permitted, in my opinion). It is essential to consider what will happen to the system once more assets get added. This should be part of the sign-off criteria. If you are not migrating existing assets then use dummy ones if you have to. You have to be able to see what will happen when you add large volumes of new assets into the system and whether the findability (or usability) starts to creak or not.
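A simple way to see the effect of volume before go-live is to generate dummy assets and run the same query at pilot and production scale. The sketch below does this with a tiny, made-up keyword vocabulary; `VOCAB`, `make_dummy_assets` and `search` are hypothetical and far simpler than a real search index.

```python
# Assumed keyword vocabulary for the dummy assets.
VOCAB = ["logo", "product", "event", "portrait", "packaging"]


def make_dummy_assets(count):
    """Create placeholder assets, each with one vocabulary keyword and a batch tag."""
    return [
        {"id": i, "keywords": {VOCAB[i % len(VOCAB)], f"batch-{i // 1000}"}}
        for i in range(count)
    ]


def search(assets, keyword):
    """Return the IDs of assets tagged with `keyword`."""
    return [a["id"] for a in assets if keyword in a["keywords"]]


pilot = make_dummy_assets(100)
production = make_dummy_assets(100_000)
print(len(search(pilot, "logo")), len(search(production, "logo")))
```

A keyword that returns a page of results in the pilot returns thousands at production volume, which is the point at which filters, facets and narrower terms need to be tested.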
Other Scalability Factors
Below are some other factors not discussed here which you will want to include in your scalability planning:
- Servers/Hosting (Redundancy and Load Balancing)
- Data Safety (Backups and Security)
- Asset Usage Approval and Rights Management
- Asset Approval (Upload/Cataloguing)
- Multi-Lingual Metadata and Asset Localisation
- Data Migration
- Change Management
The two biggest impacts on scalability are probably asset and user volumes, so from that you can identify many associated scalability considerations. What is more complex (but still essential) is to find contingent issues that only appear because of scaling some other dependent element. For example, the effect of adding tens of thousands of restricted access assets is an increase in the workload for staff assigned to verify each request.
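The restricted-asset example lends itself to a back-of-envelope calculation. The function below is only a rough estimating aid; the request rate and the minutes per request are assumptions you would replace with figures from your own usage reports.

```python
def approval_hours_per_month(restricted_assets, requests_per_1000_assets, minutes_per_request):
    """Estimate monthly staff hours spent verifying usage requests.

    All three inputs are planning assumptions, not measured values.
    """
    requests = restricted_assets * requests_per_1000_assets / 1000
    return requests * minutes_per_request / 60


# Assumed figures: 20,000 restricted assets, 50 requests per 1,000 assets
# each month, 3 minutes to verify each request.
print(approval_hours_per_month(20_000, 50, 3))  # 50.0
```

Even with modest assumptions the result is more than a week of one person's time each month, which is the kind of contingent cost that is easy to miss when only storage and server capacity are being planned.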
DAM scalability is a multi-faceted subject that can generate numerous challenges for those at the sharp end of the delivery process. Many readers might find that although they do not consider their DAM to be ‘large scale’, there are individual aspects where a scalable approach is essential to meet requirements.
When I am involved in project management for DAM implementations, a technique I find useful is to get an overall understanding of the digital asset supply chain as it relates to the client’s requirement. This can be carried out using diagrams, written in text, or both, but you need to grasp where and how assets will get used. This means identifying all the points where assets are introduced into the system, where they exit it again and all the destinations in between. Some also refer to this as the ‘asset lifecycle’ and, provided it covers similar ground, that is an equally valid description.
Managing large-scale DAM implementations is not recommended for those who are new to DAM due to the manner in which one poor decision can have knock-on effects elsewhere. With that being said, even those who have experience of this kind of project must be able to clearly explain themselves and at a level of detail that is appropriate to each group of stakeholders involved.