Using Checksums To De-Duplicate Image Collections


Metadata and photo management expert David Riecks of ControlledVocabulary.com has written a highly informative article that explains how to de-dupe an image collection using MD5 checksum tools and a spreadsheet. The process also produces a rudimentary asset audit that records each file's name, location and duplicate status. The explanations are clear, with screenshots and notes on the differences between the Mac and PC platforms.
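The article itself relies on checksum tools and a spreadsheet. For readers comfortable with a little scripting, a similar audit could be produced with a short Python script. What follows is a minimal sketch of our own, not Riecks' method. The function names and the CSV columns (`folder`, `file_name`, `md5`, `status`) are hypothetical choices. The script walks a folder tree and computes each file's MD5 checksum in chunks, so large Raw files are never loaded into memory all at once. It then marks every file whose checksum has already been seen as a duplicate of the first copy found.

```python
import csv
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(root):
    """Walk root and return one (folder, file_name, md5, status) row per file.

    The first file seen with a given checksum is marked "unique"; later files
    with the same checksum are marked as duplicates of that first path.
    """
    seen = {}  # checksum -> path of the first file with that checksum
    rows = []
    for folder, dirs, files in os.walk(root):
        dirs.sort()  # sorting in place makes os.walk visit subfolders in a stable order
        for name in sorted(files):
            path = os.path.join(folder, name)
            checksum = md5_of_file(path)
            if checksum in seen:
                status = "duplicate of " + seen[checksum]
            else:
                seen[checksum] = path
                status = "unique"
            rows.append((folder, name, checksum, status))
    return rows


def write_csv(rows, out_path):
    """Write the audit rows, with a header, to a CSV file for a spreadsheet."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["folder", "file_name", "md5", "status"])
        writer.writerows(rows)
```

For example, `write_csv(audit("/path/to/shared/drive"), "asset_audit.csv")` would produce a file that can then be sorted and filtered in a spreadsheet. The path is a placeholder. Write the CSV outside the folder being audited so it doesn't appear in a later run.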

For anyone embarking on a DAM migration project who has to audit a shared network drive used by multiple staff, this could offer a useful starting point before bringing in a vendor or more in-depth IT expertise.

“The method outlined in this article shows you how to evaluate checksums or hashes to “deduplicate” your collection. Data Deduplication involves scouring your hard drive / server, etc. for redundant instances of files and selectively deleting them. While the text which follows makes reference to image files, the same process can be used for any file type on a computer. If you are managing a large number of image files that should not be modified (like proprietary Raw files), then you will definitely find the following information beneficial.” [Read More]
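The deduplication step the quote describes, finding redundant copies and deleting all but one, can be sketched the same way. This is an illustrative Python sketch, not the article's procedure. The helper names are hypothetical. The rule for which copy to keep (the shortest path, with ties broken alphabetically) is an arbitrary assumption that you would replace with your own policy, such as keeping the copy in a canonical archive folder. By default the script does a dry run and only prints what it would delete. Note that checksums only match byte-for-byte identical files, so a copy whose embedded metadata has been edited will not be flagged as a duplicate.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_checksum(root):
    """Map each MD5 digest to the list of file paths under root that share it."""
    groups = defaultdict(list)
    for folder, dirs, files in os.walk(root):
        dirs.sort()
        for name in sorted(files):
            path = os.path.join(folder, name)
            groups[md5_of_file(path)].append(path)
    return groups


def plan_deletions(groups):
    """For each set of identical files, pick one keeper and list the extras.

    Keeper rule (an assumption, adjust to taste): shortest path, ties broken
    alphabetically. Checksums that occur only once are skipped.
    """
    plan = {}
    for paths in groups.values():
        if len(paths) < 2:
            continue
        keeper = min(paths, key=lambda p: (len(p), p))
        plan[keeper] = sorted(p for p in paths if p != keeper)
    return plan


def dedupe(root, dry_run=True):
    """Print (and, if dry_run is False, delete) redundant copies under root."""
    plan = plan_deletions(group_by_checksum(root))
    for keeper, extras in plan.items():
        for path in extras:
            action = "would delete" if dry_run else "deleting"
            print(f"{action} {path} (same content as {keeper})")
            if not dry_run:
                os.remove(path)
    return plan
```

On a large share, a common refinement is to group files by size first and hash only those whose sizes match, because files of different sizes cannot be identical. Running `dedupe(root)` first and reviewing the printed list before calling `dedupe(root, dry_run=False)` keeps the destructive step deliberate.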

2 Comments

  • Good write-up, and the original article is very interesting. De-duping is definitely part of the process of organizing your image/media library, but wouldn’t it be better to let the DAM product do this for you as part of the initial ingest of assets into the system? North Plains’ product, Telescope, can be configured to use MD5 checksums as the basis for its duplicate-file handling, and I’m sure other DAM solutions offer this feature as well. Doing this sort of thing manually before moving on to select a DAM seems like wasted effort in this case, no?

    Thanks!
    Steve.

  • Steve,

    As you have acknowledged, many DAM systems besides Telescope do this too. I think the point of the original article is to help those who don’t have access to a full DAM system (e.g. for cost reasons).
