Metadata and photo management expert David Riecks, of ControlledVocabulary.com, has written a highly informative article explaining how to de-dupe an image collection using MD5 checksum tools and a spreadsheet. The process also produces a rudimentary asset audit that records each file's location, file name and duplicate status. The explanations are clearly presented, with screenshots and notes on the differences between the Mac and PC platforms.
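For readers who would rather script the audit than build it by hand, something similar can be roughly approximated in Python. This is a minimal sketch, not Riecks's method. It assumes Python 3 and a shared drive mounted at a local path, and the function names (md5_of_file, audit_folder, write_audit_csv) and CSV column names are hypothetical. It hashes every file under a folder, marks any checksum that appears more than once as a duplicate, and writes one row per file to a CSV that opens in any spreadsheet program.

```python
import csv
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MB chunks so large Raw files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_folder(root):
    """Walk `root` and return one row per file: folder, file name, MD5 and duplicate status."""
    rows = []
    by_hash = defaultdict(list)
    for folder, dirs, files in os.walk(root):
        dirs.sort()  # visit subfolders in a stable, alphabetical order
        for name in sorted(files):
            path = os.path.join(folder, name)
            row = {"folder": folder, "file_name": name, "md5": md5_of_file(path)}
            rows.append(row)
            by_hash[row["md5"]].append(row)
    # A file is a duplicate if any other file shares its checksum.
    for row in rows:
        row["duplicate"] = "yes" if len(by_hash[row["md5"]]) > 1 else "no"
    return rows


def write_audit_csv(rows, csv_path):
    """Save the audit as a CSV file for review in a spreadsheet."""
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=["folder", "file_name", "md5", "duplicate"])
        writer.writeheader()
        writer.writerows(rows)
```

A typical run would be write_audit_csv(audit_folder("/Volumes/SharedDrive"), "audit.csv"), where the drive path is a placeholder. Sorting the CSV by the md5 column in a spreadsheet then places identical files next to each other. MD5 is adequate for spotting accidental duplicates; hashlib.sha256 can be substituted if a stronger hash is preferred.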
For anyone embarking on a DAM migration project who has to audit a shared network drive used by multiple staff members, this could offer a useful starting point before bringing in a vendor or more in-depth IT expertise.
“The method outlined in this article shows you how to evaluate checksums or hashes to ‘deduplicate’ your collection. Data Deduplication involves scouring your hard drive / server, etc. for redundant instances of files and selectively deleting them. While the text which follows makes reference to image files, the same process can be used for any file type on a computer. If you are managing a large number of image files that should not be modified (like proprietary Raw files), then you will definitely find the following information beneficial.”
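The "selectively deleting" step described in the quotation can also be sketched in Python 3, again as an illustration rather than the article's own procedure. The helper names find_duplicate_groups and plan_deletions are hypothetical, and the rule for which copy to keep (the alphabetically first path) is only a placeholder. Files are bucketed by size before hashing, since files of different sizes cannot be identical, so large unique Raw files are never read. The code only builds a plan; it deletes nothing.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicate_groups(paths, chunk_size=1 << 20):
    """Return sorted lists of paths whose contents are identical.

    Files are grouped by size first (cheap), and only files that share a
    size are MD5-hashed, because differing sizes rule out identical content.
    """
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in same_size:
            digest = hashlib.md5()
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(chunk_size), b""):
                    digest.update(chunk)
            groups[digest.hexdigest()].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def plan_deletions(groups, keep=min):
    """Return (kept_path, deletion_candidates) pairs; nothing is deleted.

    `keep` picks the surviving copy from each group. The default, min,
    keeps the alphabetically first path, a placeholder rule to adapt.
    """
    plan = []
    for group in groups:
        kept = keep(group)
        plan.append((kept, [path for path in group if path != kept]))
    return plan
```

Reviewing the plan by hand before removing anything (for example with os.remove) is prudent, particularly on a shared drive where identical files may be deliberately kept in separate project folders.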