dupdelete
Dupdelete is a program to help tidy collections of files.
Dupdelete has two fundamental modes of operation:
- Recursive MD5 hashing of a directory and its contents, writing the hashes to a file.
- Recursive MD5 hashing of a directory and its contents, logging and optionally deleting anything matching a blacklist of hashes.
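The blacklist mode comes down to a set-membership test on each file's digest. The sketch below is not dupdelete's code. It assumes the blacklist has already been read into an array of 32-character lowercase hex MD5 strings, and the names `blacklist_prepare` and `blacklist_contains` are hypothetical. Sorting the list once with `qsort` turns every later lookup into an O(log n) `bsearch`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort/bsearch over an array of C-string pointers.
   Each argument points at one element, i.e. at a (const char *). */
static int cmp_digest(const void *a, const void *b)
{
    const char *const *x = a;
    const char *const *y = b;
    return strcmp(*x, *y);
}

/* Hypothetical helper: sort the blacklist once, before any lookups. */
static void blacklist_prepare(const char **list, size_t n)
{
    qsort(list, n, sizeof *list, cmp_digest);
}

/* Hypothetical helper: return 1 if digest (lowercase hex MD5) is on the
   blacklist, 0 otherwise. The list must already be sorted. */
static int blacklist_contains(const char **list, size_t n, const char *digest)
{
    assert(digest != NULL);
    /* The key has the same type as the elements: a pointer to a string. */
    return bsearch(&digest, list, n, sizeof *list, cmp_digest) != NULL;
}
```

When thousands of user files are checked against the same blacklist, the one-off sort pays for itself quickly; a hash table would work equally well.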
It can also cache the hashes found on previous passes in a GDBM file, so very large sets of files can be processed on a regular basis without re-hashing multiple terabytes of input each time. It is entirely practical, for example, to have it process the user areas of over a thousand students at a school and compare every new or modified file against a blacklist containing the MD5s of downloadable games, infringing music, offensive jokes and inappropriate images. This is the purpose for which I used the software, and I found it very satisfactory (and satisfying) in that role.
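A hash cache only pays off if the program can tell cheaply that a file has not changed since it was last hashed. The text does not say what dupdelete stores in its GDBM file, so this sketch assumes a per-path record holding the file's size and modification time alongside its digest, and trusts the cached digest only when both still match what `stat()` reports. The `struct cache_record` type and both function names are hypothetical, and the GDBM fetch itself is left out.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical cache record. In a real program it would be stored in the
   GDBM file with the file's path as the key. */
struct cache_record {
    long long size;    /* file size in bytes when it was last hashed */
    time_t mtime;      /* modification time when it was last hashed */
    char md5_hex[33];  /* 32 hex digits plus terminating NUL */
};

/* Return 1 if the cached digest is still usable for a file that now has
   the given size and mtime, 0 if the file must be re-hashed.
   A NULL record means the path was not in the cache at all. */
static int cache_is_fresh(const struct cache_record *rec,
                          long long size, time_t mtime)
{
    if (rec == NULL)
        return 0;
    return rec->size == size && rec->mtime == mtime;
}

/* Check a path on disk against its cached record.
   Returns 1 (reuse cached digest), 0 (re-hash), or -1 if stat() fails. */
static int cache_check_path(const char *path, const struct cache_record *rec)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return cache_is_fresh(rec, (long long)st.st_size, st.st_mtime);
}
```

Size plus modification time is a heuristic: a file rewritten with the same size within the same timestamp granularity would be missed, which is usually an acceptable trade for not reading terabytes on every pass.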
Regrettably, the switch to Chromebooks put an end to dupdelete's reign of killjoy terror.
As an example, dupdelete can be used to hash a folder full of copyright-infringing music and then to search user folders for any undiscovered copies:
Or it can be used to validate copying of a large directory structure to ensure no files were corrupted in transit:
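Validating a copy amounts to hashing both trees and comparing the digests path by path. As a sketch, assuming each listing has been loaded into an array of (relative path, hex digest) pairs sorted by path, a single merge-style pass reports files missing from the copy, unexpected extras, and files whose contents differ. The `struct entry` layout and `compare_listings` are hypothetical and do not describe dupdelete's actual hash-file format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical listing entry: one hashed file. */
struct entry {
    const char *path;  /* path relative to the tree root */
    const char *md5;   /* 32-character hex digest */
};

/* Compare two listings, both sorted by path. Prints every problem found
   and returns how many there were (0 means the copy verified cleanly). */
static int compare_listings(const struct entry *src, size_t ns,
                            const struct entry *dst, size_t nd)
{
    size_t i = 0, j = 0;
    int problems = 0;

    assert(ns == 0 || src != NULL);
    assert(nd == 0 || dst != NULL);
    while (i < ns || j < nd) {
        int c;
        if (i == ns)
            c = 1;   /* only destination entries remain */
        else if (j == nd)
            c = -1;  /* only source entries remain */
        else
            c = strcmp(src[i].path, dst[j].path);

        if (c < 0) {
            printf("missing from copy: %s\n", src[i++].path);
            problems++;
        } else if (c > 0) {
            printf("extra in copy: %s\n", dst[j++].path);
            problems++;
        } else {
            /* Same path on both sides: the digests must agree. */
            if (strcmp(src[i].md5, dst[j].md5) != 0) {
                printf("corrupted: %s\n", src[i].path);
                problems++;
            }
            i++;
            j++;
        }
    }
    return problems;
}
```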
Though it can only use the MD5 hash, which is no longer cryptographically secure, it can apply a salt to the hash, and it can be set to skip files larger than a defined size.
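For reference, here is a compact, self-contained MD5 (RFC 1321) in C with a salt and a size limit. It is a sketch, not dupdelete's implementation: the text does not say how the salt is combined with the data, so prepending it to the file contents is an assumption, and `salted_md5_hex` and `hash_file` are hypothetical names. Files whose size exceeds `max_size` are skipped without being opened.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Streaming MD5 state: chaining values, total length, partial 64-byte block. */
typedef struct { uint32_t h[4]; uint64_t len; uint8_t buf[64]; size_t n; } md5_ctx;

/* Per-step additive constants and rotation amounts from RFC 1321. */
static const uint32_t K[64] = {
    0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee, 0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501,
    0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be, 0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821,
    0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa, 0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8,
    0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed, 0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a,
    0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c, 0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70,
    0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05, 0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,
    0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039, 0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1,
    0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1, 0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391
};
static const int S[4][4] = {{7, 12, 17, 22}, {5, 9, 14, 20}, {4, 11, 16, 23}, {6, 10, 15, 21}};

/* Run the 64-step compression function over one 64-byte block. */
static void md5_block(uint32_t h[4], const uint8_t *p)
{
    uint32_t M[16], a = h[0], b = h[1], c = h[2], d = h[3];
    for (int i = 0; i < 16; i++) /* message words are little-endian */
        M[i] = (uint32_t)p[4 * i] | (uint32_t)p[4 * i + 1] << 8 |
               (uint32_t)p[4 * i + 2] << 16 | (uint32_t)p[4 * i + 3] << 24;
    for (int i = 0; i < 64; i++) {
        uint32_t f;
        int g, s = S[i / 16][i % 4];
        if (i < 16)      { f = (b & c) | (~b & d); g = i; }
        else if (i < 32) { f = (d & b) | (~d & c); g = (5 * i + 1) % 16; }
        else if (i < 48) { f = b ^ c ^ d;          g = (3 * i + 5) % 16; }
        else             { f = c ^ (b | ~d);       g = (7 * i) % 16; }
        f += a + K[i] + M[g];
        a = d; d = c; c = b;
        b += (f << s) | (f >> (32 - s)); /* rotate left by s */
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d;
}

static void md5_init(md5_ctx *c)
{
    c->h[0] = 0x67452301; c->h[1] = 0xefcdab89; c->h[2] = 0x98badcfe; c->h[3] = 0x10325476;
    c->len = 0; c->n = 0;
}

static void md5_update(md5_ctx *c, const void *data, size_t len)
{
    const uint8_t *p = data;
    c->len += len;
    while (len--) {
        c->buf[c->n++] = *p++;
        if (c->n == 64) { md5_block(c->h, c->buf); c->n = 0; }
    }
}

/* Pad with 0x80, zeros and the 64-bit little-endian bit count, then write hex. */
static void md5_final_hex(md5_ctx *c, char hex[33])
{
    uint64_t bits = c->len * 8; /* captured before padding changes c->len */
    uint8_t byte = 0x80;
    md5_update(c, &byte, 1);
    byte = 0;
    while (c->n != 56) md5_update(c, &byte, 1);
    for (int i = 0; i < 8; i++) { byte = (uint8_t)(bits >> (8 * i)); md5_update(c, &byte, 1); }
    for (int i = 0; i < 16; i++)
        sprintf(hex + 2 * i, "%02x", (unsigned)((c->h[i / 4] >> (8 * (i % 4))) & 0xffu));
}

/* Hypothetical helper: MD5 of salt followed by data (salt may be NULL). */
static void salted_md5_hex(const char *salt, const void *data, size_t len, char hex[33])
{
    md5_ctx c;
    md5_init(&c);
    if (salt) md5_update(&c, salt, strlen(salt));
    md5_update(&c, data, len);
    md5_final_hex(&c, hex);
}

/* Hypothetical helper: hash salt followed by the file's contents unless the
   file is larger than max_size bytes. Returns 1 if hashed, 0 if skipped for
   size, -1 on error. */
static int hash_file(const char *path, const char *salt, long long max_size, char hex[33])
{
    struct stat st; uint8_t buf[8192]; size_t r; md5_ctx c;
    assert(path != NULL && hex != NULL);
    if (stat(path, &st) != 0) return -1;
    if ((long long)st.st_size > max_size) return 0; /* too big: skip it */
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    md5_init(&c);
    if (salt) md5_update(&c, salt, strlen(salt));
    while ((r = fread(buf, 1, sizeof buf, f)) > 0) md5_update(&c, buf, r);
    int err = ferror(f);
    fclose(f);
    if (err) return -1;
    md5_final_hex(&c, hex);
    return 1;
}
```

Because the salt is simply hashed first in this sketch, `salted_md5_hex("a", "bc", 2, hex)` produces the same digest as the unsalted MD5 of `"abc"`.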
Dupdelete is released under the GPL, on the understanding that this is 'good enough' software and not the best-written. A Windows executable and the source code are released, but this is a simple, cross-platform program, so you should have no trouble compiling it on any POSIX system. It compiles for Windows using MinGW. The Visual Studio C compiler can also build it, but the code needs some minor adaptations first.