Like binary hashing technology, robust hashing is used to fingerprint and discover online child sexual abuse material on a content level. In contrast to the binary system, robust hashing technology looks at the actual visual content of an image rather than just the binary data of the image file. A widely used hashing technology is PhotoDNA. It was developed specifically to detect child sexual abuse material, and is used today by law enforcement, NGOs, businesses and platform providers.
A robust hash is computed so that images with the same visual content produce matching hash values. Like binary hashes, a PhotoDNA hash value cannot be reversed into an image.
Whereas two copies of the same image in different file formats will produce completely different binary hashes, robust hashing technology can detect the image even if a slight alteration, such as resizing or change of file format, has been made. This is because the recognition is based on the visual content of the image, rather than the binary file data.
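The difference can be illustrated with a small sketch. PhotoDNA itself is proprietary, so the example below uses a simple "average hash" as a stand-in for a robust hash; it is not the PhotoDNA algorithm. The function names, the synthetic 16x16 test pattern and the 8x8 hash size are all assumptions made for illustration.

```python
import hashlib

def binary_hash(pixels):
    """Cryptographic hash over the raw bytes, standing in for a file-level binary hash."""
    raw = bytes(value for row in pixels for value in row)
    return hashlib.sha256(raw).hexdigest()

def average_hash(pixels, hash_size=8):
    """Toy robust hash (NOT PhotoDNA): shrink the image to hash_size x hash_size
    by block averaging, then emit one bit per cell: 1 if brighter than the mean."""
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // hash_size, width // hash_size
    cells = []
    for cy in range(hash_size):
        for cx in range(hash_size):
            total = 0
            for y in range(cy * block_h, (cy + 1) * block_h):
                for x in range(cx * block_w, (cx + 1) * block_w):
                    total += pixels[y][x]
            cells.append(total / (block_h * block_w))
    mean = sum(cells) / len(cells)
    bits = 0
    for cell in cells:
        bits = (bits << 1) | (1 if cell > mean else 0)
    return bits

def upscale(pixels, factor):
    """Nearest-neighbour resize: repeat every pixel `factor` times in both directions."""
    return [[value for value in row for _ in range(factor)]
            for row in pixels for _ in range(factor)]

# Synthetic 16x16 grayscale pattern (hypothetical test data, not a real image).
original = [[(x * 9 + y * 7) % 256 for x in range(16)] for y in range(16)]
resized = upscale(original, 2)  # same visual content at 32x32

same_binary = binary_hash(original) == binary_hash(resized)
same_robust = average_hash(original) == average_hash(resized)
print(same_binary, same_robust)  # False True
```

The resized copy has different bytes, so its binary hash changes completely, while the content-based hash stays the same. Real robust hashing algorithms are considerably more sophisticated and tolerate a wider range of alterations than this toy version.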
As with binary hashing, the classification behind PhotoDNA hashes is carried out by law enforcement and a number of NGOs. The hashes are added to databases, which can be used to match and detect known child sexual abuse material.
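Matching against such a database can be sketched as a nearest-match lookup. The example assumes that robust hashes are stored as 64-bit integers and compared by Hamming distance (the number of differing bits). This is a common approach for simple perceptual hashes, but not necessarily how PhotoDNA compares its values. The hash values, the `find_match` helper and the threshold of 4 bits are hypothetical.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")

def find_match(candidate, known_hashes, max_distance=4):
    """Return (known_hash, distance) for the closest known hash within
    max_distance bits, or None if nothing is close enough."""
    best = None
    for known in known_hashes:
        distance = hamming_distance(candidate, known)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (known, distance)
    return best

# Hypothetical database of robust hashes for already classified material.
known_hashes = [0xF0F0F0F0F0F0F0F0, 0x00FF00FF00FF00FF]

near_copy = 0xF0F0F0F0F0F0F0F1   # differs from the first entry in one bit
unrelated = 0x123456789ABCDEF0   # far from both entries

print(find_match(near_copy, known_hashes) is not None)  # True
print(find_match(unrelated, known_hashes))              # None
```

The threshold controls the trade-off: a larger value catches more heavily altered copies but increases the risk of false matches.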
Law enforcement use PhotoDNA hashes in the same way that they use binary hashes, and web crawlers use both binary hashes and PhotoDNA hashes when trawling the net.
Unlike binary hashes, robust hashes are more often used in environments where real-time detection is not critical. Social media platforms are one example. Another is a business that deploys a secondary, wider robust search in its IT environment after binary hashing technology has made a match. The robust search then scans the files stored near the matched file for nearly identical material.
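The secondary search described above could look roughly like the following sketch. It assumes that "nearby" means files in the same directory as a binary match, and that hashes have already been computed for every file. The file paths, hash values and the `two_stage_scan` function are hypothetical and do not describe any particular product.

```python
from pathlib import PurePosixPath

def hamming_distance(a, b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")

def two_stage_scan(files, known_binary, known_robust, max_distance=1):
    """files maps path -> (binary_hash, robust_hash).
    Stage 1 flags exact binary matches. Stage 2 runs the robust comparison
    only on the other files in directories where stage 1 found a match."""
    flagged = {}
    for path, (binary, _) in files.items():
        if binary in known_binary:
            flagged[path] = "binary"

    hit_dirs = {PurePosixPath(path).parent for path in flagged}
    for path, (_, robust) in files.items():
        if path in flagged or PurePosixPath(path).parent not in hit_dirs:
            continue
        if any(hamming_distance(robust, k) <= max_distance for k in known_robust):
            flagged[path] = "robust"
    return flagged

# Hypothetical pre-computed hashes for four files in two directories.
files = {
    "/share/a/img1.jpg":       ("aaa", 0b11110000),
    "/share/a/img1_small.png": ("bbb", 0b11110001),  # near-identical content
    "/share/a/notes.png":      ("ccc", 0b00001111),  # unrelated content
    "/share/b/img1_copy.png":  ("ddd", 0b11110000),  # directory without a binary hit
}

result = two_stage_scan(files, known_binary={"aaa"}, known_robust=[0b11110000])
print(result)
```

Here only `/share/a` is searched with the robust hash, because that is where the binary match occurred. The file in `/share/b` is not examined, which keeps the more expensive robust comparison limited to the places where it is most likely to be useful.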
Strengths and limitations
As with binary hashing, robust hashing technology detects only already known and indexed material. However, the robust technology is also able to detect images that have been slightly altered, which widens the search.
The choice of technology always depends on the context and purpose of the search. Robust hashing does not simply replace binary hashing because, although it is very fast, it is slower than binary hash matching. Instead of an instant match, the complete image has to be analysed for a PhotoDNA match, which takes more processing power. Depending on the search, one technology, the other, or a combination of the two might therefore be most effective.
Robust hashing is one of many technologies that can be applied by businesses to stop child sexual abuse material. In the last section of the NetClean Report 2019 we presented an overview of technologies and methods available to businesses to stop child sexual abuse material. The articles were a revision and abridgement of longer and more technically detailed articles, published here. In a series of blog posts we will compare the different technologies and show how they complement each other. The first technology presented was Binary Hashing and you can read more about it here.