Technologies to stop CSAM: Binary Hashing.

In this short series of articles, we look at some of the technologies that are used to stop child sexual abuse material today. Some are used in NetClean’s products, and some are used by law enforcement and NGOs to find and remove material online.

Here we look at the first of two hashing technologies, binary hashing. Articles on robust hashing, AI and keyword matching will follow in this series, which we will post over the summer.

Binary hashing is used to fingerprint and discover child sexual abuse material at the content level. It identifies actual images and videos depicting this abuse. Hashing technology is secure, fast and reliable, and is used in various ways, for example in detection tools, digital investigation tools and crawlers.

When new child sexual abuse material is found, the image or video is classified and given a hash value, a unique digital signature. These signatures can be added to databases that software uses to match child sexual abuse material. One example of such software is NetClean ProActive, an efficient tool for detecting CSAM.

This work is done by law enforcement agencies and select NGOs that work to combat child sexual abuse. When unknown material is found, a hash value is calculated and added to a database. Specialized software can then run matches and look for exact copies of the material on, for example, social media sites and in IT environments that are protected by detection software.
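The matching step can be illustrated with a minimal sketch. It assumes SHA-256 as the hash algorithm (the article does not name a specific one) and uses a hypothetical in-memory set, known_hashes, in place of a real hash database; the byte strings are harmless stand-ins for file contents.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash value of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for a database of hash values of classified files.
# A real database would be supplied by law enforcement or an NGO.
known_hashes = {
    sha256_hex(b"stand-in for classified file A"),
    sha256_hex(b"stand-in for classified file B"),
}


def is_known(data: bytes, database: set) -> bool:
    """Check whether the data is an exact copy of a file in the database."""
    return sha256_hex(data) in database


print(is_known(b"stand-in for classified file A", known_hashes))  # True
print(is_known(b"an unrelated file", known_hashes))               # False
```

Only the hash values are stored and compared, so the software that runs the match never needs a copy of the original material.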

How does it work? A binary hash is created by a mathematical algorithm that transforms the data of a file, whatever size it may be, into much shorter fixed-length data, a hash value. This acts as the file’s signature, allowing software to find and identify it.
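As a rough sketch of the idea, the example below again assumes SHA-256 (one common choice, not necessarily the algorithm used by any particular product). The helper names hash_bytes and hash_file are illustrative only. It shows that inputs of very different sizes produce hash values of the same length.

```python
import hashlib


def hash_bytes(data: bytes) -> str:
    """Hash in-memory data and return the hash value as a hex string."""
    return hashlib.sha256(data).hexdigest()


def hash_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file of any size by reading it in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


small = hash_bytes(b"a")              # 1 byte of input
large = hash_bytes(b"a" * 1_000_000)  # 1,000,000 bytes of input
print(len(small), len(large))         # 64 64
```

Reading the file in chunks means that even a large video can be hashed without loading it into memory all at once.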

The output looks random, but the algorithm is deterministic: it always transforms the same input data into the same output data. The transformation is also one-way, so the output data cannot be reversed or traced back to the original input data. This means that an image cannot be recreated from a hash value.

This is an efficient and reliable technology. As binary hashes are non-reversible and only match exact copies of material that has already been classified, it is extremely unlikely that the wrong material will be flagged. This is why law enforcement agencies use this technology to find material in investigations and for evidence authentication.

Binary hashing requires less computing power than robust hashing (see next week’s article), which is why NGOs and social media companies incorporate this technology into their web crawlers in their active search for known material online.

Binary hashing is a powerful tool. However, with the slightest alteration of an image or video, the hash value will change, and a crawler or other technology that relies solely on binary hashing will no longer be able to find or recognize the file. Robust hashing, which we will look at next week, offers a solution to this problem.
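This limitation is easy to demonstrate. The sketch below assumes SHA-256 and uses arbitrary bytes as a stand-in for image data; the variable names are illustrative. An exact copy yields the same hash value as the original, while changing a single bit yields a different one.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hash value of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


original = bytes(range(256)) * 4  # arbitrary bytes standing in for an image
exact_copy = bytes(original)      # identical content

modified = bytearray(original)
modified[0] ^= 0x01               # flip a single bit in the first byte
altered = bytes(modified)

print(digest(original) == digest(exact_copy))  # True
print(digest(original) == digest(altered))     # False
```

In practice, resizing, cropping or re-compressing an image changes far more than one bit, so any such edit produces a file that no longer matches the stored hash value.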