Binary hashing is used to fingerprint and discover online child sexual abuse material on a content level, i.e. it identifies the actual images and videos depicting child sexual abuse. Hashing is a secure, fast and reliable technology that is used in various ways, e.g. in detection tools, digital investigation tools and crawlers. It is used by law enforcement, NGOs, businesses and platform providers.
When new child sexual abuse material is found, the image or video is classified as illegal and given a hash value, a unique digital fingerprint. These fingerprints are added to databases, which different kinds of software then use to identify child sexual abuse material.
A binary hash is created by a mathematical algorithm that transforms the data of a file, whatever size it may be, into much shorter fixed-length data, a hash value. The hash value acts as the file’s fingerprint, allowing software to find and identify it.
The conversion appears arbitrary, but the algorithm always transforms the same input data into the same output data. The output data cannot be reversed or traced back to the original input data. This one-way property means that an image cannot be recreated from a hash value.
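The two properties described above, fixed-length output and identical output for identical input, can be illustrated with a minimal sketch. The article does not name a specific algorithm, so SHA-256 from Python's standard library is used here purely as an assumption, and the function name fingerprint is hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a fixed-length hex digest of the given bytes.

    SHA-256 is an assumption for illustration; the article does not
    say which hashing algorithm is used in practice.
    """
    return hashlib.sha256(data).hexdigest()


small = fingerprint(b"abc")              # 3 bytes of input
large = fingerprint(b"abc" * 1_000_000)  # 3 million bytes of input

# The digest has the same length regardless of the input size.
print(len(small), len(large))

# The same input always produces the same digest.
print(fingerprint(b"abc") == small)
```

The digest reveals nothing that would allow the original bytes to be rebuilt, which is why a database of hash values does not contain the material itself.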
Hash values are produced by law enforcement agencies and select NGOs that work to combat child sexual abuse. When previously unknown material is found, a hash value is calculated and added to a database. Specialised software can then run matches against these databases and look for exact copies of the material, for example on social media sites and in IT environments that are protected by detection software.
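As a rough sketch of how matching against such a database might work, the example below hashes every file under a directory and checks each digest against a set of known values. The names file_hash, find_matches and known_hashes are hypothetical, SHA-256 is again an assumption, and real detection software will differ in its details.

```python
import hashlib
from pathlib import Path


def file_hash(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large files need little memory."""
    digest = hashlib.sha256()  # assumed algorithm, not named in the article
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_matches(root: Path, known_hashes: set[str]) -> list[Path]:
    """Return the files under root whose digest is in known_hashes."""
    return [
        p
        for p in sorted(root.rglob("*"))
        if p.is_file() and file_hash(p) in known_hashes
    ]
```

Here known_hashes stands in for the database: only the digests are needed to run a match, never the classified files themselves.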
Law enforcement agencies use hashes to find pertinent material in investigations and for evidence authentication, while some NGOs and social media companies send out web crawlers to actively search out known material and take actions to remove it. Businesses and organisations use detection software to safeguard their IT environment, to comply with policies and ethical values and work towards the Sustainable Development Goals. Detection software is also used to protect employees, especially IT professionals, from the risk of being exposed to child sexual abuse material.
Strengths and limitations
Binary hashing is efficient and reliable. Binary hashes are non-reversible, and because they match only files that are identical to already classified material, the risk that technologies using binary hashing will flag the wrong material is extremely low.
The technology also works very fast. Typically, binary hashing matches are almost instant, and detection takes up limited processing power. This is crucial for businesses and organisations with IT environments where speed and processing power are of the utmost importance.
The limitation of binary hashing technology is the flip side of its strength: it can only detect material that is already known and indexed. Although this limits the scope for detection and removal of images, it guarantees accuracy, as only material that has been classified by law enforcement professionals, and nothing else, is detected.
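This trade-off can be shown with a small sketch: a file that differs from a known file by even a single byte gets a different hash value and is therefore not matched. The byte strings and the helper is_known are placeholders for illustration, and SHA-256 is an assumed algorithm.

```python
import hashlib

original = b"example file contents"
modified = original + b"\x00"  # the same data with one extra byte

# A stand-in for the database of classified hash values.
known_hashes = {hashlib.sha256(original).hexdigest()}


def is_known(data: bytes) -> bool:
    """Check whether the digest of data appears in known_hashes."""
    return hashlib.sha256(data).hexdigest() in known_hashes


print(is_known(original))  # True: an exact copy is matched
print(is_known(modified))  # False: any change yields a different digest
```

The exact-match behaviour is what keeps false positives extremely rare, and it is also why altered copies of known material go undetected by binary hashing alone.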
Binary hashing is one of many technologies that can be applied by businesses to stop child sexual abuse material. In the last section of the NetClean Report 2019 we presented an overview of technologies and methods available to businesses to stop child sexual abuse material. The articles were a revision and abridgement of longer and more technically detailed articles, published here. In a series of blog posts we will compare the different technologies and show how they complement each other.