Hash values – Fingerprinting child sexual abuse material

Hashing technologies are used in many different ways to stop the spread of child sexual abuse material

Hashing technologies are used in a number of ways by law enforcement, social media platforms, NGOs and businesses, often in combination with other technologies, to detect child sexual abuse material in the workplace, on social media platforms and on hosting sites. Hashing technologies are efficient and reliable.

The limitations of hashing technologies are also their strengths. Because hashing technologies can only detect images that have already been identified and classified as child sexual abuse material, they are unable to detect new or previously unseen material. At the same time, this ensures that they detect only material that law enforcement professionals have classified, and nothing else.

Hash Values

A binary hash is created by a mathematical algorithm that transforms data of any size into much shorter, fixed-length data. This shorter sequence represents the original data and becomes the file’s signature, or its hash value, often called its digital fingerprint.

The algorithm is deterministic: the same input data always generates the same output data. It is also designed to be one-way, so that in practice the hash value cannot be reversed or traced back to the original input data.
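To make the fixed length and the determinism concrete, here is a minimal sketch. It uses SHA-256 from Python’s standard `hashlib` module, which is our choice for illustration; the text does not prescribe an algorithm. Inputs of very different sizes all produce a 64-character hexadecimal digest, and hashing the same input twice gives the same value.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

# Illustrative placeholder inputs of very different sizes.
small = b"a"
large = b"x" * 1_000_000  # about 1 MB

for label, data in [("small", small), ("large", large)]:
    digest = sha256_hex(data)
    # The output length is fixed (256 bits = 64 hex characters) whatever the input size.
    print(label, len(data), "bytes ->", len(digest), "hex chars:", digest[:16] + "...")

# Determinism: hashing the same input again yields the same hash value.
assert sha256_hex(small) == sha256_hex(b"a")
```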

A key feature of binary hashes is therefore that the original data, e.g. an image, cannot be recreated from the hash value. Cryptographic hash functions are also deliberately arbitrary. They do not generate similar output for files that are similar in content or appearance, and there is no connection between what a file looks like or contains and the hash value the algorithm produces. The output is also so sensitive that if the file is altered in any way, even by a single bit, the hash changes entirely.
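This sensitivity to change (often called the avalanche effect) can be seen by changing a single character in an input and comparing the two SHA-256 outputs bit by bit. The sketch below is illustrative only; SHA-256 and the sample strings are our assumptions.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count how many bits differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"An example document."
altered = b"An example document!"  # one character changed

h1 = hashlib.sha256(original).digest()
h2 = hashlib.sha256(altered).digest()

print(h1.hex())
print(h2.hex())
# For a well-behaved cryptographic hash, roughly half of the 256 output bits are
# expected to flip, and the two values look unrelated.
print("bits that differ:", bit_difference(h1, h2), "of", len(h1) * 8)
```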

Why use hash values?

Because hashes are not reversible, they are used for tasks such as verifying data and protecting stored passwords and other sensitive data. Several types of hash functions are used worldwide to verify files. Two well-known cryptographic hash functions are MD5 and the SHA family (for example SHA-1 and SHA-256).

Their algorithms can be used to verify data. If the hash calculated for a document on arrival is exactly the same as the hash calculated when it was sent, the file has not been altered on the way. For the same reason, hashes can be used for matching purposes, to identify identical files.
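Both uses can be sketched in a few lines. The first function recomputes the hash of received data and compares it with the hash sent alongside it. The second groups files by hash value so that identical files end up together. The function names and the use of SHA-256 are illustrative assumptions, not a specific tool’s API.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def is_unaltered(received: bytes, published_hash: str) -> bool:
    """Recompute the hash of the received data and compare it with the hash sent with it."""
    return sha256_hex(received) == published_hash.lower()

def group_identical(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group file names by hash value and keep only groups with more than one file.
    Files that share a hash are, for practical purposes, identical."""
    groups: dict[str, list[str]] = {}
    for name, data in files.items():
        groups.setdefault(sha256_hex(data), []).append(name)
    return {h: names for h, names in groups.items() if len(names) > 1}

document = b"Quarterly report, final version"
sent_hash = sha256_hex(document)  # computed by the sender and sent with the document
print(is_unaltered(document, sent_hash))                 # True
print(is_unaltered(document + b" (edited)", sent_hash))  # False
```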

Hashes are also used to protect passwords. Rather than storing the passwords themselves, a system stores their hashes. Whatever is typed into the password field is turned into a hash, which is then compared with the hashed value held in the database.

Therefore, if a database of passwords is hacked, the hacker will find hash values rather than the passwords in their original form. This means that passwords can be stored with relative safety, and that documents can be protected against undetected manipulation.
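Here is a minimal sketch of the password check described above, using Python’s standard `hashlib.pbkdf2_hmac`. It goes one step beyond plain hashing by adding a random per-user salt and a deliberately slow key-derivation function. This is a common practice for password storage, but it is our assumption here rather than something the text specifies, and the iteration count is purely illustrative.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor; choose according to your own security policy

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, hash) to store in the database instead of the password itself."""
    salt = os.urandom(16)  # random salt, so equal passwords produce different stored hashes
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def check_password(attempt: str, salt: bytes, stored: bytes) -> bool:
    """Hash what was typed into the password field and compare it with the stored hash."""
    candidate = hashlib.pbkdf2_hmac("sha256", attempt.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)  # constant-time comparison
```

Only `salt` and `stored` are kept in the database. A leaked copy therefore contains no passwords in their original form.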

Hash values and child sexual abuse material


In digital forensics, cryptographic hash algorithms are used for file identification and evidence authentication. By creating databases of hashes of known child sexual abuse material, newly found files can quickly be matched against already known material…
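A minimal sketch of matching against a database of known hashes follows. It uses a Python set of SHA-256 values, so each file costs one hash computation plus one fast lookup. The hash list and file contents are hypothetical placeholders. Real databases have their own formats and may use other algorithms, such as the MD5 and SHA functions mentioned above.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical hash list; in practice it would be loaded from a vetted database.
KNOWN_HASHES = {
    sha256_hex(b"placeholder known file 1"),
    sha256_hex(b"placeholder known file 2"),
}

def find_known(files: dict[str, bytes], known: set[str]) -> list[str]:
    """Return the names of files whose hash value appears in the known-hash set."""
    # Set membership is an average O(1) lookup, so large hash lists stay fast to query.
    return [name for name, data in files.items() if sha256_hex(data) in known]

evidence = {
    "img_001.bin": b"placeholder known file 1",
    "img_002.bin": b"unrelated holiday photo bytes",
}
print(find_known(evidence, KNOWN_HASHES))  # ['img_001.bin']
```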



NetClean ProActive works in a similar way to an anti-virus program. But instead of detecting viruses, ProActive detects images and videos that law enforcement agencies have classified as child sexual abuse material…
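The anti-virus comparison describes a familiar pattern: walk the file system, hash each file and check the result against a hash list. Below is a hypothetical sketch of that pattern using `os.walk` and chunked reads, so that large video files do not need to fit in memory. It is not how ProActive is implemented, and the function names are our own.

```python
import hashlib
import os

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large videos are never loaded whole

def file_sha256(path: str) -> str:
    """Hash a file on disk incrementally, chunk by chunk."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()

def scan_directory(root: str, known: set[str]) -> list[str]:
    """Walk `root` recursively and return the paths of files whose hash is in `known`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if file_sha256(path) in known:
                    hits.append(path)
            except OSError:
                # Skip files that cannot be read (permissions, files removed mid-scan).
                continue
    return sorted(hits)
```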



The Internet Watch Foundation (IWF) maintains a hash list that is used in several different ways. For perceptual hashing, the IWF uses PhotoDNA hashes. The benefit of a PhotoDNA hash is that it will recognise images even if they have been slightly altered. For other use cases, faster and more exact hashing algorithms are used…


Hash collisions and broken hashes – A challenge

Because hash functions accept input of any length but produce output of a fixed length, there will inevitably be cases where two different inputs produce the same hash value. This is called a hash collision. Depending on which hash function is used, however, the likelihood of it happening is extremely low. With modern cryptographic hash functions, a collision is highly unlikely to occur by chance, and it is practically impossible to create one deliberately. Another issue that can make hashing problematic is a broken algorithm.
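How low "extremely low" is can be estimated with the standard birthday approximation. For n random inputs and a b-bit output, the probability of at least one collision is p ≈ 1 − exp(−n(n−1)/2^(b+1)). The sketch below computes this estimate. It also shows that collisions really do exist by truncating SHA-256 to 16 bits; the pigeonhole principle guarantees a repeat within 2^16 + 1 inputs. The truncation is purely an illustration and says nothing about weaknesses in full SHA-256.

```python
import hashlib
import math

def collision_probability(n: int, bits: int) -> float:
    """Birthday-bound estimate of P(at least one collision) among n random inputs
    hashed to a `bits`-bit output: 1 - exp(-n(n-1) / 2^(bits+1))."""
    pairs = n * (n - 1) / 2
    return -math.expm1(-pairs / 2.0 ** bits)  # expm1 keeps tiny probabilities accurate

def find_truncated_collision(bits: int = 16) -> tuple[bytes, bytes]:
    """Find two different inputs whose SHA-256 digests agree in their first `bits` bits."""
    seen: dict[int, bytes] = {}
    i = 0
    while True:  # terminates within 2**bits + 1 iterations (pigeonhole principle)
        data = str(i).encode()
        prefix = int.from_bytes(hashlib.sha256(data).digest(), "big") >> (256 - bits)
        if prefix in seen:
            return seen[prefix], data
        seen[prefix] = data
        i += 1

print(collision_probability(10**9, 256))    # a billion files, full SHA-256: vanishingly small
print(collision_probability(100_000, 32))   # a 32-bit hash: a collision is more likely than not
print(find_truncated_collision(16))
```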

If an algorithm has been broken, someone who knows a hash value may be able to create different input data that produces the identical hash. To put it more simply: let’s say you know the hash value for a particular password, but not the original password. If the hash algorithm is broken, it may be possible to use different input data to create the same hash value and access the password-protected account. This makes broken algorithms unsuitable for security purposes such as password storage. However, they can still be used for verification when transmitting files, for example to detect accidental corruption.
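Real attacks on broken algorithms such as MD5 are far more involved. The toy example below shows the idea instead, using a deliberately weak checksum that is hypothetical and not MD5 or SHA. Knowing only the stored hash value, an attacker can construct different input data with the same "hash", so a hash comparison can no longer tell the two apart. SHA-256 gives no such shortcut.

```python
import hashlib

def toy_checksum(data: bytes) -> int:
    """Deliberately weak 'hash' (NOT a real hash function): sum of all bytes modulo 65536."""
    return sum(data) % 65536

def forge_from_hash(target: int) -> bytes:
    """Build input data whose toy checksum equals `target`, knowing only the hash value:
    as many 0xFF bytes as fit, then one byte for the remainder."""
    return b"\xff" * (target // 255) + bytes([target % 255])

secret = b"correct horse battery staple"  # the original password, unknown to the attacker
stored = toy_checksum(secret)             # the only thing the attacker knows
fake = forge_from_hash(stored)

print(fake != secret)                # True: different input data
print(toy_checksum(fake) == stored)  # True: identical toy hash value
print(hashlib.sha256(fake).digest() == hashlib.sha256(secret).digest())  # False
```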

Different kinds of matching

There are also other technologies and algorithms that match and identify images without relying on binary hashing. One example is PhotoDNA, which matches images based on the visual information they contain. This means that PhotoDNA can find the same image even if it has been saved in a different file format. This contrasts with binary hashes, which only recognise identical files based on their binary information.
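PhotoDNA itself is proprietary, so the sketch below swaps in a much simpler, publicly known perceptual hash, the "average hash". It is not PhotoDNA and is far less robust, but it shows the principle of hashing visual content rather than bytes. Each bit records whether a pixel is brighter than the image’s mean brightness, and images are compared by counting differing bits. Real implementations first shrink the image and convert it to grayscale; here we assume an 8x8 grayscale image as input.

```python
import hashlib

def average_hash(pixels: list[list[int]]) -> int:
    """Perceptual 'average hash' of a small grayscale image (one bit per pixel)."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)  # 1 if brighter than average
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small distance means visually similar images."""
    return bin(a ^ b).count("1")

# Hypothetical 8x8 grayscale image: a brightness gradient with values 0-252.
image = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
# The same picture, slightly brightened and stored in another "format" (extra header bytes).
brighter = [[p + 3 for p in row] for row in image]
file_a = bytes(p for row in image for p in row)
file_b = b"HDR" + bytes(p for row in brighter for p in row)

print(hashlib.sha256(file_a).digest() == hashlib.sha256(file_b).digest())  # False: bytes differ
print(hamming(average_hash(image), average_hash(brighter)))                # 0: looks the same
```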

Combining hash values and PhotoDNA

Both technologies work well on their own; however, combining the two increases the probability of finding material. One example of how they can be combined is NetClean ProActive, software deployed on work computers to detect child sexual abuse material. It uses its hash database first, and if it finds a match on a computer, it can also start a PhotoDNA search. In practice this means that once an image is detected, a PhotoDNA search is run to find visually identical images in addition to the binary-identical ones. Another example of this technology in action can be found in our article on web crawlers.
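The two-stage idea can be sketched as follows. Stage one looks for exact binary-hash matches. Only if it finds one does stage two search for files that are visually close to the matched images. This is a hypothetical workflow, not NetClean’s implementation. The simple average hash stands in for PhotoDNA, and the `max_distance` threshold of 5 bits is an arbitrary illustrative choice.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def average_hash(pixels: list[list[int]]) -> int:
    """Stand-in perceptual hash: one bit per pixel, set if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def two_stage_scan(files: dict[str, tuple[bytes, list[list[int]]]],
                   known_hashes: set[str], max_distance: int = 5) -> tuple[list[str], list[str]]:
    """`files` maps name -> (raw bytes, decoded grayscale pixels).
    Stage 1: exact binary-hash matches. Stage 2 (only after a hit): visually similar files."""
    exact = [n for n, (raw, _) in files.items() if sha256_hex(raw) in known_hashes]
    if not exact:
        return exact, []
    seeds = [average_hash(files[n][1]) for n in exact]
    similar = [
        n for n, (_, px) in files.items()
        if n not in exact and any(hamming(average_hash(px), s) <= max_distance for s in seeds)
    ]
    return exact, similar

# Hypothetical demo data: an image, a re-encoded brighter copy, and an unrelated image.
image = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
copy = [[p + 3 for p in row] for row in image]
unrelated = [[255 - p for p in row] for row in image]
files = {
    "a.raw": (bytes(p for row in image for p in row), image),
    "b.img": (b"HDR" + bytes(p for row in copy for p in row), copy),
    "c.raw": (bytes(p for row in unrelated for p in row), unrelated),
}
known = {sha256_hex(files["a.raw"][0])}
print(two_stage_scan(files, known))  # (['a.raw'], ['b.img'])
```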

About the Technical Model National Response

Inspired by the WeProtect Global Alliance Model, we have set out to develop an initiative focused on technology. We call it the Technical Model National Response. It is an overview of the existing technologies that need to be applied by different sectors and businesses to effectively fight the spread of child sexual abuse material.

Learn about the other technologies in the Technical Model National Response

  • Hashing Technologies (20 August 2018)
  • Artificial Intelligence (18 August 2018)
  • Blocking Technologies (16 August 2018)
  • Web Crawlers (16 August 2018)
  • Filter Technologies (15 August 2018)
  • Keyword Matching (14 August 2018)
  • Law Enforcement Collaboration platform (14 August 2018, coming soon)
  • Notice and Takedown (13 August 2018, coming soon)