Keyword matching makes it possible to detect suspicious material based on specific words or phrases
Keyword matching is widely used in everything from search engines to online advertising campaigns to media monitoring tools. In the search for child sexual abuse material, its function is to match words or phrases, in filenames or in text, that have been listed as suspicious and worth investigating.
Keyword matching in its simplest form uses lists of words, phrases or groupings that are matched directly against, for example, filenames, chat logs, documents or websites to identify whether they are relevant. In addition to exact matching, matches can also be case-insensitive, which means that a match is still made even if capital letters are used. Exact matching is deterministic: the same input checked against the same list always gives the same result.
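As an illustration of exact and case-insensitive matching, here is a minimal Python sketch. The function name `find_keyword_matches` and the placeholder terms are assumptions made for this example only; they do not come from any real keyword list or product.

```python
def find_keyword_matches(text, keywords, case_sensitive=False):
    """Return the keywords that occur anywhere in the text.

    This is plain substring matching. With case_sensitive=False, both the
    text and each keyword are case-folded before they are compared.
    """
    haystack = text if case_sensitive else text.casefold()
    matches = []
    for keyword in keywords:
        needle = keyword if case_sensitive else keyword.casefold()
        if needle in haystack:
            matches.append(keyword)
    return matches


# Hypothetical placeholder terms, used only to show the mechanics.
EXAMPLE_KEYWORDS = ["alpha term", "bravo"]

case_insensitive_hits = find_keyword_matches("Report_BRAVO_final.txt", EXAMPLE_KEYWORDS)
case_sensitive_hits = find_keyword_matches(
    "Report_BRAVO_final.txt", EXAMPLE_KEYWORDS, case_sensitive=True
)
```

In this sketch the case-insensitive call finds "bravo" in the filename, while the case-sensitive call finds nothing because the filename uses capital letters.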
The next level is fuzzy matching, which still produces a match when there are variations, whether made by mistake or on purpose. These include simple spelling mistakes, transposed letters, doubled letters, and character substitutions such as 4 for A or 3 for E. The match can be further refined by assigning different values to different words, and to combinations of words in relation to each other.
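One way to sketch fuzzy matching with weighted words in Python is shown below. Everything here is an assumption for illustration: the substitution table (limited to the two examples named above), the 0.8 similarity threshold, the use of `difflib.SequenceMatcher` from the standard library as the similarity measure, the placeholder terms and their weights, and the `pair_bonuses` idea for scoring words in relation to each other. Real tools may work quite differently.

```python
import re
from difflib import SequenceMatcher

# Assumed substitution table: 4 -> a and 3 -> e, as in the examples above.
SUBSTITUTIONS = str.maketrans({"4": "a", "3": "e"})


def normalise(token):
    """Lower-case, undo character substitutions, and collapse repeated letters."""
    token = token.casefold().translate(SUBSTITUTIONS)
    return re.sub(r"(.)\1+", r"\1", token)  # "braavo" -> "bravo"


def fuzzy_score(text, weighted_keywords, pair_bonuses=None, threshold=0.8):
    """Return (total score, matched keywords) for the text.

    A keyword counts as matched when its normalised form is at least
    `threshold` similar to some normalised token of the text.
    """
    tokens = [normalise(t) for t in re.split(r"[^A-Za-z0-9]+", text) if t]
    total, hits = 0.0, []
    for keyword, weight in weighted_keywords.items():
        target = normalise(keyword)
        best = max(
            (SequenceMatcher(None, target, tok).ratio() for tok in tokens),
            default=0.0,
        )
        if best >= threshold:
            hits.append(keyword)
            total += weight
    # Extra score when two listed words appear together (hypothetical scheme).
    for (first, second), bonus in (pair_bonuses or {}).items():
        if first in hits and second in hits:
            total += bonus
    return total, hits


# Hypothetical placeholder terms and weights.
WEIGHTS = {"bravo": 2.0, "charlie": 1.0}
PAIRS = {("bravo", "charlie"): 1.5}

score, matched = fuzzy_score("BR4VO_chralie_notes.txt", WEIGHTS, PAIRS)
```

Here "BR4VO" matches "bravo" after the substitution is undone, and "chralie" is close enough to "charlie" to count despite the transposed letters. A higher total score would indicate that a file deserves earlier review; it would not by itself say anything about the file's content.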
Although not classified as keyword matching, textual analysis with AI algorithms is a further development that is used to analyse larger volumes of text, for example to produce semantic summaries, translate text, and correct spelling.
Keyword matching relies on the quality of the keyword lists, how words have been combined and how relationships between words have been scored.
Why use keyword matching?
Files containing child sexual abuse material are often named in specific ways, hence the importance of keyword matches to filenames. They are often combinations of words, scrambled words or very specific terms used by offenders to describe certain types of material.
Lists of known keywords can be used by law enforcement to triage and identify pertinent material, and by platform providers and businesses to highlight suspected files.
Modern web filters, which are used by most businesses, also use keyword matching in a number of ways to look at content and produce a probability score to determine how likely it is that a site contains certain content, and whether it should be blocked or not.
Keyword matching relies on the quality of the keyword lists
Keyword matching is fast and requires very little processing power compared to the analysis of images. It is also quite easy to get started. Even a limited keyword list will provide value from the start, and the process of refining and building lists to make them better is straightforward.
However, keyword matching is also highly complicated. The quality and value of keyword matching are directly related to the quality of the list that is used. Building a good list requires intelligence and deep knowledge of the subject, and maintaining it so that it stays effective takes considerable time. As child sexual abuse material is rarely a prioritised area, many lists are lacking.
It is also important to note that a match does not automatically mean that the file contains child sexual abuse material. It is only an indication, and the file still needs to be reviewed.
About the Technical Model National Response
Inspired by the WeProtect Global Alliance Model, we have set out to develop an initiative that looks at technology. We call it the Technical Model National Response. It is an overview of the existing technologies that need to be applied by different sectors and businesses to effectively fight the spread of child sexual abuse material.