Keyword Matching


Keyword matching makes it possible to detect suspicious material based on specific words or phrases

Keyword matching is widely used in everything from search engines to online advertising campaigns to media monitoring tools. In the search for child sexual abuse material, its function is to match words or phrases, in filenames or in text, that have been listed as suspicious and worth investigating.

Keyword Matching

In its simplest form, keyword matching uses lists of words, phrases or word groupings that are matched directly against, for example, filenames, chat logs, documents or websites to determine whether they are relevant. In addition to exact matching, matching can also be case-insensitive, which means that a match is still made even if the text uses capital letters where the list does not. Matching is deterministic: the same input checked against the same list always produces the same result.
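As a minimal sketch of exact and case-insensitive matching, the Python below splits a filename or line of text into words and checks each entry of a keyword list against it. A multi-word phrase counts as a match only when its words appear consecutively. The function names, the tokenization rule (splitting on any non-alphanumeric character) and the benign placeholder keywords are assumptions for illustration and are not taken from any particular tool.

```python
import re
from typing import Iterable

# Hypothetical, benign placeholder list; real keyword lists are curated
# by analysts and are not reproduced here.
KEYWORDS = ["holiday", "beach party", "draft report"]


def tokenize(text: str) -> list[str]:
    """Split text into words on any run of non-alphanumeric characters
    (underscores included), e.g. 'Beach_Party-2018.jpg' -> Beach, Party, 2018, jpg."""
    return [t for t in re.split(r"[\W_]+", text) if t]


def find_matches(text: str, keywords: Iterable[str],
                 case_sensitive: bool = False) -> list[str]:
    """Return every keyword or phrase whose words occur consecutively in text.

    With case_sensitive=False (the default) both sides are case-folded,
    so 'HOLIDAY' and 'holiday' are treated as the same word.
    """
    tokens = tokenize(text)
    if not case_sensitive:
        tokens = [t.casefold() for t in tokens]
    hits = []
    for kw in keywords:
        kw_tokens = tokenize(kw)
        if not case_sensitive:
            kw_tokens = [t.casefold() for t in kw_tokens]
        n = len(kw_tokens)
        # Slide a window of n words over the text and compare it with the entry.
        if n and any(tokens[i:i + n] == kw_tokens
                     for i in range(len(tokens) - n + 1)):
            hits.append(kw)
    return hits


print(find_matches("Beach_Party-2018.jpg", KEYWORDS))        # ['beach party']
print(find_matches("Beach_Party-2018.jpg", KEYWORDS, True))  # []
```

Splitting on separators lets an entry such as "beach party" match a filename like Beach_Party-2018.jpg. Words run together without any separator would need substring matching instead, at the cost of more false positives.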

Fuzzy matching

The next level is fuzzy matching, which finds a match even when the text contains variations, whether made by mistake or on purpose. These include simple spelling mistakes, letters that have been switched around, doubled letters, and characters substituted for letters, such as 4 for A or 3 for E. Matching can be refined further by assigning different values to different words, and to particular words when they appear in relation to each other.
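The Python below is one possible sketch of fuzzy matching, assuming a two-step approach. First each word is normalized: it is lower-cased, substituted digits are mapped back to letters and doubled letters are collapsed. A word then matches a keyword if their optimal string alignment distance is at most one edit, where swapping two adjacent letters counts as a single edit. The substitution table, the distance threshold and all function names are hypothetical choices, not a description of any specific product.

```python
import re

# Hypothetical substitution table. The text mentions 4 for A and 3 for E;
# 1 for I and 0 for O are assumed extras for illustration.
SUBSTITUTIONS = str.maketrans({"4": "a", "3": "e", "1": "i", "0": "o"})


def normalize(word: str) -> str:
    """Lower-case a word, map substituted digits back to letters and
    collapse runs of the same letter ('hoolidaay' -> 'holiday')."""
    w = word.casefold().translate(SUBSTITUTIONS)
    out: list[str] = []
    for ch in w:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and swaps of two adjacent characters each cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


def fuzzy_find(text: str, keywords: list[str],
               max_distance: int = 1) -> list[str]:
    """Return the single-word keywords that some word in text matches
    after normalization, allowing up to max_distance edits."""
    words = [normalize(t) for t in re.split(r"[\W_]+", text) if t]
    return [kw for kw in keywords
            if any(osa_distance(w, normalize(kw)) <= max_distance
                   for w in words)]


# Hypothetical placeholder keywords; 'H0LIDAY' and 'b34ch' differ from them
# only by substituted characters and letter case.
print(fuzzy_find("H0LIDAY-b34ch.jpg", ["holiday", "beach", "report"]))
# ['holiday', 'beach']
```

A threshold of one edit is very permissive for short keywords ("cat" would match "hat"), so short entries might be restricted to exact matching or given a lower value.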

Textual analysis

Although it is not classified as keyword matching, textual analysis using AI algorithms is being developed further and is used to analyse larger volumes of text, for example to produce semantic summaries, translate text and correct spelling.

Keyword matching relies on the quality of the keyword lists, how words have been combined and how relationships between words have been scored.

Why use keyword matching?

Files containing child sexual abuse material are often named in specific ways, which is why matching keywords against filenames is important. The names are often combinations of words, scrambled words or very specific terms used by offenders to describe certain types of material.

Lists of known keywords can be used by law enforcement to triage and identify pertinent material, and by platform providers and businesses to highlight suspected files.

Modern web filters, which are used by most businesses, also use keyword matching in several ways to analyse content and produce a probability score that indicates how likely it is that a site contains certain content and whether it should be blocked.
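One way to turn weighted words into a probability score is sketched below. Each listed word adds its weight, a pair of words appearing together adds an extra weight, and the logistic function squashes the total into a value between 0 and 1 that is compared with a blocking threshold. The words, weights, bias and threshold are hypothetical placeholders, and the logistic function is an assumed choice; real web filters use their own models.

```python
import math
import re

# Hypothetical placeholder words and weights, for illustration only.
WORD_WEIGHTS = {"alpha": 1.5, "bravo": 1.0, "charlie": 0.5}
# Hypothetical extra weight added when both words occur in the same text.
PAIR_WEIGHTS = {frozenset({"alpha", "bravo"}): 2.0}
BIAS = -3.0             # assumed offset so that a text with no hits scores low
BLOCK_THRESHOLD = 0.8   # assumed cut-off for the block decision


def score(text: str) -> float:
    """Return a value in (0, 1): higher means more likely to be relevant."""
    words = {t.casefold() for t in re.split(r"[\W_]+", text) if t}
    total = BIAS
    # Add the weight of every listed word that appears in the text.
    total += sum(w for word, w in WORD_WEIGHTS.items() if word in words)
    # Add the extra weight of every listed pair whose words both appear.
    total += sum(w for pair, w in PAIR_WEIGHTS.items() if pair <= words)
    # Squash the raw total into (0, 1) with the logistic function.
    return 1.0 / (1.0 + math.exp(-total))


def should_block(text: str) -> bool:
    """Block when the score reaches the assumed threshold."""
    return score(text) >= BLOCK_THRESHOLD


print(round(score("alpha bravo"), 2), should_block("alpha bravo"))  # 0.82 True
print(round(score("alpha"), 2), should_block("alpha"))              # 0.18 False
```

With these placeholder values, the pair weight is what pushes "alpha bravo" over the threshold, illustrating how scoring the relationship between words can matter more than the individual words.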

Keyword matching relies on the quality of the keyword lists

Keyword matching is fast and requires very little processing power compared with image analysis. It is also quite easy to get started: even a limited keyword list provides value from the start, and the process of refining and building lists to improve them is straightforward.

However, keyword matching is also highly complex to do well. The quality and value of keyword matching depend directly on the quality of the list that is used. Building a good list requires intelligence and deep knowledge of the subject, and a great deal of time is needed to maintain it so that it remains effective. As child sexual abuse material is rarely a prioritised area, many lists are lacking.

It is also important to note that a match does not automatically mean that the file contains child sexual abuse material. It is only an indication, and the file still needs to be reviewed.

About the Technical Model National Response

Inspired by the WeProtect Global Alliance Model, we have set out to develop an initiative that looks at technology. We call it the Technical Model National Response. It is an overview of the existing technologies that different sectors and businesses need to apply to effectively fight the spread of child sexual abuse material.

Learn about the other technologies

  • Hashing Technologies (20 Aug 2018)
  • PhotoDNA (19 Aug 2018)
  • Artificial Intelligence (18 Aug 2018)
  • Blocking Technologies (16 Aug 2018)
  • Web Crawlers (16 Aug 2018)
  • Filter Technologies (15 Aug 2018)
  • Keyword Matching (14 Aug 2018)
  • Law Enforcement Collaboration platform (14 Aug 2018, coming soon)
  • Notice and Takedown (13 Aug 2018, coming soon)