How web crawlers can help find child sexual abuse material

Web crawlers

Web crawlers, also known as crawlers, robots, search bots or just bots, are automated programs that search engines and other bodies use to, for example, find and index what’s new on the Internet. There are many different types of web crawlers, but in general they all follow the same pattern of work.

They crawl over websites, download the content and push it into a database that indexes it, and finally visit all the hyperlinks that exist on the webpage to find new material to index. Traditional web crawlers are programmed to index written content.
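The download, index and follow-links pattern described above can be sketched in a few lines of Python. This is a minimal illustration only, not the code of any real crawler: the `crawl` function, the `LinkExtractor` class and the in-memory `FAKE_WEB` site are all hypothetical names introduced here, and a plain dictionary stands in for the database.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl: download a page, index it, then queue its links."""
    index = {}                  # url -> page content (stand-in for a database)
    queue = deque([start_url])  # pages waiting to be visited
    seen = {start_url}          # avoids visiting the same page twice
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:        # page missing or unreachable
            continue
        index[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# Hypothetical three-page site held in memory so the sketch runs offline.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/">home</a>',
    "http://example.test/b": "no links here",
}

pages = crawl("http://example.test/", FAKE_WEB.get)
print(sorted(pages))
```

In a real crawler the `fetch` argument would be a function that downloads pages over HTTP, and the index would be a proper database rather than a dictionary.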

Although they might be able to read images, they are not programmed to recognise illicit photographed or filmed material, such as online child sexual abuse material. However, a web crawler built to look for specific fingerprints, or hash values, can be a useful tool when looking for such material.
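As a rough sketch of what looking for fingerprints means, the Python below computes a hash for each downloaded file and checks it against a set of known hashes. The function names and the placeholder byte strings are assumptions made for illustration. The sketch uses SHA-256, a cryptographic hash that only matches byte-identical copies; technologies such as PhotoDNA are designed to also match images that have been resized or re-encoded, which is not shown here.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of a file's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_known(files, known_hashes):
    """Return the names of files whose fingerprint is on the known list."""
    return [
        name
        for name, data in files.items()
        if fingerprint(data) in known_hashes
    ]


# Placeholder data: a hash list with one entry and two downloaded files.
known_hashes = {fingerprint(b"placeholder-known-file")}
downloaded = {
    "file1.bin": b"placeholder-known-file",
    "file2.bin": b"some other content",
}

matches = find_known(downloaded, known_hashes)
print(matches)
```

Because only hashes are compared, the crawler can flag a file as a match without anyone needing to hold a copy of the original material.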

THE ARACHNID CRAWLER

The Canadian Centre for Child Protection (C3P), through its operation of the hotline Cybertip.ca, has built a web crawler called Project Arachnid, which has the specific task of finding and removing online child sexual abuse material. It operates by using Microsoft’s PhotoDNA technology along with hashes (digital fingerprints) from lists generated by several organisations, the biggest being NCMEC, the Royal Canadian Mounted Police (RCMP) and Interpol.

THE IWF WEB CRAWLER

The IWF’s web crawler was developed around the same time as Project Arachnid. The two crawlers are built similarly: both crawl websites and push content into databases for verification and indexing, and both follow links on pages to see whether they can find more child sexual abuse material on other pages. Does the world need more than one crawler? Yes. It provides resilience: should there be an issue with any one crawler, the others will pick up the work.

The major gain – limiting revictimisation

The aim of both web crawlers is to protect victims from the revictimisation that occurs when images of their abuse are distributed across the internet.

The fact that images are actively pursued and removed offers the victims of this crime relief. Knowing that there are specific technologies, organisations and NGOs working to remove material that could otherwise be shared again and again helps alleviate the feeling that the cycle of abuse is endless.

Technology heavily dependent on human resources

Web crawlers are efficient at finding online child sexual abuse material that is already known, and they make a huge difference in tackling its spread. The challenge that NGOs face when processing the information identified by crawlers is the ever-growing need for human resources to ensure that the material is viewed and categorised.

About the Technical Model National Response

Inspired by the WeProtect Global Alliance Model, we have set out to develop an initiative that looks at technology. We call it the Technical Model National Response. It presents the existing technologies that need to be applied by different sectors and businesses to effectively fight the spread of child sexual abuse material.
