Mercator

Open Source Domain Data Crawler

  • Date: 2022-04-01
  • License: Apache 2.0
  • Technologies: Java, Spring, Terraform, Kubernetes, AWS

Mercator is an open-source web crawler built to scan the .be, .brussels and .vlaanderen zones every month. It collects DNS records, web technologies, the TLS ciphers in use, SMTP parameters, VAT numbers and much more.
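As an illustration of one of these collectors, here is a minimal sketch of extracting Belgian VAT numbers from page text. It is not Mercator's own code: the class name `BelgianVatSketch` and the regular expression are hypothetical. The only fixed rule it relies on is the Belgian enterprise-number check, where the last two of the ten digits must equal 97 minus (the first eight digits mod 97). The checksum filters out random digit strings that merely look like VAT numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BelgianVatSketch {
    // Hypothetical pattern: "BE" followed by ten digits starting with 0 or 1,
    // optionally split by spaces or dots, e.g. "BE0123456749" or "BE 0123.456.749".
    private static final Pattern VAT = Pattern.compile(
            "\\bBE\\s?([01])[\\s.]?(\\d{3})[\\s.]?(\\d{3})[\\s.]?(\\d{3})\\b",
            Pattern.CASE_INSENSITIVE);

    /** Mod-97 check: the last two digits must equal 97 - (first eight digits mod 97). */
    static boolean isValid(String tenDigits) {
        long body = Long.parseLong(tenDigits.substring(0, 8));
        int check = Integer.parseInt(tenDigits.substring(8));
        return 97 - (body % 97) == check;
    }

    /** Returns every checksum-valid VAT number in the text, normalised to "BE" + 10 digits. */
    static List<String> findVatNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = VAT.matcher(text);
        while (m.find()) {
            String digits = m.group(1) + m.group(2) + m.group(3) + m.group(4);
            if (isValid(digits)) {
                found.add("BE" + digits);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // The second number has a wrong check digit and is therefore ignored.
        String html = "<footer>ACME NV - BTW BE 0123.456.749 - typo: BE0123456748</footer>";
        System.out.println(findVatNumbers(html));
    }
}
```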

The crawler is built around queues, using Amazon SQS. Each module can be scaled independently, which makes a fast and complete crawl of the .be zone possible within a single day. Mercator is deployed on Amazon Elastic Kubernetes Service (EKS).
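The queue-based design can be sketched in plain Java. This is a hypothetical illustration, not Mercator's code: an in-memory `BlockingQueue` stands in for an SQS queue, and `Stage`, `QueuePipelineSketch` and the placeholder "dns"/"web" work functions are invented for the example. Each stage owns its own worker pool, so scaling one module means changing only its worker count, while the queues decouple the stages from each other.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class QueuePipelineSketch {
    /** Runs two hypothetical modules (DNS, then web) over the domains and returns sorted results. */
    static List<String> crawl(List<String> domains) {
        // In-memory stand-ins for SQS queues between the modules.
        BlockingQueue<String> toDns = new LinkedBlockingQueue<>();
        BlockingQueue<String> toWeb = new LinkedBlockingQueue<>();
        BlockingQueue<String> results = new LinkedBlockingQueue<>();

        // Placeholder work; a real module would resolve DNS records or fetch web pages.
        Stage<String, String> dns = new Stage<>(toDns, toWeb, d -> d + " dns=ok", 4);
        Stage<String, String> web = new Stage<>(toWeb, results, d -> d + " web=ok", 2);
        dns.start();
        web.start();

        toDns.addAll(domains);
        List<String> collected = new ArrayList<>();
        try {
            for (int i = 0; i < domains.size(); i++) {
                String r = results.poll(5, TimeUnit.SECONDS);
                if (r == null) break; // give up instead of hanging if a stage stalls
                collected.add(r);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            dns.stop();
            web.stop();
        }
        Collections.sort(collected); // workers finish in arbitrary order
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(crawl(List.of("example.be", "example.brussels", "example.vlaanderen")));
    }
}

/** One module: takes messages from its input queue and puts results on its output queue. */
class Stage<I, O> {
    private final BlockingQueue<I> input;
    private final BlockingQueue<O> output;
    private final Function<I, O> work;
    private final int workerCount;
    private final ExecutorService workers;

    Stage(BlockingQueue<I> input, BlockingQueue<O> output, Function<I, O> work, int workerCount) {
        this.input = input;
        this.output = output;
        this.work = work;
        this.workerCount = workerCount;
        this.workers = Executors.newFixedThreadPool(workerCount);
    }

    /** Starts workerCount consumers; scaling this module only changes that number. */
    void start() {
        for (int i = 0; i < workerCount; i++) {
            workers.submit(() -> {
                try {
                    while (true) {
                        I message = input.take();        // blocks until a message arrives
                        output.put(work.apply(message)); // hand the result to the next module
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();  // stop() interrupts the workers
                }
            });
        }
    }

    void stop() {
        workers.shutdownNow();
    }
}
```

On SQS the same shape applies, but consumers would poll the queue and delete each message after processing, and scaling would mean running more pods of a module rather than more threads.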

During my time at DNS Belgium, I helped finish this project and took on many of its maintenance and monitoring tasks. I also worked on closed-source additions to the codebase.