AntiSquat — An AI-powered solution to prevent typosquatting and phishing

5 min read · Aug 31, 2023

Originally published at https://redhuntlabs.com/blog/antisquat-an-ai-powered-solution-to-prevent-typosquatting-and-phishing/ on Tue, 08 Aug 2023 13:16:30 +0000.

Intro

Typosquatting and phishing are quite a headache for businesses. Creating fake websites that look almost identical to legitimate ones has become a piece of cake for attackers. Experts can spot those fakes, but not everyone is an expert or has the time. In the US alone, businesses lose almost $2 billion each year because their customers get phished. These attacks succeed mainly because users are unaware of the danger.

Typosquatting is an underestimated yet serious problem for mid-sized businesses. Imagine you’ve worked hard to build a brand and a solid online presence. Typosquatters are sneaky troublemakers who register domain names that are similar to yours but contain typos or misspellings. They do this to trick customers into visiting their bogus websites instead of yours. The result is confusion and lost trust, and in the worst case, stolen customer data or money.

Dealing with these attacks is a real challenge. They prey on our weaknesses, which makes them hard to fend off completely. The most common way to detect phishing websites is the “blacklist” approach: services such as PhishTank maintain a database of known phishing URLs and IPs, and sites are checked against it.
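To make the blacklist approach concrete, here is a minimal Python sketch of an exact-match lookup. The hostnames in BLACKLISTED_HOSTS are placeholders standing in for a downloaded feed, and normalize_host and is_blacklisted are hypothetical helpers, not part of PhishTank or AntiSquat. The sketch also shows the approach's weakness: a lookalike that nobody has reported yet passes unnoticed.

```python
from typing import Set
from urllib.parse import urlsplit

# Placeholder entries standing in for a downloaded blacklist feed
# (e.g. one exported from a service such as PhishTank).
BLACKLISTED_HOSTS: Set[str] = {"paypa1-login.example", "secure-amaz0n.example"}


def normalize_host(url: str) -> str:
    """Return the lower-cased hostname of a URL, without a trailing dot."""
    if "://" not in url:
        url = "http://" + url  # urlsplit needs a scheme to find the host
    host = urlsplit(url).hostname or ""  # .hostname is already lower-cased
    return host.rstrip(".")


def is_blacklisted(url: str, blacklist: Set[str] = BLACKLISTED_HOSTS) -> bool:
    """Exact-match lookup: only catches hosts that are already on the list."""
    return normalize_host(url) in blacklist
```

For example, is_blacklisted("https://PAYPA1-LOGIN.example/signin") returns True, while a freshly registered variant such as "paypa1-1ogin.example" returns False simply because it is not on the list yet.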

Existing solutions

In our search for an all-in-one solution, we came across several projects. Some (such as openSquat) simply pull a list of domain names from an online domain source, search them for substrings of the target name, and try to determine whether they are phishing sites. Others (such as dnstwist) use permutation- or heuristic-based detection to catch these attacks. While these methods are promising, their Achilles’ heel is time complexity. As the number of permutations and patterns grows, so does the analysis time, which can delay the detection of and response to real threats. The computational resources required for such analysis can also strain infrastructure and cause performance issues. Balancing innovation with efficient algorithm design is therefore crucial: prioritizing algorithms with better time complexity could be the key to a more effective and responsive phishing detection system.
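The permutation idea, and why its cost grows, can be sketched in a few lines of Python. The function below is a simplified illustration, not dnstwist’s or openSquat’s actual code: it generates only single-character omissions, repetitions, transpositions and replacements of the first label of a "name.tld" domain. Even this produces on the order of (label length × alphabet size) candidates, and allowing two or more simultaneous edits multiplies that again, before a single DNS lookup or page fetch has been made.

```python
import string

# Characters allowed in a DNS label (hyphen only in the middle).
ALPHABET = string.ascii_lowercase + string.digits + "-"


def edit_permutations(domain: str) -> set:
    """Single-edit typos of the first label of a "name.tld" domain.

    Covers omission, repetition, transposition and replacement only;
    real tools add many more strategies (homoglyphs, TLD swaps, ...).
    """
    name, _, tld = domain.lower().partition(".")
    variants = set()
    for i in range(len(name)):
        variants.add(name[:i] + name[i + 1:])              # omission
        variants.add(name[:i] + name[i] + name[i:])        # repetition
        if i < len(name) - 1:                              # transposition
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
        for ch in ALPHABET:                                # replacement
            variants.add(name[:i] + ch + name[i + 1:])
    variants.discard(name)  # drop no-op edits such as swapping "oo"
    # keep only labels that are non-empty and don't start/end with "-"
    return {
        f"{v}.{tld}" for v in variants
        if v and not v.startswith("-") and not v.endswith("-")
    }
```

edit_permutations("amazon.com") includes amzon.com, amazzon.com and maazon.com. A longer name such as amazonwebservices.com yields proportionally more candidates, each of which would still need a DNS lookup and possibly a page fetch.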

Our solution

AntiSquat is our attempt at looking at the typosquatting problem differently. Our aim is to complement and assist existing methods. It leverages AI techniques such as natural language processing (NLP), large language models (ChatGPT) and more to strengthen detection.

What sets AntiSquat apart

Large Language Model / ChatGPT integration

AntiSquat takes a fresh perspective on tackling typosquatting. It complements and enhances existing methods by harnessing AI techniques such as natural language processing (NLP) and large language models (LLMs), in this case ChatGPT. It uses NLP to understand how words are used in language, and an LLM like ChatGPT to generate domain name variations efficiently. This combined approach improves the tool’s ability to identify and counter the deceptive tactics cybercriminals use.
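The LLM step can be pictured as: ask the model for lookalike domains, then treat its free-text answer as untrusted input and keep only well-formed hostnames. The sketch below makes no assumptions about the OpenAI client library: ask_llm is a hypothetical stand-in for whatever function sends a prompt and returns the reply, and the prompt wording in build_prompt is our own illustration, not AntiSquat’s.

```python
import re
from typing import Callable, List

# Loose hostname check: dot-separated labels of letters, digits and inner
# hyphens, ending in an alphabetic TLD, at most 253 characters overall.
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)


def build_prompt(domain: str, count: int = 20) -> str:
    """Hypothetical prompt wording; AntiSquat's real prompt may differ."""
    return (
        f"List {count} domain names a typosquatter might register to "
        f"imitate {domain}. Return one domain per line and nothing else."
    )


def parse_candidates(reply: str, original: str) -> List[str]:
    """Keep only well-formed, de-duplicated domains from a free-text reply."""
    seen, result = set(), []
    for line in reply.splitlines():
        # strip list markers such as "1." or "-" that models often add
        cand = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip().lower()
        if cand and cand != original and cand not in seen and DOMAIN_RE.match(cand):
            seen.add(cand)
            result.append(cand)
    return result


def llm_variations(domain: str, ask_llm: Callable[[str], str]) -> List[str]:
    """`ask_llm` is any callable that sends a prompt and returns the text reply."""
    return parse_candidates(ask_llm(build_prompt(domain)), domain.lower())
```

Passing a fake ask_llm that returns "1. amaz0n.com\n2. Amazon-Support.com\n3. amazon.com" yields ["amaz0n.com", "amazon-support.com"]: list markers are stripped, case is normalised, and the original domain is dropped. Validating the reply this way matters because an LLM can return prose or malformed names alongside real candidates.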

Image processing and optical character recognition

Using Selenium, AntiSquat renders a webpage as a user would see it, then tries to extract all available text from the final version of the page. Some phishers put text into image assets to evade phishing detection, so AntiSquat also uses OpenCV and Tesseract OCR to extract words from within images. Words collected from a suspected phishing site are then compared with words collected from the original site using algorithms such as Levenshtein distance, which lets the tool compute a similarity index between the two sites.
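Below is a sketch of the comparison step in Python: a standard dynamic-programming Levenshtein distance, normalised into a 0 to 1 word similarity, plus a hypothetical similarity_index that averages, over the original site’s words, the best match found on the suspect site. The scoring formula is an assumption for illustration; the post does not specify how AntiSquat aggregates word distances into its index.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn `a` into `b` (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def word_similarity(a: str, b: str) -> float:
    """Normalise the distance to [0, 1], where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def similarity_index(original: Iterable[str], suspect: Iterable[str]) -> float:
    """Hypothetical site score: for each word on the original site, take its
    best match on the suspect site, then average those best matches."""
    orig = {w.lower() for w in original}
    susp = {w.lower() for w in suspect}
    if not orig or not susp:
        return 0.0
    return sum(max(word_similarity(o, s) for s in susp) for o in orig) / len(orig)
```

levenshtein("kitten", "sitting") is 3, and word_similarity("amazon", "amaz0n") is about 0.83, so a page that swaps a letter for a digit still scores close to the original. Text gathered via Selenium and OCR would be split into words before being passed in.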

Contact information provisioning

AntiSquat integrates with GoDaddy and WHOIS to try to determine whether domains are available for sale. It also tries to extract contact information (emails and phone numbers) from site pages so that organizations can contact domain owners. This is helpful when a domain is available for sale privately. Organizations can use this data to buy up such domains in bulk and protect their users.
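Contact extraction from page text can be approximated with regular expressions. This is a hedged sketch: EMAIL_RE, PHONE_RE and extract_contacts are hypothetical names, and the patterns are intentionally loose, so they will miss obfuscated addresses and may pick up number sequences (such as dates) that are not phone numbers. The WHOIS and GoDaddy availability checks are not shown.

```python
import re
from typing import Dict, List

# Deliberately loose patterns: they miss obfuscated forms such as
# "name [at] domain" and can match number runs that aren't phone numbers.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional "+", a digit, then digits/spaces/dots/dashes/parentheses,
# ending in a digit; the digit count is checked separately below.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{5,}\d")


def extract_contacts(page_text: str) -> Dict[str, List[str]]:
    """Collect unique email addresses and plausible phone numbers."""
    emails = sorted(set(EMAIL_RE.findall(page_text)))
    phones: List[str] = []
    for match in PHONE_RE.findall(page_text):
        candidate = match.strip()
        digit_count = len(re.sub(r"\D", "", candidate))
        # E.164 numbers have at most 15 digits; very short runs are noise
        if 7 <= digit_count <= 15 and candidate not in phones:
            phones.append(candidate)
    return {"emails": emails, "phones": phones}
```

For the text "Contact sales@amaz0n-deals.example or call +1 (555) 010-4477. Domain for sale!" it returns the single email address and the phone number "+1 (555) 010-4477".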

How to use

• Clone the project: git clone https://github.com/redhuntlabs/antisquat

• Install all dependencies: pip install -r requirements.txt

• Get an OpenAI API key at https://platform.openai.com/account/api-keys

• Create a file named .openai-key and paste your OpenAI API key into it.

• (Optional) Visit https://developer.godaddy.com/keys and get a GoDaddy API key. Create a file named .godaddy-key and paste your GoDaddy API key into it.

• Create a file named domains.txt containing the domains you’d like to scan, one per line.

• (Optional) Create a file named blacklist.txt containing the domains you’d like to ignore, one per line. Regular expressions are supported.

• Run AntiSquat: python3.8 antisquat.py domains.txt
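Here is a minimal sketch of how the two input files could be interpreted. read_lines and filter_blacklisted are hypothetical helpers, and treating each blacklist.txt entry as a case-insensitive regular expression that must match the whole domain is our assumption; AntiSquat may apply its patterns differently (for example, as partial matches).

```python
import re
from pathlib import Path
from typing import List


def read_lines(path: str) -> List[str]:
    """Non-empty, stripped lines; a missing optional file yields []."""
    p = Path(path)
    if not p.exists():
        return []
    return [ln.strip() for ln in p.read_text(encoding="utf-8").splitlines() if ln.strip()]


def filter_blacklisted(candidates: List[str], patterns: List[str]) -> List[str]:
    """Drop candidates that fully match any blacklist entry (treated as regex)."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [c for c in candidates if not any(rx.fullmatch(c) for rx in compiled)]
```

Because entries are treated as regular expressions here, a plain entry like amazon.co.uk would also match strings such as amazonxcoxuk; escaping the dots (amazon\.co\.uk) avoids that.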

Example

Let’s say you’d like to run AntiSquat on amazon.com.

Start by adding amazon.com to domains.txt, then run python3.8 antisquat.py domains.txt.

AntiSquat generates several permutations of the domain, iterates through them one by one, and tries to extract all contact information from each page.

Concluding remarks

AntiSquat is our attempt at taking a crack at the menace of typosquatting and phishing. It is one in a long line of tools meant to complement existing approaches, not challenge them. Of course, our solution isn’t foolproof. But it demonstrates another way of solving this problem: using AI to empower cybersecurity processes with practical, statistically inferred ways of looking at a problem, instead of spending time brute-forcing a solution.

How you can help

Since AntiSquat is an open-source project, we appreciate contributions from the community. Feel free to open pull requests with new features, or raise issues and feature requests on GitHub. You can also contact us directly.

Contributing to AntiSquat is straightforward. Simply follow these steps:

• Fork the AntiSquat repository on GitHub.

• Create a new branch for your changes.

• Implement your changes or additions locally.

• Commit your changes with descriptive messages.

• Push the changes to your fork.

• Finally, submit a pull request (PR) to the main AntiSquat repository.

Our team at RedHunt Labs will review your contributions promptly. Collaborating with the information security community allows us to address a broader range of security challenges and deliver a more powerful and effective tool.

About NVADR

We at RedHunt Labs help organizations discover untracked assets, data exposure, and external attack surface with NVADR, an all-in-one attack surface management SaaS solution.

New attack vectors and vulnerabilities emerge frequently and might affect one (or many) assets across your organization. At such times, a precise external asset inventory makes it easy to scan for systems affected by a newly published vulnerability.

NVADR also ‘continuously’ enumerates and lists all the technologies used across your external attack surface and thus helps identify affected assets right away. Don’t hesitate to get in touch with us to schedule your free trial today.

Written by RedHunt Labs