Fighting spam has gained a linguistic dimension

Even the smartest bots that send spam and viruses can be detected. They are betrayed by the dialect that they use to communicate with the mail server. Researchers at NASK are working on a database of such dialects.

“We are looking for remote employees…”, “Get a rebate…”, “Improve your sex life…” – most of us are already sick of spam: unwanted email that clogs up mailboxes and increases the risk of missing an important message. Spam is not only annoying but sometimes also dangerous, for example when you click on an untrusted link and malicious software is installed on your computer without your knowledge.

That is why specialists continue to look for new ways to identify mass mailings. One such method is the analysis of dialects, i.e. minor differences in the use of the SMTP communication protocol. NASK representatives described their research on this topic in a release sent to PAP.

“The SMTP protocol that email programs use to communicate with servers allows some coding freedom. As a result, individual mail clients differ from each other, and the differences in their communication with servers can be observed and used to identify the source of correspondence” – explained Piotr Bazydło from the Network Security Methods Team at NASK.

The SMTP protocol gives the author of a mail program some flexibility when writing the code. Additional spaces, uppercase or lowercase letters, a domain name instead of the sender’s IP address and other differences do not pose a problem for the protocol. As long as the commands remain within the general framework, they will be accepted and communication will take place correctly. However, when this communication is analysed using software installed on the server, these differences can be identified and treated as fingerprints.
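To make the idea of a fingerprint concrete, here is a minimal sketch in Python of how a server-side tool might record the formatting quirks of a single SMTP command line. The function name `command_features` and the choice of features (letter case, space after the colon, trailing whitespace, IP address versus domain name in the greeting) are assumptions made for illustration; the article does not say which features the NASK tool actually uses.

```python
import re

def command_features(line: str) -> dict:
    """Describe the formatting quirks of one raw SMTP command line.

    Hypothetical feature set, chosen only to mirror the differences
    named in the article; not the researchers' actual implementation.
    """
    text = line.rstrip("\r\n")
    match = re.match(r"[A-Za-z]+", text)
    word = match.group(0) if match else ""

    if word.isupper():
        case = "upper"
    elif word.islower():
        case = "lower"
    else:
        case = "mixed"

    features = {
        "verb": word.upper(),
        "case": case,
        # e.g. "MAIL FROM: <a@b>" versus "MAIL FROM:<a@b>"
        "space_after_colon": bool(re.search(r":[ \t]+\S", text)),
        "trailing_space": text != text.rstrip(" \t"),
    }

    # In the greeting, note whether the client names itself by IP or by domain.
    if word.upper() in ("HELO", "EHLO"):
        arg = text[len(word):].strip()
        is_ip = bool(re.fullmatch(r"\[?\d{1,3}(\.\d{1,3}){3}\]?", arg))
        features["greeting_arg"] = "ip" if is_ip else "domain"

    return features
```

Two clients that send `MAIL FROM:<a@example.com>` and `mail from: <a@example.com>` are both valid as far as the server is concerned, yet this function returns different feature sets for them, and that difference is the raw material for a dialect.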

Commonly used e-mail programs have different dialects. By monitoring a client’s communication with the server, it is possible to determine whether an e-mail was sent from Outlook, Thunderbird, a Gmail server or another program. When the analysis shows that the dialect does not match any of the known programs, or is a combination of more than one of them, this indicates that the mail was sent using a less official tool, possibly a bot.
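The matching step described above can be sketched as a lookup against a table of known dialects. Everything in the example below is hypothetical: the names `client_a` and `client_b`, the style labels and the function `classify` are invented for illustration and do not describe how any real mail program behaves or how the NASK database is organised.

```python
# Hypothetical dialect table: the formatting style each client uses per command.
# The entries are invented for illustration, not measured from real software.
KNOWN_DIALECTS = {
    "client_a": {"EHLO": "upper", "MAIL": "upper,no-space", "RCPT": "upper,no-space"},
    "client_b": {"EHLO": "lower", "MAIL": "lower,space", "RCPT": "lower,space"},
}

def classify(session: dict, known: dict = KNOWN_DIALECTS) -> str:
    """Return a client name, "mixed" or "unknown" for one observed session.

    `session` maps each SMTP verb seen in the session to its observed style.
    """
    if not session:
        return "unknown"

    # A client matches fully when every observed command uses its style.
    full = [name for name, dialect in known.items()
            if all(dialect.get(verb) == style for verb, style in session.items())]
    if full:
        return full[0]

    # Otherwise, see which clients match at least one command.
    partial = {name for name, dialect in known.items()
               for verb, style in session.items() if dialect.get(verb) == style}
    if len(partial) > 1:
        return "mixed"  # a combination of more than one known dialect
    return "unknown"    # no known program matches the session as a whole
```

A session that greets like one client and then addresses the envelope like another comes back as `"mixed"`, which is the kind of inconsistency the article associates with unofficial tools and bots.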

“The first to notice the differences between the dialects were the authors of a paper published in the US in 2012. We have decided to continue research on the use of dialect differences in the analysis of spam and the botnets that send it. Our tool is largely forensic; it can provide data for the needs of the police and law enforcement” – emphasised Piotr Bazydło.

However, the tool can also be used to create a “black list” of addresses from which dangerous correspondence is being sent, which can help update anti-spam filters.
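As a rough illustration of how such a list could be derived, the sketch below counts suspicious sessions per source address and keeps the addresses that cross a threshold. The function name `build_blacklist`, the verdict labels `"mixed"` and `"unknown"`, and the threshold of three sessions are all assumptions; the article does not describe the criteria actually used.

```python
from collections import Counter

def build_blacklist(sessions, min_hits=3):
    """Return sorted source addresses with at least `min_hits` suspicious sessions.

    `sessions` is an iterable of (source_address, verdict) pairs. The verdict
    labels and the threshold are hypothetical, chosen for illustration only.
    """
    hits = Counter(addr for addr, verdict in sessions
                   if verdict in ("mixed", "unknown"))
    return sorted(addr for addr, count in hits.items() if count >= min_hits)


# Example with addresses from the ranges reserved for documentation.
observed = (
    [("192.0.2.10", "unknown")] * 3
    + [("198.51.100.7", "client_a")] * 5
    + [("192.0.2.99", "mixed")]
)
blacklist = build_blacklist(observed)
```

Requiring several suspicious sessions before listing an address is one simple way to limit false positives from a single odd but legitimate client.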

Using the differences in the exchange of commands between the server and the client and other data that accompany the message, researchers can not only detect mail from a suspect source, but also more precisely identify that source, for example, determine the type of botnet that sends the spam messages.

The scientists collect data for their research using a spamtrap – a computer that pretends to be a mail server. Spamtraps help catch bots that send mail to random addresses.

These and other traps are included in the cyber threat early warning system developed in the international research program SISSDEN (Secure Information Sharing Sensor Delivery Event Network). NASK is the leader of this program.

The consortium also includes seven other institutions from various European Union countries. SISSDEN is financed under the European Union’s Horizon 2020 programme. Ultimately, the network will consist of at least 100 servers (at least one in each member state) used to detect and analyse dangerous online activities.

This article was originally published on the PAP website Science in Poland.
