How spammers fool Bayesian filters and how to stop them

Effectively stopping spam in the long term requires much more than blocking individual IP addresses and creating rules based on keywords that spammers often use. The sophistication of spam tools, coupled with the growing number of spammers in the wild, has created a hyper-evolution in the variety and volume of spam. The old ways of blocking the bad guys don’t work anymore.

Examining spam and spam-blocking technology can illuminate how this evolution is taking place and what can be done to combat spam and reclaim email as the efficient and effective communication tool it was intended to be.

One method used to combat spam is Bayesian filtering. Bayesian logic, named after Thomas Bayes, an English mathematician, is used in decision making and inferential statistics. Bayesian filters maintain a database of known spam and known legitimate email. Once the database is large enough, the system ranks the words according to how likely they are to appear in a spam message.

Words that are most likely to appear in spam receive a high score (between 51 and 100), and words that are more likely to appear in legitimate email receive a low score (between 1 and 50). For example, the words “free” and “sex” generally have values between 95 and 98, while the words “emphasis” or “disadvantage” can have a score between 1 and 4. Commonly used words such as “the” and “that,” along with words the filter has not seen before, receive a neutral score between 40 and 50 and are not used in the system’s algorithm.
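The scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the tiny training lists, and the simple ratio used to turn counts into a 1 to 100 score are all hypothetical, and real filters use larger corpora and more careful smoothing.

```python
import re
from collections import Counter

NEUTRAL = 45  # assumed neutral score for words the filter has never seen

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train_word_scores(spam_msgs, ham_msgs):
    """Score each known word from 1 (ham-like) to 100 (spam-like).

    Assumes both message lists are non-empty.
    """
    # Count how many messages of each kind contain the word at least once.
    spam_counts = Counter(t for m in spam_msgs for t in set(tokenize(m)))
    ham_counts = Counter(t for m in ham_msgs for t in set(tokenize(m)))
    scores = {}
    for word in set(spam_counts) | set(ham_counts):
        spam_rate = spam_counts[word] / len(spam_msgs)
        ham_rate = ham_counts[word] / len(ham_msgs)
        spam_share = spam_rate / (spam_rate + ham_rate)
        scores[word] = max(1, min(100, round(spam_share * 100)))
    return scores

def lookup(word, scores):
    """Return the word's score, or the neutral score for unseen words."""
    return scores.get(word, NEUTRAL)

# Hypothetical training data, for illustration only.
spam = ["free sex now", "free offer for the best price"]
ham = ["the emphasis of the meeting", "the main disadvantage of that plan"]
scores = train_word_scores(spam, ham)
```

With this toy data, “free” lands at the spam end of the scale, “emphasis” at the ham end, and a word the filter has never seen falls back to the neutral score.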

When the system receives an email, it splits the message into tokens, or words, and looks up the value assigned to each one. The system uses the tokens with scores at the upper and lower ends of the range to develop a score for the email as a whole. If the email has more spam tokens than ham (legitimate) tokens, the email will have a high spam score. The email administrator sets a threshold score that the system uses to decide which email passes through to users.
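A minimal sketch of that message-level step follows. The word scores are taken from the ranges quoted in this article, but the specific values, the threshold of 80, and the function names are assumptions made for illustration. The way the token scores are combined (a naive Bayes style product, computed in log space) is one common choice, not necessarily what any particular product does.

```python
import math
import re

# Hypothetical word scores on the article's 1-100 scale.
WORD_SCORES = {"free": 97, "sex": 96, "emphasis": 2,
               "disadvantage": 3, "the": 45, "that": 45}
NEUTRAL = 45                          # assumed score for unseen words
NEUTRAL_LOW, NEUTRAL_HIGH = 40, 50    # tokens in this band are ignored
THRESHOLD = 80                        # assumed administrator setting

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def message_score(text, word_scores=WORD_SCORES):
    """Combine the informative token scores into one 0-100 spam score."""
    log_odds = 0.0
    for token in set(tokenize(text)):
        score = word_scores.get(token, NEUTRAL)
        if NEUTRAL_LOW <= score <= NEUTRAL_HIGH:
            continue  # neutral tokens do not influence the result
        p = min(0.99, max(0.01, score / 100))
        log_odds += math.log(p) - math.log(1 - p)
    # Numerically stable logistic function of the summed log-odds.
    if log_odds >= 0:
        prob = 1 / (1 + math.exp(-log_odds))
    else:
        e = math.exp(log_odds)
        prob = e / (1 + e)
    return 100 * prob

def is_spam(text, threshold=THRESHOLD):
    """Apply the administrator's threshold to the message score."""
    return message_score(text) >= threshold
```

A message with no informative tokens scores exactly 50 here, which is why the choice of threshold matters for short or unfamiliar messages.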

Bayesian filters are effective in filtering spam and minimizing false positives. Because they adapt and learn based on user feedback, Bayesian filters produce better results the longer they are used within an organization. However, they are not infallible. Spammers have learned which words Bayesian filters consider spam and have developed ways to insert non-spam words into emails to lower the overall spam score of the message. By adding paragraphs of text from novels or news, spammers can dilute the effects of high-ranking words. The insertion of text has also caused normally legitimate words found in novels or news to have an inflated spam score. This can make Bayesian filters less effective over time.
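The dilution effect can be illustrated with a toy calculation. The word scores below are hypothetical values picked from the ranges quoted in this article, and the combining formula is a simple naive Bayes style product chosen for the sketch; real filters differ in detail.

```python
import math

# Hypothetical word scores on the article's 1-100 scale.
SCORES = {"free": 97, "sex": 96, "emphasis": 2, "disadvantage": 3}

def spam_score(words):
    """Combine the scores of known words into one 0-100 spam score."""
    probs = [SCORES[w] / 100 for w in words if w in SCORES]
    spam = math.prod(probs)                 # evidence for spam
    ham = math.prod(1 - p for p in probs)   # evidence for ham
    return 100 * spam / (spam + ham)

pitch = ["free", "sex"]                          # the spammer's message
padded = pitch + ["emphasis", "disadvantage"]    # same message plus filler

print(round(spam_score(pitch), 1))
print(round(spam_score(padded), 1))
```

In this sketch, two low-scoring filler words are enough to pull the padded message below the midpoint of the scale, even though the spam words are still present.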

Another approach spammers use to fool Bayesian filters is to create spam-less emails. For example, a spammer can send an email that only contains the phrase “Here is the link …”. This approach can neutralize the spam score and entice users to click on a link to a website that contains the spammer’s message. To block this type of spam, the filter would have to be designed to follow the link and scan the content of the website that users must visit. Currently, Bayesian filters do not employ this type of filtering because it would be prohibitively expensive in terms of server resources and could potentially be used as a method of launching denial of service attacks against commercial servers.

As with all single-method approaches to spam filtering, Bayesian filters are effective against certain techniques that spammers use to fool spam filters, but they are not a magic bullet that solves the spam problem. Bayesian filters are most effective when combined with other spam detection methods.

The solution

When used on its own, each antispam technique has been consistently outmaneuvered by spammers. Grandiose plans to rid the world of spam have been put forward, such as charging a penny for every email received or forcing servers to solve math problems before delivering email, but with little success. These schemes are not realistic and would require a large percentage of the population to adopt the same antispam method to be effective. You can learn more about fighting spam by visiting our website at http://www.ciphertrust.com and downloading our white papers.
