
Spamassassin Rules and Regex

Spamassassin Frustrations

cPanel spam rules

It never ceases to amaze me how good the spammers are and how bad corporations are at email.  I’ve been battling spam for years for a good-sized company.  Spamassassin was installed on the webserver by default, so I didn’t feel like changing the software.  Over the years I had used the cPanel interface to create some rules.  It didn’t take me long to find out that the rules written in this “Global Filters” area were pretty general and were blocking mail that we wanted.  I had also gone into cPanel and whitelisted some domains that the company gets regular email from.  This helped, but it was hard to figure out which domains to put on this list.  Then I started using the blacklist feature.  This isn’t really effective because spammers seem to change the domain name almost every day.

Google To The Rescue?

Over time the spam just kept getting worse.  So I decided to look up how to modify the program to stop more spam.  I have to note here that I’ve been doing websites for over 20 years, working on servers for as long, and did a several-year stint at an ISP, so I’m not a rookie when it comes to this stuff.  That being noted, I tried a bunch of different techniques I found on the Spamassassin wiki and in the multitude of forum posts that turned up when searching Google.  The majority of the spam was still getting through and the webserver started having load issues.

Constant frustration would cause me to quit for a few months and then try again, going back to Google and doing a search that didn’t reveal much more information.  One bit of information I figured was useful was that the regular expressions (regex) used in spamassassin are based on the Perl standards.  Nice!  I had written plenty of programs (years and years ago) in Perl and figured I’d give that a go.  I tried using the cPanel interface again (Global Filters => Create/Edit Filter => Rule “matches regex”) and a few simple rules worked well.  Eureka!  But soon I found that some of these weren’t working either.  Using the testing tool, it appeared that spamassassin was putting in the forward slashes at the beginning and end of the regex itself, so some of the parameters weren’t working correctly.

So What Worked

I could go on and on, but enough of the background.  After all, you probably Googled how to write custom rules using regex in spamassassin and have your own story to tell.

First Step:

The important thing is that there are a few places where the spamassassin rules can be entered: /usr/share/spamassassin, /etc/mail/spamassassin, or home/account/user_prefs.  If you are not running your own server but have an account on a Linux server running spamassassin, your file path is probably: /.spamassassin (the .spamassassin folder at the top level of your account).  If you are not logging in via telnet, the folder may be hidden and you’ll have to un-hide it.

Within this folder there will be a local.cf file and there may be a user_prefs file.  I’ve found it safest to leave the local.cf file alone apart from the few settings described next, and to put my rules in the user_prefs file.  If there is no user_prefs file, don’t worry, you can create one.  Just make sure when you create the file that you are using a real text editor (like TextPad) and not a word processing program.

Open up the local.cf file (I’m not a fan of telnet, so I downloaded it and edited it on my computer), make sure the rewrite_header Subject line is un-commented, and add a rewrite.  I used **SPAM(_SCORE_)** because I wanted to see the spam score for testing and editing my rules without having to look at the source of the email headers.

Then make sure that the allow_user_rules line is un-commented and set to 1, otherwise rules placed in user_prefs are ignored.  At this point I also set up a dummy email account and added the lines:
# allow all spam to pass to email address
all_spam_to whateveriwant@mydomain.com

At this point I saved the file and uploaded it to the server.

Second step:

Now to open or create the user_prefs file.  This is where I put my custom rules as well as my whitelist and blacklist.

One of the things I noticed was that every email from the TLD .eu was spam, and I mean every.  I analyzed tens of thousands of email headers and couldn’t find one legitimate email.  This TLD was sending thousands of emails a day to the company server.  So this seemed like the logical place to start.  Here is the regular expression that I’ve found to work:

header DJD_DOM_EU From =~ /\b@.{1,25}\.eu\b/i
describe DJD_DOM_EU Give high score to tld eu
score DJD_DOM_EU 5.0
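
To get a feel for what that pattern catches before putting it on a live server, a quick check outside spamassassin can help.  The sketch below uses Python’s re module rather than Perl; for this particular pattern the two should behave the same way, but treat that as an assumption and verify against real mail.  The function name and the sample addresses are made up for illustration.

```python
import re

# Same pattern as the DJD_DOM_EU rule; the trailing /i becomes re.IGNORECASE.
EU_FROM = re.compile(r"\b@.{1,25}\.eu\b", re.IGNORECASE)


def hits_eu_rule(from_header: str) -> bool:
    """Return True if the From header would trigger the .eu pattern."""
    return EU_FROM.search(from_header) is not None


# Made-up sample headers, for illustration only.
samples = [
    "Deals <promo@cheap-offers.eu>",
    "Jane <jane@example.com>",
    "News <info@example.EU>",
    "Shop <sales@example.eu.com>",
]

for header in samples:
    print(f"{hits_eu_rule(header)!s:5}  {header}")
```

Note that the last sample matches as well: the pattern only asks for a word boundary after .eu, so a domain such as example.eu.com also triggers it.  The {1,25} limit also means a domain with more than 25 characters between the @ and the .eu slips through.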

I’m not going to go into a lot of detail at this point because you can learn regex somewhere else.  But here are a few more examples:

header DJD_SUBJECT_HARP Subject =~ /^HARP/
describe DJD_SUBJECT_HARP Subject: starts with HARP
score DJD_SUBJECT_HARP 7.0

header DJD_SUBJECT_WINDOWS Subject =~ /\breplacement windows\b/i
describe DJD_SUBJECT_WINDOWS Subject: contains replacement windows
score DJD_SUBJECT_WINDOWS 3.0
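
As a rough illustration of how these rules add up, here is a small Python sketch that applies the three header patterns from this post and sums their scores.  It is not how spamassassin itself is implemented, and a real scan also adds the scores of all the stock rules.  The function name and sample message are made up, and the threshold of 5.0 assumes the default required_score has not been changed on your server.

```python
import re

# (rule name, header, compiled pattern, score) copied from the rules above.
RULES = [
    ("DJD_DOM_EU", "From", re.compile(r"\b@.{1,25}\.eu\b", re.I), 5.0),
    ("DJD_SUBJECT_HARP", "Subject", re.compile(r"^HARP"), 7.0),
    ("DJD_SUBJECT_WINDOWS", "Subject",
     re.compile(r"\breplacement windows\b", re.I), 3.0),
]

REQUIRED_SCORE = 5.0  # assumed: the default spam threshold


def score_headers(headers: dict) -> tuple:
    """Return (total score, names of rules that matched) for a header dict."""
    total = 0.0
    matched = []
    for name, header, pattern, score in RULES:
        # A missing header is treated as an empty string, so it never matches.
        if pattern.search(headers.get(header, "")):
            total += score
            matched.append(name)
    return total, matched


# Hypothetical message headers, for illustration only.
message = {
    "From": "Home Deals <offers@example.eu>",
    "Subject": "Save on Replacement Windows this month",
}

total, matched = score_headers(message)
print(matched, total, "spam" if total >= REQUIRED_SCORE else "ham")
```

Two things stand out when playing with this.  The HARP rule has no /i and no \b, so it is case-sensitive and would also hit a subject that starts with a longer word such as HARPER.  And with a score of 5.0, the .eu rule alone is enough to reach the assumed threshold.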

I hope this helps you avoid spending hours trying to figure out simple spamassassin rules.