It’s time to do something about foreign spam

Currently the only spam that makes it into my inbox (aside from the odd 419 here and there) is foreign spam.

These spams don't seem to end up in SURBL/URIBL or in the Spamhaus SBL, possibly because the people running those lists don't see much of them.

I imagine that creating some sort of general rule for these is going to be pretty hard. They tend to use perfectly ordinary character sets (ISO-8859-1 or one of the Windows-* code pages), so the charset alone won't give them away, and I'm going to have to do some level of language analysis.

One thing that seems fairly consistent in Spanish/Portuguese spams is the use of "e" as a word on its own. However, given the number of geek mailing lists I'm on, that might also come up as a variable name (or the mathematical constant e), so that alone won't be enough.
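A minimal sketch of the standalone "e" check in Python, assuming the message body has already been decoded to plain text. The function names and the 0.02 threshold are hypothetical illustrations, not tuned values.

```python
import re

# Word tokens; in Python 3 \w is Unicode-aware, so "grátis" is one token.
WORD = re.compile(r"\w+")


def standalone_e_ratio(text: str) -> float:
    """Fraction of word tokens that are a bare lowercase 'e'.

    Case-sensitive on purpose: a capital 'E' is more likely an initial or label.
    """
    words = WORD.findall(text)
    if not words:
        return 0.0
    return words.count("e") / len(words)


def looks_like_iberian_spam(text: str, threshold: float = 0.02) -> bool:
    """Hypothetical heuristic: flag text where a bare 'e' is unusually common.

    The default threshold is an illustrative guess, not a tuned value.
    """
    return standalone_e_ratio(text) >= threshold
```

Note that "the value of e is about 2.718" also scores 0.125, which is exactly the geek-list false positive described above; this can only be one signal feeding a score, not a rule on its own.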

Another option is some sort of heuristic language detection like "TextCat", but that won't work terribly well here, as a lot of these spams are mostly images and carry very little text.
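TextCat is based on the n-gram approach of Cavnar and Trenkle: build a ranked profile of the most frequent character n-grams for each language, then pick the language whose profile is closest by an "out-of-place" rank distance. Below is a minimal Python sketch of that idea. It is not TextCat's own code, the parameters (n-grams up to length 5, top 300) are illustrative, and the two reference profiles are built from toy sentences rather than real training corpora.

```python
from collections import Counter


def ngram_profile(text: str, max_n: int = 5, top_k: int = 300) -> list[str]:
    """Return the top_k character n-grams (lengths 1..max_n), most frequent first.

    Each word is padded with '_' so word boundaries become part of the n-grams.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return [gram for gram, _ in counts.most_common(top_k)]


def out_of_place(doc: list[str], ref: list[str]) -> int:
    """Out-of-place distance: sum of rank differences between two profiles.

    An n-gram missing from the reference profile costs len(ref).
    """
    ref_rank = {gram: rank for rank, gram in enumerate(ref)}
    missing_penalty = len(ref)
    return sum(
        abs(rank - ref_rank[gram]) if gram in ref_rank else missing_penalty
        for rank, gram in enumerate(doc)
    )


def guess_language(text: str, refs: dict[str, list[str]]) -> str:
    """Pick the reference language whose profile is closest to the text's."""
    doc = ngram_profile(text)
    return min(refs, key=lambda lang: out_of_place(doc, refs[lang]))


# Toy reference profiles (hypothetical); a real classifier needs far more text.
REFS = {
    "en": ngram_profile("the quick brown fox jumps over the lazy dog and then "
                        "they went home because it was getting late"),
    "pt": ngram_profile("o rato roeu a roupa do rei de roma e a rainha ficou "
                        "muito zangada com o que aconteceu na cidade"),
}
```

This also makes the image problem concrete: a message that is mostly an image hands `ngram_profile` only a few words, so whichever language wins is decided on very little evidence.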

Any suggestions here would be most welcome.

