Posted to users@spamassassin.apache.org by Alex <my...@gmail.com> on 2012/12/02 19:47:07 UTC
Re: Trouble with bayes poisoning spam

Hi,

> Actually, that's a Snowshoe IP.
> Which, on balance, can be a good thing, slaying-wise. :)

You mean that it's likely to be added to the SBL sooner, along with
the other IPs in the same range?

> Almost four years ago, I posted my approach to snowshoe slaying:
>         http://mail-archives.apache.org/mod_mbox/spamassassin-users/200902.mbox/%3c20090204.00000150@iowahoneypot.com%3e
>
> It has continued to evolve since then.
> Both IP block tracking and "identity" (Subject & From.Realname)
> header token checking are still the two most useful approaches.

I read your email from four years ago. How has it evolved?

We have created a few scripts that let us paste a phrase from a
false negative (FN) into a text file, from which a rule is then
generated. So "Olde Brooklyn Lantern" in the body would get a score,
etc.

Combined with ZEN and/or the SBL, I think this is similar to what
you're doing, correct?
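
A minimal sketch of this kind of phrase-to-rule generation, in Python.
This is not the actual scripts: the function name, the LOCAL_PHRASE
rule-name prefix and the default score are assumptions for
illustration. It emits standard SpamAssassin "body", "describe" and
"score" lines for one phrase.

```python
import re


def phrase_to_rule(phrase, score=1.0, prefix="LOCAL_PHRASE"):
    """Build SpamAssassin rule lines for one phrase (hypothetical helper)."""
    words = phrase.split()
    # Rule names may only contain letters, digits and underscores.
    name_parts = [re.sub(r"[^A-Za-z0-9]", "", w).upper() for w in words]
    name = prefix + "_" + "_".join(p for p in name_parts if p)
    # Escape each word, and escape "/" because it delimits the regex.
    escaped = [re.escape(w).replace("/", r"\/") for w in words]
    # Allow any run of whitespace between words (line wraps, double spaces).
    pattern = r"\s+".join(escaped)
    return "\n".join([
        f"body     {name}  /\\b{pattern}\\b/i",
        f"describe {name}  Body contains phrase: {phrase}",
        f"score    {name}  {score}",
    ])


if __name__ == "__main__":
    print(phrase_to_rule("Olde Brooklyn Lantern"))
```

Run against a text file with one phrase per line, the output could be
written to a local .cf file and checked with "spamassassin --lint"
before use. Scores would still need tuning by hand.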

> I see you have hits on RELAYCOUNTRY.  If you maintain your own
> virtual snowshoe nations, and merge them into your "real" nations,
> while building a list of snowshoe tokens, you'll have very good
> success catching these.

At one point I hoped I could exclude certain countries, or score some
higher than others, but too much legitimate mail is received from all
over the world. Got burned too many times.

> For example, that IP is in "root eSolutions" space, and they have
> had a snowshoe problem for at least a year and a half.
>
> Here's their ranges that I have in my small scale database:
>         94.242.192.0 - 94.242.255.255
>         188.42.0.0 - 188.42.127.255
>         212.117.160.0 - 212.117.191.255

Do you list them all as class C's, or is there a CIDR mask that
matches these? I've found many class C's in 41/8, and I'd really like
to know which valid companies use this whole class A, or better, how
to isolate those class C's so I can block them.
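
On the CIDR question, a start/end range can be collapsed into CIDR
blocks with Python's standard ipaddress module. This is only a sketch;
range_to_cidrs is an illustrative name, and the ranges are the three
quoted above.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Return the minimal CIDR blocks covering first..last inclusive."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


# The three ranges quoted in this thread.
RANGES = [
    ("94.242.192.0", "94.242.255.255"),
    ("188.42.0.0", "188.42.127.255"),
    ("212.117.160.0", "212.117.191.255"),
]

if __name__ == "__main__":
    for first, last in RANGES:
        print(f"{first} - {last} => {', '.join(range_to_cidrs(first, last))}")
```

Each of the three ranges happens to align to a single block:
94.242.192.0/18, 188.42.0.0/17 and 212.117.160.0/19.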

> About two years ago, I hit a tipping point with my snowshoe IP
> data, and can now _VERY_ rapidly identify new blocks.

I would really be interested in that, especially if it's beyond what
is already available in the SBL.

> Both of these phrases are in my snowshoe tokens database:
>         Classic Lantern
>         Incredible Light

How do these phrases relate to a snowshoe IP range? And one that isn't
already part of the SBL?

You would have to at least catch that phrase on two IPs in the same
class C before you could consider it a snowshoe, correct?
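
To make that question concrete, here is a rough sketch of the "same
phrase on two IPs in one class C" test. It is my guess at a heuristic,
not Chip's method; the function name, the min_ips threshold and the
sample addresses (documentation ranges, not real senders) are all
assumptions.

```python
import ipaddress
from collections import defaultdict


def snowshoe_candidates(sightings, min_ips=2):
    """Group (phrase, ip) sightings by /24 and keep phrases seen on
    at least min_ips distinct addresses within the same /24."""
    seen = defaultdict(set)
    for phrase, ip in sightings:
        # strict=False lets us pass a host address and get its /24 back.
        block = ipaddress.ip_network(f"{ip}/24", strict=False)
        seen[(phrase.lower(), str(block))].add(ip)
    return {key: sorted(ips) for key, ips in seen.items() if len(ips) >= min_ips}


if __name__ == "__main__":
    # Made-up sightings using documentation address ranges.
    sample = [
        ("Classic Lantern", "192.0.2.10"),
        ("classic lantern", "192.0.2.77"),
        ("Incredible Light", "198.51.100.5"),
    ]
    for (phrase, block), ips in snowshoe_candidates(sample).items():
        print(phrase, block, ips)
```

With this sample only "classic lantern" in 192.0.2.0/24 is reported,
since "Incredible Light" was seen on a single address.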

> I checked, and one of my best data feeds was hit by the same
> IP block in your sample.  Here are quick dumps of the contents of
> the identity headers:
>
> frequency and contents of Field [Subject], filtered by [<all> & IP w/"188.42.11."]
>         A unique christmas gift for the kids
>         A variety of medigap options explained and simplified

Perhaps not to the same degree as you do, but I also have these
phrases in my local database from which rules are created. Do you have
a mechanism to auto-generate them? Shouldn't this be incorporated into
Justin's SOUGHT rules?

> As soon as I've finished a couple of timesink projects, I'll start
> on those.
>         - "Chip"

Thanks,
Alex