Posted to users@spamassassin.apache.org by Philip Prindeville <ph...@redfish-solutions.com> on 2011/11/08 19:32:11 UTC
Re: Chickenpoxed subjects
On 10/20/11 8:24 PM, Adam Katz wrote:
> On 10/19/2011 04:43 AM, Mynabbler wrote:
>> You are kidding, right? 50% of this crap comes from FREEMAIL
>> addresses, and even more specific: 44% of this crap is delivered by
>> aol.com. The aol deliveries have about 85% unique from@aol
>> addresses, so they pretty much 'own' aol.
>
> We're writing spam filters, not idiot filters. The fact that there is
> so much overlap is often useful, but the overlap is not complete. There
> is also a decent amount of overlap between the
> mostly-computer-illiterate and freemail users. I think this drives your
> current line of thinking.
>
> There are a lot of people that do very spammy things. It is a testament
> to SA and other filters that such non-spam doesn't so commonly flag as spam.
>
Sorry to come to the party late on this, was traveling a bit.
It seems to me that if you have lines like:
Subject: T R +A N/N!l :ES, P \0 R N
Subject: S C/H ,O 0=LG)l :R$L$S ) P -0 RN
Then the solution is to use agrep. Make deletion of punctuation very low-cost, and make the usual look-alike substitutions low-cost as well:
0 => O
1 => l
$ => S
...
(Of course, you then end up with a possible clash between deleting $ and replacing it with 'S', but agrep is good about checking both.) Then you just grep through a dictionary of the "usual offenders":
lesbian
cash
meds
porn
...
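To make the idea concrete, here is a rough sketch in Python of that kind of weighted approximate match. It is not agrep and not String::Approx, just a plain dynamic-programming edit distance that looks for a word anywhere inside the subject. The cost values (1 for cheap edits, 10 for everything else), the threshold, and the names best_match_cost and flag_words are all made up for illustration and would need tuning against real mail.

```python
import string

# Illustrative weights only (assumptions, not tuned values).
CHEAP = 1    # dropping punctuation/whitespace, or a look-alike substitution
FULL = 10    # any other insertion, deletion, or substitution

# (character seen in the subject, letter it stands in for)
LOOKALIKES = {("0", "o"), ("1", "l"), ("$", "s")}
NOISE = set(string.punctuation) | set(string.whitespace)


def sub_cost(text_ch, word_ch):
    """Cost of reading text_ch in the subject as word_ch in the word."""
    if text_ch == word_ch:
        return 0
    if (text_ch, word_ch) in LOOKALIKES:
        return CHEAP
    return FULL


def skip_cost(text_ch):
    """Cost of dropping a character from the subject."""
    return CHEAP if text_ch in NOISE else FULL


def best_match_cost(word, text):
    """Lowest weighted edit cost of `word` against any substring of `text`."""
    word, text = word.lower(), text.lower()
    # prev[i]: cheapest way to match word[:i] ending at the previous text char.
    prev = [i * FULL for i in range(len(word) + 1)]
    best = prev[-1]
    for ch in text:
        cur = [0]  # a match may start at any position in the subject
        for i, w in enumerate(word, start=1):
            cur.append(min(
                prev[i - 1] + sub_cost(ch, w),  # match or substitute
                prev[i] + skip_cost(ch),        # drop ch from the subject
                cur[i - 1] + FULL,              # word letter missing from subject
            ))
        best = min(best, cur[-1])
        prev = cur
    return best


def flag_words(subject, words, threshold=FULL):
    """Return the dictionary words matched using cheap edits only."""
    return [w for w in words if best_match_cost(w, subject) < threshold]


if __name__ == "__main__":
    offenders = ["lesbian", "cash", "meds", "porn"]
    print(flag_words(r"S C/H ,O 0=LG)l :R$L$S ) P -0 RN", offenders))
```

Because '$' is both punctuation and a look-alike for 'S', the min() tries dropping it and substituting it and keeps whichever is cheaper, which is the clash mentioned above. With a threshold below one full-cost edit, a word is flagged only if it can be recovered purely through cheap deletions and look-alike swaps.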
I'm not familiar with perl-String-Approx... reading up on it, it uses Levenshtein distance, just like agrep does, so it should be well suited to doing approximate matches.
http://search.cpan.org/~jhi/String-Approx-3.26/Approx.pm
-Philip