Posted to users@spamassassin.apache.org by Marc Perkel <ma...@perkel.com> on 2006/07/12 19:30:00 UTC
The best way to use Spamassassin is to not use Spamassassin
Catchy subject line, eh?
OK - so what I mean by this is that I now use SA for about 5% of all
incoming email. The rest of the spam is rejected before it gets to SA through
a fairly large number of tricks that allow me to determine with near
100% accuracy which messages are spam. It is done mostly through behavior
and karma related lists, such as host blacklists and URI blacklists.
Similarly, I have created a whitelisting system that tracks hosts and
other aspects of the message that can determine with near 100% accuracy
messages that are not spam so that I can bypass SA and fast track them
through the system. So that leaves only about 5% that I actually have to
content test.
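The tiered flow described above (reject known-bad sources outright, fast-track known-good ones, and content-test only what is left) can be sketched in a few lines of Python. This is a minimal illustration, not Marc's actual Exim setup: the function name `triage`, the plain in-memory sets standing in for DNS-based lists, and the choice to check the blacklists before the whitelist are all assumptions.

```python
from enum import Enum


class Verdict(Enum):
    REJECT = "reject"  # known-bad source: refuse before any content test
    ACCEPT = "accept"  # known-good source: bypass the content filter
    SCAN = "scan"      # unknown: hand the message to SpamAssassin


def triage(sender_ip, uri_hosts, ip_blacklist, uri_blacklist, ip_whitelist):
    """Decide how to handle a message using only cheap list lookups.

    All list arguments are hypothetical in-memory sets; a real MTA would
    query DNS-based lists instead.
    """
    # Blacklisted sending host, or any blacklisted URI host in the body: reject.
    if sender_ip in ip_blacklist or any(h in uri_blacklist for h in uri_hosts):
        return Verdict.REJECT
    # Whitelisted sending host: fast-track without content testing.
    if sender_ip in ip_whitelist:
        return Verdict.ACCEPT
    # Everything else is the small remainder that still needs SA.
    return Verdict.SCAN
```

For example, `triage("192.0.2.7", ["spam.example"], set(), {"spam.example"}, set())` returns `Verdict.REJECT`. Checking blacklists first means a whitelisted host carrying a blacklisted URI is still rejected; swapping the order is an equally reasonable design choice.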
Of course that 5% is very important because that is where I get the data
for the other tests that allow me to bypass filtering. But - I want you
all to start thinking of a new way to look at spam filtering. I have
some concepts that I'm testing that seem to be working well and if
widely distributed could revolutionize the concepts behind processing
email. And SA is still an important part of that.
Re: The best way to use Spamassassin is to not use Spamassassin
Posted by Marc Perkel <ma...@perkel.com>.
Rob Poe wrote:
>> Of course that 5% is very important because that is where I get the
>> data for the other tests that allow me to bypass filtering. But - I want
>> you all to start thinking of a new way to look at spam filtering. I have
>> some concepts that I'm testing that seem to be working well and if
>> widely distributed could revolutionize the concepts behind processing
>> email. And SA is still an important part of that.
>
> Catchy, indeed. So any enlightenment here?
I'm building a DNS-based list system that's not just a blacklist but also a whitelist and what I call a yellow list. It's based on server IP, and the idea is to use the whitelists to get rid of false positives from blacklists.
The idea being that many spam filtering services report the IP addresses of servers sending them spam and ham. These are totalled, and some servers will be 99%+ spam, some 99%+ ham, and some a mix. The spam servers are blacklisted, the nonspam servers are whitelisted, and the ones in the middle are yellow listed. Yellow means that a server never gets blacklisted, which makes the false positives of blacklists go way down.
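The black/white/yellow classification described above boils down to comparing each server's reported spam fraction against a cutoff. The sketch below uses the 99% figure from the post; the `min_reports` floor (so that a server with a handful of reports isn't listed at all) and the names `Tally`, `classify` and `may_reject` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Spam and ham reports received for one server IP."""
    spam: int = 0
    ham: int = 0


def classify(tally, cutoff=0.99, min_reports=100):
    """Return 'black', 'white', 'yellow' or 'unlisted' for one server."""
    total = tally.spam + tally.ham
    if total < min_reports:
        return "unlisted"  # too few reports to judge (assumed floor)
    if tally.spam / total >= cutoff:
        return "black"     # 99%+ spam
    if tally.ham / total >= cutoff:
        return "white"     # 99%+ ham
    return "yellow"        # mixed sender: never blacklisted


def may_reject(listing):
    """Only a black listing justifies rejection; yellow shields a mixed server."""
    return listing == "black"
```

A server with 995 spam and 5 ham reports comes out "black", while one with 600 spam and 400 ham reports comes out "yellow", so `may_reject` returns False for it even though most of its traffic is spam. That is the false-positive protection the yellow list is meant to provide.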
Re: The best way to use Spamassassin is to not use Spamassassin
Posted by Rob Poe <rp...@plattesheriff.org>.
>Of course that 5% is very important because that is where I get the
>data for the other tests that allow me to bypass filtering. But - I want
>you all to start thinking of a new way to look at spam filtering. I have
>some concepts that I'm testing that seem to be working well and if
>widely distributed could revolutionize the concepts behind processing
>email. And SA is still an important part of that.
Catchy, indeed. So any enlightenment here?
Re: The best way to use Spamassassin is to not use Spamassassin
Posted by Chris Lear <ch...@laculine.com>.
* Marc Perkel wrote (12/07/06 18:30):
> Catchy subject line eh?
>
> OK - so what I mean by this is that I now use SA for about 5% of all
> incoming email. The rest of the spam is rejected before it gets to SA through
> a fairly large number of tricks that allow me to determine with near
> 100% accuracy which messages are spam. It is done mostly through behavior
> and karma related lists, such as host blacklists and URI blacklists.
I don't know if it's relevant to Marc's point, but it seems to me that
if SA were reduced to network checks only, it would still be a very good
blocker of spam. And perhaps what Marc is doing is, more or less, moving
SA's network checks into the MTA and using them to reject rather than
just score.
I suppose something similar would be to score all the URIBL rules and
RCVD_IN rules high, and abandon the traditional regex rules.
Network checks are easily the most-hit spam rules in SA anyway. Here's a
bit of sa-stats output for spam on a machine I look after (the MTA blocks based
on sbl-xbl.spamhaus.org before anything gets to SA, so that's not
represented here):
1 BAYES_99
2 URIBL_BLACK
3 URIBL_SBL
4 URIBL_JP_SURBL
5 URIBL_OB_SURBL
6 RCVD_IN_SORBS_DUL
7 RCVD_IN_NJABL_DUL
8 HTML_MESSAGE
9 FORGED_RCVD_HELO
10 URIBL_SC_SURBL
11 URIBL_WS_SURBL
12 SARE_MLB_Stock6
13 URIBL_AB_SURBL
14 SARE_MLB_Stock1
15 STOCK_NAME_FVGT1
> Of course that 5% is very important because that is where I get the
> data for the other tests that allow me to bypass filtering.
Even this isn't necessarily so. Data for network tests can be collected
automatically, by trapping spammers who trawl the web/usenet for
addresses, those who scan for open port 25s, or those who try high MXes.
So at least some useful data can be collected without SA, or even human
intervention.
> But - I
> want you all to start thinking of a new way to look at spam
> filtering.
I'm not sure this is a "new way to look at spam filtering", but I agree
that content testing against regular expressions is increasingly looking
like a crude and easily-outwitted technique compared to DNS tests. Bayes
is still good, though.
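DNS list tests like the ones mentioned above conventionally work by reversing the octets of the client's IPv4 address, appending the list's zone, and looking up an A record: an answer (usually an address in 127.0.0.0/8) means the address is listed, and NXDOMAIN means it isn't. A minimal standard-library sketch follows; treating every lookup failure as "not listed" (failing open) is an assumption, and `sbl-xbl.spamhaus.org` is only used as an example because it is the zone named above.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNSBL query name: 192.0.2.99 -> 99.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """True if the zone returns any A record for the reversed address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN (or any other resolution failure) is treated as "not listed".
        return False
```

For instance, `dnsbl_query_name("192.0.2.99", "sbl-xbl.spamhaus.org")` gives `"99.2.0.192.sbl-xbl.spamhaus.org"`. `is_listed` needs live DNS; a production check would also look at which 127.0.0.x code came back, since some lists encode the listing reason there.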
Re: The best way to use Spamassassin is to not use Spamassassin
Posted by Magnus Holmgren <ho...@lysator.liu.se>.
On Thursday 13 July 2006 08:31, Sietse van Zanen took the opportunity to
write:
> And that trick could also very well cause you to lose legitimate
> e-mail......
As long as the senders' MTAs are RFC compliant nothing bad can happen unless
all real MXes go down, and in that case there is no difference between having
a fake MX and having no fake MX, whether the fake MX gives a temporary error
or doesn't respond at all. And even then you're not *losing* mail. Having mail
bounce back to the sender is not losing mail (although it can mean losing
business). Having mail disappear without any notification is losing mail.
> I don't think it's RFC compliant either.
The RFCs don't require 100% uptime. The RFCs don't say that you can't lie
about having a temporary error condition. They do say that sending hosts must
try all MXes in order.
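The argument above rests on the rule that a compliant sender sorts MX records by preference (lowest number first) and works down the list until one host accepts, so a dead host at the highest preference number is only reached after every real MX has failed. The sketch contrasts that with the spammer behaviour described elsewhere in this thread: hit the highest-numbered MX first and give up if it fails. Host names, preference values and the `accepts` callback are hypothetical, and ties in preference (which real senders randomize) are simply broken by host name here.

```python
def compliant_delivery(mx_records, accepts):
    """Try every MX in ascending preference order; return the host that took it.

    Returns None if all hosts failed; a real sender then keeps the message
    queued and retries later rather than discarding it.
    """
    for _pref, host in sorted(mx_records):
        if accepts(host):
            return host
    return None


def impatient_spammer(mx_records, accepts):
    """Hypothetical bot: try only the highest-numbered MX, give up if it fails."""
    _pref, host = max(mx_records)
    return host if accepts(host) else None


MX = [
    (10, "mx1.example.com"),
    (20, "mx2.example.com"),
    (100, "nothing-here.example.com"),  # the dead "highest MX"
]
```

With `up = {"mx1.example.com", "mx2.example.com"}`, `compliant_delivery(MX, up.__contains__)` returns `"mx1.example.com"` while `impatient_spammer(MX, up.__contains__)` returns `None`. Only when both real hosts are down does the compliant sender get `None`, and then it retries rather than losing the mail.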
> Somehow, this feels to me like throwing out your garbage on the street and
> then saying, Hey I got rid of it.....
Except that the garbage disappears and no one has to clean it up. It's more
like posting a sign saying "<- entrance through the next door" that makes
spammers go away.
--
Magnus Holmgren holmgren@lysator.liu.se
(No Cc of list mail needed, thanks)
Re: The best way to use Spamassassin is to not use Spamassassin
Posted by Bart Schaefer <ba...@gmail.com>.
On 7/12/06, Marc Perkel <ma...@perkel.com> wrote:
>
> Depending on what he's doing, it might work.
He's writing procmail recipes. He's a user on a hosted shell server,
not a sysadmin. Strictly delivery-time header text analysis, no
MTA-level configuration games.
> For example, anyone can do this trick. Set your highest MX record
I'm amused by your definition of "anyone."
> (add a new one) to an IP address that doesn't exist.
We actually tried that (really, we set it to point to a virtual IP on
the same server that is the primary MX, so that one was only available
when the primary also was), and had a dummy port 25 listener on that
IP to 554 everything that connected. It stopped about 1% of our spam;
when we had to change hardware we didn't bother bringing it along. As
I recall it worked slightly better to make it the second MX rather
than the highest one.
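A dummy listener like the one described, a port-25 service that answers every connection with a 554 greeting, takes only a few lines with Python's asyncio. This is a sketch, not the setup actually used: the banner text, the 30-second idle timeout and the bind address/port are assumptions (a real deployment would bind port 25 on the fake MX's IP, which normally needs elevated privileges). Following RFC 5321, after a 554 greeting it waits for QUIT and answers other commands with 503 instead of hanging up immediately.

```python
import asyncio

GREETING = b"554 No SMTP service here\r\n"


async def refuse(reader, writer):
    """Reject the session in the greeting, then wait for the client's QUIT."""
    writer.write(GREETING)
    await writer.drain()
    try:
        while True:
            line = await asyncio.wait_for(reader.readline(), timeout=30)
            if not line:  # client hung up without sending QUIT
                break
            if line.strip().upper() == b"QUIT":
                writer.write(b"221 Bye\r\n")
                await writer.drain()
                break
            # Any other command after a 554 greeting gets a 503.
            writer.write(b"503 Bad sequence of commands\r\n")
            await writer.drain()
    except (ConnectionError, asyncio.TimeoutError):
        pass  # reset or idle client: just drop the connection
    finally:
        writer.close()


async def serve(host="127.0.0.1", port=2525):
    """Run the refusing listener until cancelled (hypothetical host/port)."""
    server = await asyncio.start_server(refuse, host, port)
    async with server:
        await server.serve_forever()

# To run it: asyncio.run(serve())
```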
We're wandering a bit off topic here, though.
RE: The best way to use Spamassassin is to not use Spamassassin
Posted by Sietse van Zanen <si...@wizdom.nu>.
And that trick could also very well cause you to lose legitimate e-mail......
I don't think it's RFC compliant either.
Somehow, this feels to me like throwing out your garbage on the street and then saying, "Hey, I got rid of it....."
-Sietse
________________________________
From: Marc Perkel [mailto:marc@perkel.com]
Sent: Thu 13-Jul-06 8:18
To: Bart Schaefer
Cc: users@spamassassin.apache.org
Subject: Re: The best way to use Spamassassin is to not use Spamassassin
Bart Schaefer wrote:
> On 7/12/06, Marc Perkel <ma...@perkel.com> wrote:
>>
>> Bart Schaefer wrote:
>> > There's been a fellow over on the procmail list claiming for well over
>> > a year now that he can get better accuracy than SA through message
>> > header analysis alone
>>
>> His claim might well be true.
>
> Oh, I have no doubt that he's speaking truthfully. Problem is that if
> no one else can look at what he's done, there's no way to confirm or
> deny my own suspicion, which is that most of his rules are only that
> accurate in his specific environment. That is, I tend to expect that
> if you picked up his rules and dropped them on another machine halfway
> around the world with a different ISP and mail routing chain, their
> accuracy would plummet.
>
Depending on what he's doing, it might work. I catch most spam based on
sender behavior rather than message content. For example, anyone can do
this trick. Set your highest MX record (add a new one) to an IP address
that doesn't exist. Some spammers spam the highest MX first, and if that
doesn't work they skip it and move on. I get rid of 120,000 spams a day
using that trick.
Re: The best way to use Spamassassin is to not use Spamassassin
Posted by "John D. Hardin" <jh...@impsec.org>.
On Wed, 12 Jul 2006, Marc Perkel wrote:
> Depending on what he's doing, it might work. I catch most spam based on
> sender behavior rather than message content. For example, anyone can do
> this trick. Set your highest MX record (add a new one) to an IP address
> that doesn't exist. Some spammers spam the highest MX first, and if that
> doesn't work they skip it and move on. I get rid of 120,000 spams a day
> using that trick.
Ooo. Set it to maila.microsoft.com... {evil grin}
--
John Hardin KA7OHZ ICQ#15735746 http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
11 days until The 37th anniversary of Apollo 11 landing on the Moon
Re: The best way to use Spamassassin is to not use Spamassassin
Posted by Marc Perkel <ma...@perkel.com>.
Bart Schaefer wrote:
> On 7/12/06, Marc Perkel <ma...@perkel.com> wrote:
>>
>> Bart Schaefer wrote:
>> > There's been a fellow over on the procmail list claiming for well over
>> > a year now that he can get better accuracy than SA through message
>> > header analysis alone
>>
>> His claim might well be true.
>
> Oh, I have no doubt that he's speaking truthfully. Problem is that if
> no one else can look at what he's done, there's no way to confirm or
> deny my own suspicion, which is that most of his rules are only that
> accurate in his specific environment. That is, I tend to expect that
> if you picked up his rules and dropped them on another machine halfway
> around the world with a different ISP and mail routing chain, their
> accuracy would plummet.
>
Depending on what he's doing, it might work. I catch most spam based on
sender behavior rather than message content. For example, anyone can do
this trick. Set your highest MX record (add a new one) to an IP address
that doesn't exist. Some spammers spam the highest MX first, and if that
doesn't work they skip it and move on. I get rid of 120,000 spams a day
using that trick.
Re: The best way to use Spamassassin is to not use Spamassassin
Posted by Bart Schaefer <ba...@gmail.com>.
On 7/12/06, Marc Perkel <ma...@perkel.com> wrote:
>
> Bart Schaefer wrote:
> > There's been a fellow over on the procmail list claiming for well over
> > a year now that he can get better accuracy than SA through message
> > header analysis alone
>
> His claim might well be true.
Oh, I have no doubt that he's speaking truthfully. Problem is that if
no one else can look at what he's done, there's no way to confirm or
deny my own suspicion, which is that most of his rules are only that
accurate in his specific environment. That is, I tend to expect that
if you picked up his rules and dropped them on another machine halfway
around the world with a different ISP and mail routing chain, their
accuracy would plummet.
Re: The best way to use Spamassassin is to not use Spamassassin
Posted by Marc Perkel <ma...@perkel.com>.
Bart Schaefer wrote:
> On 7/12/06, Marc Perkel <ma...@perkel.com> wrote:
>> Catchy subject line eh?
>
> What you really mean is "the best way to use SpamAssassin is as an
> analysis tool."
>
> Which of course is what the best way to use it always was. You're
> just abstracting the analysis rather than applying it directly.
>
>> The reaso [sic] of spam is rejected before I get to SA through
>> a fairly large number of tricks that allow me to determine with near
>> 100% accuracy things that are spam.
>
> There's been a fellow over on the procmail list claiming for well over
> a year now that he can get better accuracy than SA through message
> header analysis alone, based on rules he's compiled by analyzing what
> gets through the rules he already has. Just like you've done so far
> in this thread, though, all he'll do is claim that without providing
> any details -- which he says is because he doesn't want to give away
> all the hours of his work that went into it.
>
>> It is none mostly through behavior
>> and karma related lists. Being host blacklisted or URI blacklisted.
>>
>> Similarly, I have created a whitelisting system that tracks hosts and
>> other aspects of the message
>
> The trick, of course, is to be able to automatically feed back into
> these lists based on the output of the analysis tool. If someone has
> to do it by hand, it's a losing proposition.
>
His claim might well be true. I'm using Exim rules and processing 95%+
of all messages before SA. I use SA for the rest. Of course I'm relying
on block lists that were created from people using SA. And the other
upside is that I can process 20 times as much email by avoiding SA.
Re: The best way to use Spamassassin is to not use Spamassassin
Posted by Bart Schaefer <ba...@gmail.com>.
On 7/12/06, Marc Perkel <ma...@perkel.com> wrote:
> Catchy subject line eh?
What you really mean is "the best way to use SpamAssassin is as an
analysis tool."
Which of course is what the best way to use it always was. You're
just abstracting the analysis rather than applying it directly.
> The reaso [sic] of spam is rejected before I get to SA through
> a fairly large number of tricks that allow me to determine with near
> 100% accuracy things that are spam.
There's been a fellow over on the procmail list claiming for well over
a year now that he can get better accuracy than SA through message
header analysis alone, based on rules he's compiled by analyzing what
gets through the rules he already has. Just like you've done so far
in this thread, though, all he'll do is claim that without providing
any details -- which he says is because he doesn't want to give away
all the hours of his work that went into it.
> It is none mostly through behavior
> and karma related lists. Being host blacklisted or URI blacklisted.
>
> Similarly, I have created a whitelisting system that tracks hosts and
> other aspects of the message
The trick, of course, is to be able to automatically feed back into
these lists based on the output of the analysis tool. If someone has
to do it by hand, it's a losing proposition.
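Closing the loop automatically, as suggested above, can be sketched as a per-IP tally updated from each SpamAssassin verdict and periodically exported as black and white lists. Everything here is an assumption for illustration: the class name, the 99% cutoff and minimum report count, and using SA's default `required_score` of 5.0 as the spam/ham label. Feeding SA's own verdicts back this way also feeds back SA's mistakes, which is worth keeping in mind.

```python
from collections import defaultdict

REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score


class FeedbackLists:
    """Per-IP spam/ham counts fed automatically from content-filter verdicts."""

    def __init__(self, cutoff=0.99, min_reports=50):
        self.cutoff = cutoff
        self.min_reports = min_reports
        self.spam = defaultdict(int)
        self.ham = defaultdict(int)

    def record(self, ip, sa_score):
        """Count one scanned message from `ip` as spam or ham by its SA score."""
        if sa_score >= REQUIRED_SCORE:
            self.spam[ip] += 1
        else:
            self.ham[ip] += 1

    def export(self):
        """Return (blacklist, whitelist) sets; mixed or sparse IPs go in neither."""
        black, white = set(), set()
        for ip in set(self.spam) | set(self.ham):
            s, h = self.spam[ip], self.ham[ip]
            total = s + h
            if total < self.min_reports:
                continue
            if s / total >= self.cutoff:
                black.add(ip)
            elif h / total >= self.cutoff:
                white.add(ip)
        return black, white
```

Calling `record()` for every message that reaches SA and running `export()` on a schedule is what removes the hand-maintenance step.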