Posted to users@spamassassin.apache.org by Marc Perkel <ma...@perkel.com> on 2009/12/16 16:03:57 UTC

Whitelists in SA


Res wrote:
>
> no whitelist should ever become default part of SA
>
> the day it is, is the day I look elsewhere.
>
>

Why shouldn't white lists become part of SA? Blacklists are part of SA. 
My hostkarma whitelists are one of the things that keeps me in business 
because my false positive rates are far far better than SA because of 
white listing. There are millions of email servers out there that do 
nothing but send good email 100% of the time that are easy to detect 
because, unlike spammers, they aren't trying to be evasive. I continue 
to be of the opinion that SA needs more white rules to detect HAM and not 
just SPAM.

Re: Whitelists in SA

Posted by John Hardin <jh...@impsec.org>.
On Sun, 20 Dec 2009, jdow wrote:

> I'm just a touch naive here; but, it seems to me it should be possible,
> somehow, to build running spamd daemons, one with the regular rules
> and one with the mass check rules.

There's nothing special about "masscheck rules". Masscheck is just running 
the current ruleset against hand-classified corpora (ideally _large_ 
hand-classified corpora) to see what hits.
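As a rough illustration of that idea only (this is not the project's mass-check tooling), the sketch below runs a toy ruleset over a tiny hand-labelled corpus and tallies hits per rule. The rule names, the regexes, and the `mass_check` function are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy rules: name -> regex applied to the raw message text.
RULES = {
    "TOY_VIAGRA": re.compile(r"viagra", re.I),
    "TOY_UNSUBSCRIBE": re.compile(r"unsubscribe", re.I),
}

def mass_check(corpus):
    """corpus: iterable of (label, text), label is 'ham' or 'spam'.

    Returns {rule_name: Counter({'ham': n, 'spam': m})}.
    """
    hits = {name: Counter() for name in RULES}
    for label, text in corpus:
        for name, pattern in RULES.items():
            if pattern.search(text):
                hits[name][label] += 1
    return hits

# A made-up, hand-classified corpus.
corpus = [
    ("spam", "Cheap VIAGRA here"),
    ("spam", "Click to unsubscribe from viagra offers"),
    ("ham", "To unsubscribe from this list, mail the list owner"),
    ("ham", "Lunch on Friday?"),
]
result = mass_check(corpus)
for name, counter in result.items():
    print(name, "ham:", counter["ham"], "spam:", counter["spam"])
```

A rule that hits ham as often as spam (like the toy unsubscribe rule here) is a poor spam indicator, which is exactly what a large labelled corpus lets you see.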

> The second one is fed the email in parallel with the first but deletes 
> the mail once the scores are logged.

This can easily be done by analysis of spamd logs. It logs all the rules 
hit on every message scanned.
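
A minimal sketch of that kind of log analysis follows. It assumes result lines of the form "spamd: result: Y 15 - RULE_A,RULE_B scantime=..."; check your own logs before relying on the pattern. The sample lines and the `count_rule_hits` helper are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape: "... spamd: result: Y 15 - RULE_A,RULE_B scantime=1.2,..."
RESULT_RE = re.compile(r"spamd: result: ([Y.]) (-?\d+) - (\S*) ")

def count_rule_hits(lines):
    """Count how often each rule name appears in spamd result lines."""
    counts = Counter()
    for line in lines:
        match = RESULT_RE.search(line)
        if not match:
            continue  # not a result line
        rules = match.group(3)
        if rules:  # empty when no rule hit
            counts.update(rules.split(","))
    return counts

sample = [
    "Dec 20 10:00:01 mx spamd[123]: spamd: result: Y 15 - BAYES_99,URIBL_BLACK scantime=1.2,size=2048",
    "Dec 20 10:00:05 mx spamd[123]: spamd: result: . -2 - BAYES_00,HABEAS_CHECKED scantime=0.8,size=1024",
    "Dec 20 10:00:09 mx spamd[124]: spamd: result: Y 9 - BAYES_99 scantime=0.4,size=512",
    "Dec 20 10:00:10 mx spamd[124]: spamd: clean message (-2.0/5.0) for user:1000",
]
counts = count_rule_hits(sample)
for rule, n in counts.most_common():
    print(rule, n)
```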

> The downside is that this is not "confirmed ham" and "confirmed spam".

That unfortunately is the critical part. You can easily glean whether or 
not SA thinks a message is "spammy" and what rules led to that 
classification; the tough part is confirming whether or not it's _right_.

> I wonder how much companies would pay for a part time SpamAssassin
> honcho who can be trusted (bonded?) and can write SARE-ish rules
> tailored to the company's email. Is there a job opportunity for
> somebody here?

I'd do that.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   "Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never
   does quite what I want. I wish Christopher Robin was here."
 				           -- Peter da Silva in a.s.r
-----------------------------------------------------------------------
  5 days until Christmas

Re: [sa] Re: Whitelists in SA

Posted by Charles Gregory <cg...@hwcn.org>.
On Sun, 20 Dec 2009, jdow wrote:
> The downside is that this is not "confirmed ham" and "confirmed spam".

(nod) Exactly. And that is what is needed to do a masscheck...

> I wonder how much companies would pay for a part time SpamAssassin 
> honcho who can be trusted (bonded?) and can write SARE-ish rules 
> tailored to the company's email. Is there a job opportunity for somebody 
> here? (And, yes, I do suspect the burnout time would be rather short.)

(smile) I've got my own custom rule file format (plus a script to convert 
to standard SA rules format). This reduces the effort to add a new rule 
pretty much down to a cut-n-paste operation. Must admit there are some 
days when I do feel a bit burned out, but generally I am gratified to see 
my new rules trigger on the remainder of a spam flood.... :)

As for trust, I never need to see the ham, just the spam, which has no 
privacy issues (smile).

- C

Re: Whitelists in SA

Posted by Warren Togami <wt...@redhat.com>.
On 12/20/2009 09:20 AM, Charles Gregory wrote:
> On Sat, 19 Dec 2009, Daryl C. W. O'Shea wrote:
>>> More unfortunately, privacy concerns prevent me from building a useful
>>> corpus of ham. Sigh....
>>> But otherwise such a good idea....
>> Can you not trust yourself to use your own ham? You don't need to
>> provide us with your mail. You can scan your own mail locally on your
>> own machine(s).
>
> I run an ISP. The corpus I would so love to build is the hundreds of
> messages per day that all our clients receive. It's *their* privacy that
> is the concern.

Right, they would need to opt-in and the manual sorting requirements are 
a bit too difficult and time consuming for all but the most dedicated to 
this cause.

>
> Do you think that my own private collection of saved mail (perhaps 1100
> ham) would really be of benefit? I'd have to start saving my spam as
> well....

A Ham-only corpus is still useful, as long as it contains mail from a 
variety of sources.  (Mailing lists are not very useful.)

>
> And it would always be skewed by the fact that I SMTP reject anything
> caught by Zen.
>

Not a problem.

Warren

Re: Whitelists in SA

Posted by John Hardin <jh...@impsec.org>.
On Fri, 18 Dec 2009, Charles Gregory wrote:

> On Thu, 17 Dec 2009, jdow wrote:
>>  It is a good thing this issue was raised. It led to appropriate mass
>>  check runs. I expect that will lead to saner scoring within the SA
>>  framework. If not and it bites me, THEN I'll raise the issue again.
>>  Does that seem fair?
>
> 50_scores.cf:score HABEAS_ACCREDITED_COI 0 -8.0 0 -8.0
> 50_scores.cf:score HABEAS_ACCREDITED_SOI 0 -4.3 0 -4.3
> 50_scores.cf:score HABEAS_CHECKED 0 -0.2 0 -0.2
>
> Still no changes through the sa-update channel.

There won't be until after 3.3.0 ships. Then changes to 3.2.x (including a 
possible 3.2.6 release) will be considered.

As far as I know rule promotion and rescoring are not automatic for 3.2.x, 
it's still a manual process. All of the focus right now is on getting 
3.3.0 out.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   "Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never
   does quite what I want. I wish Christopher Robin was here."
 				           -- Peter da Silva in a.s.r
-----------------------------------------------------------------------
  7 days until Christmas

Re: [sa] Re: Whitelists in SA

Posted by Charles Gregory <cg...@hwcn.org>.
On Fri, 18 Dec 2009, jdow wrote:
>>  Perhaps you meant CHAIR and keyboard? ;)
> I should have guessed you've managed to short circuit the path
> through your brain.
> {O,o}   <-- Grinning, ducking, and running REAL fast that way>>>>>>>>>
> (Thanks for the straight line. {^_-})

(Thinks twice about it)

Ouch. Subtle. I like it. :)

- Charles

Re: [sa] Re: Whitelists in SA

Posted by jdow <jd...@earthlink.net>.
From: "Charles Gregory" <cg...@hwcn.org>
Sent: Friday, 2009/December/18 13:49


> On Fri, 18 Dec 2009, jdow wrote:
>>>  On Thu, 17 Dec 2009, jdow wrote:
>>>  Still no changes through the sa-update channel.
>>>  Is there a time delay in the masscheck results being applied?
>> Yes, there is, Mr. Gregory. It exists between your monitor and your
>> keyboard.
> 
> There is a one inch gap between those two.
> 
> Perhaps you meant CHAIR and keyboard? ;)

I should have guessed you've managed to short circuit the path
through your brain.

{O,o}   <-- Grinning, ducking, and running REAL fast that way>>>>>>>>>

(Thanks for the straight line. {^_-})

Re: [sa] Re: Whitelists in SA

Posted by Charles Gregory <cg...@hwcn.org>.
On Fri, 18 Dec 2009, jdow wrote:
>>  On Thu, 17 Dec 2009, jdow wrote:
>>  Still no changes through the sa-update channel.
>>  Is there a time delay in the masscheck results being applied?
> Yes, there is, Mr. Gregory. It exists between your monitor and your
> keyboard.

There is a one inch gap between those two.

Perhaps you meant CHAIR and keyboard? ;)

- C

Re: Whitelists in SA

Posted by jdow <jd...@earthlink.net>.
From: "Charles Gregory" <cg...@hwcn.org>
Sent: Sunday, 2009/December/20 06:20


> On Sat, 19 Dec 2009, Daryl C. W. O'Shea wrote:
>>> More unfortunately, privacy concerns prevent me from building a useful
>>> corpus of ham. Sigh....
>>> But otherwise such a good idea....
>> Can you not trust yourself to use your own ham?  You don't need to
>> provide us with your mail.  You can scan your own mail locally on your
>> own machine(s).
> 
> I run an ISP. The corpus I would so love to build is the hundreds of 
> messages per day that all our clients receive. It's *their* privacy that
> is the concern.
> 
> Do you think that my own private collection of saved mail (perhaps 1100 
> ham) would really be of benefit? I'd have to start saving my spam as 
> well....
> 
> And it would always be skewed by the fact that I SMTP reject anything 
> caught by Zen.

I'm just a touch naive here; but, it seems to me it should be possible,
somehow, to build running spamd daemons, one with the regular rules
and one with the mass check rules. The second one is fed the email in
parallel with the first but deletes the mail once the scores are logged.

The downside is that this is not "confirmed ham" and "confirmed spam".
It is a way to safely test new rule sets, though.

I must admit that the vast majority of email I receive is not hand
checked for ham/spam. I simply read headers on several lists to see
what the current buzz is. I read threads that look interesting and
toss the rest. So it'd be hard to mass check validly with that as a
corpus. (Besides, I suspect animal husbandry companies would hardly
be interested in passing things that look like typical LKML mailings,
would they?)

I wonder how much companies would pay for a part time SpamAssassin
honcho who can be trusted (bonded?) and can write SARE-ish rules
tailored to the company's email. Is there a job opportunity for
somebody here? (And, yes, I do suspect the burnout time would be
rather short.)

{^_^}

Re: Whitelists in SA

Posted by Charles Gregory <cg...@hwcn.org>.
On Sat, 19 Dec 2009, Daryl C. W. O'Shea wrote:
>> More unfortunately, privacy concerns prevent me from building a useful
>> corpus of ham. Sigh....
>> But otherwise such a good idea....
> Can you not trust yourself to use your own ham?  You don't need to
> provide us with your mail.  You can scan your own mail locally on your
> own machine(s).

I run an ISP. The corpus I would so love to build is the hundreds of 
messages per day that all our clients receive. It's *their* privacy that
is the concern.

Do you think that my own private collection of saved mail (perhaps 1100 
ham) would really be of benefit? I'd have to start saving my spam as 
well....

And it would always be skewed by the fact that I SMTP reject anything 
caught by Zen.

- C

Re: Whitelists in SA

Posted by jdow <jd...@earthlink.net>.
From: "Charles Gregory" <cg...@hwcn.org>
Sent: Friday, 2009/December/18 06:56


> On Thu, 17 Dec 2009, jdow wrote:
>> It is a good thing this issue was raised. It led to appropriate mass
>> check runs. I expect that will lead to saner scoring within the SA
>> framework. If not and it bites me, THEN I'll raise the issue again.
>> Does that seem fair?
> 
> 50_scores.cf:score HABEAS_ACCREDITED_COI 0 -8.0 0 -8.0
> 50_scores.cf:score HABEAS_ACCREDITED_SOI 0 -4.3 0 -4.3
> 50_scores.cf:score HABEAS_CHECKED 0 -0.2 0 -0.2
> 
> Still no changes through the sa-update channel.
> Is there a time delay in the masscheck results being applied?
> 
> - Charles

Yes, there is, Mr. Gregory. It exists between your monitor and your
keyboard.

{^_^}

Re: [sa] Re: Whitelists in SA

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
On 19/12/2009 5:51 PM, Charles Gregory wrote:
> On Fri, 18 Dec 2009, Warren Togami wrote:
>> Why wait, when you can do relatively simple things to help make it happen?
>> http://wiki.apache.org/spamassassin/NightlyMassCheck
>> We can more frequently update rules if more people participate in the
>> nightly masschecks.  The current documentation is a bit of a confusing
>> mess unfortunately.
> 
> More unfortunately, privacy concerns prevent me from building a useful
> corpus of ham. Sigh....
> 
> But otherwise such a good idea....

Can you not trust yourself to use your own ham?  You don't need to
provide us with your mail.  You can scan your own mail locally on your
own machine(s).

Daryl



Re: [sa] Re: Whitelists in SA

Posted by Charles Gregory <cg...@hwcn.org>.
On Fri, 18 Dec 2009, Warren Togami wrote:
> Why wait, when you can do relatively simple things to help make it happen?
> http://wiki.apache.org/spamassassin/NightlyMassCheck
> We can more frequently update rules if more people participate in the 
> nightly masschecks.  The current documentation is a bit of a confusing mess 
> unfortunately.

More unfortunately, privacy concerns prevent me from building a useful 
corpus of ham. Sigh....

But otherwise such a good idea....

- C



Re: [sa] Re: Whitelists in SA

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
On 18/12/2009 5:13 PM, Warren Togami wrote:
> On 12/18/2009 04:56 PM, Charles Gregory wrote:
>> On Fri, 18 Dec 2009, John Hardin wrote:
>>> We hope to get rule scoring and publication much more automated -
>>> i.e., if a rule in the sandbox works well based on the automated
>>> masschecks, it would be automatically scored and published via
>>> sa-update.
>>
>> Music to my ears. I will wait (semi-)patiently. Thanks.
>>
>> - C
> 
> Why wait, when you can do relatively simple things to help make it happen?
> 
> http://wiki.apache.org/spamassassin/NightlyMassCheck
> We can more frequently update rules if more people participate in the
> nightly masschecks.  The current documentation is a bit of a confusing
> mess unfortunately.

Exactly!  We have code to do this now.  But I'm positive that we don't
have a large and diverse enough ham corpus (on a daily basis, not the
big turn out for the "legacy" re-score mass-checks) to trust it.

Contributors are always welcome!

Daryl


Re: [sa] Re: Whitelists in SA

Posted by Warren Togami <wt...@redhat.com>.
On 12/18/2009 04:56 PM, Charles Gregory wrote:
> On Fri, 18 Dec 2009, John Hardin wrote:
>> We hope to get rule scoring and publication much more automated -
>> i.e., if a rule in the sandbox works well based on the automated
>> masschecks, it would be automatically scored and published via sa-update.
>
> Music to my ears. I will wait (semi-)patiently. Thanks.
>
> - C

Why wait, when you can do relatively simple things to help make it happen?

http://wiki.apache.org/spamassassin/NightlyMassCheck
We can more frequently update rules if more people participate in the 
nightly masschecks.  The current documentation is a bit of a confusing 
mess unfortunately.

Warren

Re: [sa] Re: Whitelists in SA

Posted by Charles Gregory <cg...@hwcn.org>.
On Fri, 18 Dec 2009, John Hardin wrote:
> We hope to get rule scoring and publication much more automated - i.e., 
> if a rule in the sandbox works well based on the automated masschecks, 
> it would be automatically scored and published via sa-update.

Music to my ears. I will wait (semi-)patiently. Thanks.

- C

Re: [sa] Re: Whitelists in SA

Posted by John Hardin <jh...@impsec.org>.
On Fri, 18 Dec 2009, Charles Gregory wrote:

> I recognize, from the existence of such sites as 'rules du jour' that it 
> has long been a practice for SA to release 'core' rule updates very 
> infrequently. But with respect, I question whether that is still a good 
> practice, particularly when an 'issue' raises concern over a particular 
> set of scores, and it would *appear* that these updates require 
> relatively little effort.

We hope to get rule scoring and publication much more automated - i.e., if 
a rule in the sandbox works well based on the automated masschecks, it 
would be automatically scored and published via sa-update.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   "Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never
   does quite what I want. I wish Christopher Robin was here."
 				           -- Peter da Silva in a.s.r
-----------------------------------------------------------------------
  7 days until Christmas

Re: [sa] Re: Whitelists in SA

Posted by Charles Gregory <cg...@hwcn.org>.
On Fri, 18 Dec 2009, LuKreme wrote:
> It's already been stated that no changes to 3.2.5 will be made until 3.3 is 
> done, hasn't it?

Well, at this point, I respectfully bow, and take a step back, so as not 
to sound too demanding of our great volunteers (smile), but I believe 
in another of my posts I put forward the idea that design, testing and 
implementation of rules should be a bit more 'frequent', drawing upon 
the model of ClamAV, with signatures being frequently released, even 
while the next major 'engine' update is in the works.

I recognize, from the existence of such sites as 'rules du jour' that it 
has long been a practice for SA to release 'core' rule updates very 
infrequently. But with respect, I question whether that is still a good 
practice, particularly when an 'issue' raises concern over a particular 
set of scores, and it would *appear* that these updates require relatively 
little effort.

So, to put it bluntly, I don't see how a couple of rules changes are 
worthy of being 'held back' by the entire push to SA 3.3..... I would 
think that a few quick adjustments, and presumably a 'masscheck' would 
suffice, and new/revised rules could be released at least on a monthly 
basis without any serious concern for compromising the overall score 
balance that is the critical goal of SA updates?

Or am I grossly mis-estimating the work-load? :)

- C

Re: Whitelists in SA

Posted by LuKreme <kr...@kreme.com>.
On Dec 18, 2009, at 7:56, Charles Gregory <cg...@hwcn.org> wrote:
> Still no changes through the sa-update channel.
> Is there a time delay in the masscheck results being applied?

It's already been stated that no changes to 3.2.5 will be made until 3.3 is  
done, hasn't it?


Re: Whitelists in SA

Posted by Charles Gregory <cg...@hwcn.org>.
On Thu, 17 Dec 2009, jdow wrote:
> It is a good thing this issue was raised. It led to appropriate mass
> check runs. I expect that will lead to saner scoring within the SA
> framework. If not and it bites me, THEN I'll raise the issue again.
> Does that seem fair?

50_scores.cf:score HABEAS_ACCREDITED_COI 0 -8.0 0 -8.0
50_scores.cf:score HABEAS_ACCREDITED_SOI 0 -4.3 0 -4.3
50_scores.cf:score HABEAS_CHECKED 0 -0.2 0 -0.2
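
For readers unfamiliar with the four-value form: per the SpamAssassin configuration documentation, the four numbers are the scores for score sets 0 to 3 (neither Bayes nor network tests, network only, Bayes only, both), which is why these network-based rules score 0 in sets 0 and 2. The sketch below parses such a line and picks the value that applies; `parse_score_line` and `effective_score` are illustrative helpers, not part of SA.

```python
def parse_score_line(line):
    """Parse 'score NAME v0 [v1 v2 v3]', tolerating a grep-style 'file:' prefix."""
    if not line.startswith("score "):
        # Drop a prefix such as "50_scores.cf:" left by grep.
        line = line.split(":", 1)[1]
    parts = line.split()
    name = parts[1]
    values = [float(v) for v in parts[2:]]
    if len(values) == 1:
        # A single score applies to all four score sets.
        values = values * 4
    return name, values

def effective_score(values, bayes, network):
    """Select the score for the active score set (0..3)."""
    index = (2 if bayes else 0) + (1 if network else 0)
    return values[index]

name, values = parse_score_line(
    "50_scores.cf:score HABEAS_ACCREDITED_COI 0 -8.0 0 -8.0"
)
print(name, effective_score(values, bayes=True, network=True))
print(name, effective_score(values, bayes=True, network=False))
```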

Still no changes through the sa-update channel.
Is there a time delay in the masscheck results being applied?

- Charles

Re: Whitelists in SA

Posted by jdow <jd...@earthlink.net>.
From: "J.D. Falk" <jd...@cybernothing.org>
Sent: Thursday, 2009/December/17 11:21


On Dec 16, 2009, at 8:35 AM, LuKreme wrote:

> The fact is I *AM* their customer. The people writing them checks are not, 
> they're just their funders. Whitelist companies have to convince admins to 
> use their list. The only way to do that is to have really really really 
> high quality lists that really do prevent spam delivery. If I don't use 
> their whitelist, and others don't use their whitelist, then their model 
> falls apart and they don't make money

Exactly what Return Path has been saying (and acting upon) for years.

(We could debate whether Habeas followed that rule before we bought the 
company, but it's impolite to speak ill of the dead.)

> but no company is enlightened enough to realise this.

Heh.

<<jdow    LuKreme seems to not have much of an engineering education
and zero experience with statistics. It is statistically impossible
to remove all spam perfectly and let all ham through perfectly. Perfect
is a goal you can never reach. If you obsess about it, you will find
yourself "round the bend" before long. All you can do is adjust the
ratio of missed ham to missed spam one way or the other. Where you
"slice" is pretty much up to you. What is the cost, the real cost in
lost customers or dollars spent, for a missed ham and for a missed
spam? If you can hit that balance point for minimum overall cost you've
done your job. If you sit and bitch about something not being perfect,
then you're not doing your job.
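
One way to make that balance point concrete, as a sketch under stated assumptions: given messages with known labels and scores, pick the threshold that minimizes (cost per lost ham x lost ham) + (cost per missed spam x missed spam). The scores, costs, and the `best_threshold` helper below are all invented for illustration.

```python
def best_threshold(scored, cost_fp, cost_fn, candidates):
    """Return (threshold, total_cost) with the lowest total cost.

    scored: list of (score, is_spam); a message is flagged when score >= threshold.
    cost_fp: cost of one ham flagged as spam; cost_fn: cost of one spam let through.
    """
    best = None
    for t in candidates:
        fp = sum(1 for s, is_spam in scored if s >= t and not is_spam)
        fn = sum(1 for s, is_spam in scored if s < t and is_spam)
        cost = cost_fp * fp + cost_fn * fn
        if best is None or cost < best[1]:
            best = (t, cost)
    return best

# Made-up scores: (SA-style score, is_spam)
scored = [(1.0, False), (3.0, False), (4.5, True),
          (6.0, False), (7.0, True), (12.0, True)]

# Losing a ham hurts ten times more than missing a spam.
ham_is_precious = best_threshold(scored, cost_fp=10, cost_fn=1, candidates=[3, 5, 8])
# The reverse weighting pushes the slice the other way.
spam_is_costly = best_threshold(scored, cost_fp=1, cost_fn=10, candidates=[3, 5, 8])
print(ham_is_precious, spam_is_costly)
```

Neither setting reaches zero cost on this data, which is the point being made: you choose where to slice, you do not get perfection.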

It is a good thing this issue was raised. It led to appropriate mass
check runs. I expect that will lead to saner scoring within the SA
framework. If not and it bites me, THEN I'll raise the issue again.
Does that seem fair?

{^_^} 


Re: Whitelists in SA

Posted by "J.D. Falk" <jd...@cybernothing.org>.
On Dec 16, 2009, at 8:35 AM, LuKreme wrote:

> The fact is I *AM* their customer. The people writing them checks are not, they're just their funders. Whitelist companies have to convince admins to use their list. The only way to do that is to have really really really high quality lists that really do prevent spam delivery. If I don't use their whitelist, and others don't use their whitelist, then their model falls apart and they don't make money

Exactly what Return Path has been saying (and acting upon) for years.

(We could debate whether Habeas followed that rule before we bought the company, but it's impolite to speak ill of the dead.)

> but no company is enlightened enough to realise this.

Heh.

--
J.D. Falk <jd...@returnpath.net>
Return Path Inc





Re: Whitelists in SA

Posted by LuKreme <kr...@kreme.com>.
On 16-Dec-2009, at 08:03, Marc Perkel wrote:
> Res wrote:
>> 
>> no whitelist should ever become default part of SA
>> 
>> the day it is, is the day I look elsewhere.
> 
> Why shouldn't white lists become part of SA? Blacklists are part of SA. My hostkarma whitelists are one of the things that keeps me in business because my false positive rates are far far better than SA because of white listing. There are millions of email servers out there that do nothing but send good email 100% of the time that are easy to detect because, unlike spammers, they aren't trying to be evasive. I continue to be of the opinion that SA needs more white rules to detect HAM and not just SPAM.

I would say that no COMMERCIAL whitelist should be part of SA. I use whitelisting myself, but I'm not going to trust someone who has a financial interest in getting mail delivered to me to be diligent in their whitelisting. After all, their bean-counters don't see me as the customer because I'm not writing them checks.

The fact is I *AM* their customer. The people writing them checks are not, they're just their funders. Whitelist companies have to convince admins to use their list. The only way to do that is to have really really really high quality lists that really do prevent spam delivery. If I don't use their whitelist, and others don't use their whitelist, then their model falls apart and they don't make money, but no company is enlightened enough to realise this.


-- 
They say only the good die young. If it works the other way too 
	I'm immortal