Posted to users@spamassassin.apache.org by Charles Gregory <cg...@hwcn.org> on 2009/12/18 22:20:14 UTC

Re: [sa] Re: Whitelists in SA

On Fri, 18 Dec 2009, LuKreme wrote:
> It's already been stated that no changes to 3.2.5 will be made until 3.3 is 
> done, hasn't it?

Well, at this point, I respectfully bow, and take a step back, so as not 
to sound too demanding of our great volunteers (smile), but I believe 
in another of my posts I put forward the idea that design, testing and 
implementation of rules should be a bit more 'frequent', drawing upon 
the model of ClamAV, with signatures being frequently released, even 
while the next major 'engine' update is in the works.

I recognize, from the existence of such sites as 'rules du jour' that it 
has long been a practice for SA to release 'core' rule updates very 
infrequently. But with respect, I question whether that is still a good 
practice, particularly when an 'issue' raises concern over a particular 
set of scores, and it would *appear* that these updates require relatively 
little effort.

So, to put it bluntly, I don't see why a couple of rule changes should be 
'held back' by the entire push to SA 3.3. I would think that a few quick 
adjustments, and presumably a 'masscheck', would suffice, and that 
new/revised rules could be released at least monthly without any serious 
risk of compromising the overall score balance that is the critical goal 
of SA updates.

Or am I grossly mis-estimating the work-load? :)

- C

Re: Whitelists in SA

Posted by Warren Togami <wt...@redhat.com>.
On 12/20/2009 09:20 AM, Charles Gregory wrote:
> On Sat, 19 Dec 2009, Daryl C. W. O'Shea wrote:
>>> More unfortunately, privacy concerns prevent me from building a useful
>>> corpus of ham. Sigh....
>>> But otherwise such a good idea....
>> Can you not trust yourself to use your own ham? You don't need to
>> provide us with your mail. You can scan your own mail locally on your
>> own machine(s).
>
> I run an ISP. The corpus I would so love to build is the hundreds of
> messages per day that all our clients receive. It's *their* privacy that
> is the concern.

Right, they would need to opt in, and the manual sorting requirements are 
a bit too difficult and time-consuming for all but the most dedicated to 
this cause.

>
> Do you think that my own private collection of saved mail (perhaps 1100
> ham) would really be of benefit? I'd have to start saving my spam as
> well....

A ham-only corpus is still useful, as long as it contains mail from a 
variety of sources.  (Mailing lists are not very useful.)

>
> And it would always be skewed by the fact that I SMTP reject anything
> caught by Zen.
>

Not a problem.

Warren

Re: [sa] Re: Whitelists in SA

Posted by Charles Gregory <cg...@hwcn.org>.
On Sun, 20 Dec 2009, jdow wrote:
> The downside is that this is not "confirmed ham" and "confirmed spam".

(nod) Exactly. And that is what is needed to do a masscheck...

> I wonder how much companies would pay for a part time SpamAssassin 
> honcho who can be trusted (bonded?) and can write SARE-ish rules 
> tailored to the company's email. Is there a job opportunity for somebody 
> here? (And, yes, I do suspect the burnout time would be rather short.)

(smile) I've got my own custom rule file format (plus a script to convert 
to standard SA rules format). This reduces the effort to add a new rule 
pretty much down to a cut-n-paste operation. Must admit there are some 
days when I do feel a bit burned out, but generally I am gratified to see 
my new rules trigger on the remainder of a spam flood.... :)
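
A converter of that kind could be sketched roughly as below. The actual custom format is not shown in this thread, so the tab-separated layout (name, score, pattern, description) and the function name convert_line are purely hypothetical; only the emitted body/describe/score lines follow standard SA rule syntax.

```python
def convert_line(line: str) -> str:
    """Turn one hypothetical shorthand line into standard SA rule lines.

    Assumed input layout (not the author's real format):
        NAME<TAB>SCORE<TAB>PATTERN<TAB>DESCRIPTION
    """
    name, score, pattern, description = line.rstrip("\n").split("\t", 3)
    float(score)  # raises ValueError early if the score is not numeric
    # Escape the regex delimiter; assumes the pattern has no pre-escaped slashes.
    escaped = pattern.replace("/", "\\/")
    return "\n".join([
        f"body {name} /{escaped}/i",
        f"describe {name} {description}",
        f"score {name} {score}",
    ])
```

With a layout like this, adding a rule really is one pasted line per rule, and the generated output can be dropped into a local .cf file.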

As for trust, I never need to see the ham, just the spam, which has no 
privacy issues (smile).

- C

Re: Whitelists in SA

Posted by John Hardin <jh...@impsec.org>.
On Sun, 20 Dec 2009, jdow wrote:

> I'm just a touch naive here; but, it seems to me it should be possible,
> somehow, to build running spamd daemons, one with the regular rules
> and one with the mass check rules.

There's nothing special about "masscheck rules". Masscheck is just running 
the current ruleset against hand-classified corpora (ideally _large_ 
hand-classified corpora) to see what hits.
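
As a rough illustration of the idea (not the real mass-check tooling), the sketch below runs a set of rules over two hand-classified lists of messages and reports how often each rule hits ham and spam. The rule representation (a name plus a Python predicate) and the function name mass_check are assumptions made for this example only.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a rule: a name plus a predicate over message text.
Rule = Tuple[str, Callable[[str], bool]]


def mass_check(rules: List[Rule], ham: List[str],
               spam: List[str]) -> Dict[str, Tuple[float, float]]:
    """Return {rule_name: (fraction of ham hit, fraction of spam hit)}."""
    ham_hits: Counter = Counter()
    spam_hits: Counter = Counter()
    for corpus, hits in ((ham, ham_hits), (spam, spam_hits)):
        for message in corpus:
            for name, matches in rules:
                if matches(message):
                    hits[name] += 1
    # max(..., 1) avoids dividing by zero when a corpus is empty.
    return {
        name: (ham_hits[name] / max(len(ham), 1),
               spam_hits[name] / max(len(spam), 1))
        for name, _ in rules
    }
```

A rule that hits a large fraction of spam and almost no ham is a good candidate; the result is only as trustworthy as the hand classification behind the two lists.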

> The second one is fed the email in parallel with the first but deletes 
> the mail once the scores are logged.

This can easily be done by analysis of spamd logs. It logs all the rules 
hit on every message scanned.
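
A minimal sketch of that kind of log analysis follows. It assumes result lines of the form "spamd: result: Y 15 - RULE_A,RULE_B scantime=..." (with "." in place of "Y" for unflagged mail); the exact layout varies by version and syslog setup, so treat the regular expression as an assumption to adjust against your own logs.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed layout of a spamd result line; adjust to match your own logs.
RESULT_RE = re.compile(
    r"spamd: result: (?P<flag>[Y.]) (?P<score>-?\d+) - (?P<tests>\S*) scantime="
)


def count_rule_hits(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count rule hits separately for mail SA flagged and did not flag."""
    flagged: Counter = Counter()
    unflagged: Counter = Counter()
    for line in lines:
        match = RESULT_RE.search(line)
        if match is None:
            continue  # not a result line
        tests = [t for t in match.group("tests").split(",") if t]
        target = flagged if match.group("flag") == "Y" else unflagged
        target.update(tests)
    return flagged, unflagged
```

Passing an open log file to count_rule_hits gives per-rule tallies, split by SA's own verdict rather than by any human classification.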

> The downside is that this is not "confirmed ham" and "confirmed spam".

That unfortunately is the critical part. You can easily glean whether or 
not SA thinks a message is "spammy" and what rules led to that 
classification; the tough part is confirming whether or not it's _right_.

> I wonder how much companies would pay for a part time SpamAssassin
> honcho who can be trusted (bonded?) and can write SARE-ish rules
> tailored to the company's email. Is there a job opportunity for
> somebody here?

I'd do that.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   "Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never
   does quite what I want. I wish Christopher Robin was here."
 				           -- Peter da Silva in a.s.r
-----------------------------------------------------------------------
  5 days until Christmas

Re: Whitelists in SA

Posted by jdow <jd...@earthlink.net>.
From: "Charles Gregory" <cg...@hwcn.org>
Sent: Sunday, 2009/December/20 06:20


> On Sat, 19 Dec 2009, Daryl C. W. O'Shea wrote:
>>> More unfortunately, privacy concerns prevent me from building a useful
>>> corpus of ham. Sigh....
>>> But otherwise such a good idea....
>> Can you not trust yourself to use your own ham?  You don't need to
>> provide us with your mail.  You can scan your own mail locally on your
>> own machine(s).
> 
> I run an ISP. The corpus I would so love to build is the hundreds of 
> messages per day that all our clients receive. It's *their* privacy that
> is the concern.
> 
> Do you think that my own private collection of saved mail (perhaps 1100 
> ham) would really be of benefit? I'd have to start saving my spam as 
> well....
> 
> And it would always be skewed by the fact that I SMTP reject anything 
> caught by Zen.

I'm just a touch naive here; but, it seems to me it should be possible,
somehow, to build running spamd daemons, one with the regular rules
and one with the mass check rules. The second one is fed the email in
parallel with the first but deletes the mail once the scores are logged.

The downside is that this is not "confirmed ham" and "confirmed spam".
It is a way to safely test new rule sets, though.

I must admit that the vast majority of email I receive is not hand
checked for ham/spam. I simply read headers on several lists to see
what the current buzz is. I read threads that look interesting and
toss the rest. So it'd be hard to mass check validly with that as a
corpus. (Besides, I suspect animal husbandry companies would hardly
be interested in passing things that look like typical LKML mailings,
would they?)

I wonder how much companies would pay for a part time SpamAssassin
honcho who can be trusted (bonded?) and can write SARE-ish rules
tailored to the company's email. Is there a job opportunity for
somebody here? (And, yes, I do suspect the burnout time would be
rather short.)

{^_^}

Re: Whitelists in SA

Posted by Charles Gregory <cg...@hwcn.org>.
On Sat, 19 Dec 2009, Daryl C. W. O'Shea wrote:
>> More unfortunately, privacy concerns prevent me from building a useful
>> corpus of ham. Sigh....
>> But otherwise such a good idea....
> Can you not trust yourself to use your own ham?  You don't need to
> provide us with your mail.  You can scan your own mail locally on your
> own machine(s).

I run an ISP. The corpus I would so love to build is the hundreds of 
messages per day that all our clients receive. It's *their* privacy that
is the concern.

Do you think that my own private collection of saved mail (perhaps 1100 
ham) would really be of benefit? I'd have to start saving my spam as 
well....

And it would always be skewed by the fact that I SMTP reject anything 
caught by Zen.

- C

Re: [sa] Re: Whitelists in SA

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
On 19/12/2009 5:51 PM, Charles Gregory wrote:
> On Fri, 18 Dec 2009, Warren Togami wrote:
>> Why wait, when you can do relatively simple things to help make it happen?
>> http://wiki.apache.org/spamassassin/NightlyMassCheck
>> We can more frequently update rules if more people participate in the
>> nightly masschecks.  The current documentation is a bit of a confusing
>> mess unfortunately.
> 
> More unfortunately, privacy concerns prevent me from building a useful
> corpus of ham. Sigh....
> 
> But otherwise such a good idea....

Can you not trust yourself to use your own ham?  You don't need to
provide us with your mail.  You can scan your own mail locally on your
own machine(s).

Daryl



Re: [sa] Re: Whitelists in SA

Posted by Charles Gregory <cg...@hwcn.org>.
On Fri, 18 Dec 2009, Warren Togami wrote:
> Why wait, when you can do relatively simple things to help make it happen?
> http://wiki.apache.org/spamassassin/NightlyMassCheck
> We can more frequently update rules if more people participate in the 
> nightly masschecks.  The current documentation is a bit of a confusing mess 
> unfortunately.

More unfortunately, privacy concerns prevent me from building a useful 
corpus of ham. Sigh....

But otherwise such a good idea....

- C



Re: [sa] Re: Whitelists in SA

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
On 18/12/2009 5:13 PM, Warren Togami wrote:
> On 12/18/2009 04:56 PM, Charles Gregory wrote:
>> On Fri, 18 Dec 2009, John Hardin wrote:
>>> We hope to get rule scoring and publication much more automated -
>>> i.e., if a rule in the sandbox works well based on the automated
>>> masschecks, it would be automatically scored and published via
>>> sa-update.
>>
>> Music to my ears. I will wait (semi-)patiently. Thanks.
>>
>> - C
> 
> Why wait, when you can do relatively simple things to help make it happen?
> 
> http://wiki.apache.org/spamassassin/NightlyMassCheck
> We can more frequently update rules if more people participate in the
> nightly masschecks.  The current documentation is a bit of a confusing
> mess unfortunately.

Exactly!  We have code to do this now.  But I'm positive that we don't
have a large and diverse enough ham corpus (on a daily basis, not the
big turn out for the "legacy" re-score mass-checks) to trust it.

Contributors are always welcome!

Daryl


Re: [sa] Re: Whitelists in SA

Posted by Warren Togami <wt...@redhat.com>.
On 12/18/2009 04:56 PM, Charles Gregory wrote:
> On Fri, 18 Dec 2009, John Hardin wrote:
>> We hope to get rule scoring and publication much more automated -
>> i.e., if a rule in the sandbox works well based on the automated
>> masschecks, it would be automatically scored and published via sa-update.
>
> Music to my ears. I will wait (semi-)patiently. Thanks.
>
> - C

Why wait, when you can do relatively simple things to help make it happen?

http://wiki.apache.org/spamassassin/NightlyMassCheck
We can more frequently update rules if more people participate in the 
nightly masschecks.  The current documentation is a bit of a confusing 
mess unfortunately.

Warren

Re: [sa] Re: Whitelists in SA

Posted by Charles Gregory <cg...@hwcn.org>.
On Fri, 18 Dec 2009, John Hardin wrote:
> We hope to get rule scoring and publication much more automated - i.e., 
> if a rule in the sandbox works well based on the automated masschecks, 
> it would be automatically scored and published via sa-update.

Music to my ears. I will wait (semi-)patiently. Thanks.

- C

Re: [sa] Re: Whitelists in SA

Posted by John Hardin <jh...@impsec.org>.
On Fri, 18 Dec 2009, Charles Gregory wrote:

> I recognize, from the existence of such sites as 'rules du jour' that it 
> has long been a practice for SA to release 'core' rule updates very 
> infrequently. But with respect, I question whether that is still a good 
> practice, particularly when an 'issue' raises concern over a particular 
> set of scores, and it would *appear* that these updates require 
> relatively little effort.

We hope to get rule scoring and publication much more automated - i.e., if 
a rule in the sandbox works well based on the automated masschecks, it 
would be automatically scored and published via sa-update.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   "Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never
   does quite what I want. I wish Christopher Robin was here."
 				           -- Peter da Silva in a.s.r
-----------------------------------------------------------------------
  7 days until Christmas