Posted to users@spamassassin.apache.org by Webmaster <we...@hostation.com> on 2006/04/25 19:30:47 UTC

more questions on training spamassassin

In my setup, the server running spamassassin is different from the server
delivering the final e-mail.  This means a few extra headers will be added
by the time the clients see the e-mail.  If I were to take this e-mail and
train spamassassin with it, it would no longer be in the form that
spamassassin saw originally (in terms of headers).

So my question is: is it even worthwhile to train spamassassin manually in
this scenario?

Re: having trouble with SA

Posted by Matt Kettler <mk...@comcast.net>.

Bill Landry wrote:
>
> So, Matt, are you doing something like:
>
>    bayes_ignore_from *@spamassassin.apache.org
I can check the exact syntax, but yes.
>
> How does this compare to:
>
>    bayes_ignore_to users@spamassassin.apache.org
>
> Is one way preferable to the other?
The bayes_ignore_to will fail for:
    messages sent to spamassassin-users@incubator.apache.org (that address
    still works)
    messages bcc'ed to the list for some reason or another.

However, the bayes_ignore_to will catch messages sent directly to you
and Cc'ed to the list, which is a good thing.

Ideally you should use both. (which I do, along with a whitelist_from_spf)
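
As an illustration of the two patterns discussed above, here is a minimal Python sketch of glob-style address matching. It assumes that only * and ? act as wildcards and that matching is case-insensitive; the helper names and the sample sender address are hypothetical, and this is not SpamAssassin's own code.

```python
import re


def glob_to_regex(pattern):
    """Translate an address glob into a regex; only * and ? are wildcards."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # Case-insensitive matching is an assumption of this sketch.
    return re.compile("".join(parts), re.IGNORECASE)


def address_matches(pattern, address):
    """Return True if the whole address matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(address) is not None
```

With these helpers, a sender pattern of *@spamassassin.apache.org matches any address at that domain, while a recipient pattern of users@spamassassin.apache.org does not match spamassassin-users@incubator.apache.org, which is the first failure case listed above.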

>
> Bill
>

Re: having trouble with SA

Posted by Bill Landry <bi...@pointshare.com>.

----- Original Message ----- 
From: "Matt Kettler" <mk...@evi-inc.com>

>>> No.. the only thing I generally whitelist is spam discussion lists like
>>> this one (and I do bayes_ignore_from for them as well). It would be better
>>> to bypass SA entirely, but I don't have that option in my setup.
>>
>> Matt, just for clarification, shouldn't that be bayes_ignore_to instead
>> of "from" when talking about discussion lists?  For example:
>>
>> bayes_ignore_to users@spamassassin.apache.org
>>
>
> No.. it should be from. I'm matching against the Return-Path header, not
> the To: header.

So, Matt, are you doing something like:

    bayes_ignore_from *@spamassassin.apache.org

How does this compare to:

    bayes_ignore_to users@spamassassin.apache.org

Is one way preferable to the other?

Bill

Re: having trouble with SA

Posted by Matt Kettler <mk...@evi-inc.com>.

Bill Landry wrote:
> ----- Original Message ----- From: "Matt Kettler" <mk...@evi-inc.com>
> 
>>> Final question for the moment... our old local.cf file had a lengthy
>>> whitelist included.   Is there any reason necessarily to have a
>>> whitelist?
>>
>> No.. the only thing I generally whitelist is spam discussion lists like
>> this one (and I do bayes_ignore_from for them as well). It would be better
>> to bypass SA entirely, but I don't have that option in my setup.
> 
> Matt, just for clarification, shouldn't that be bayes_ignore_to instead
> of "from" when talking about discussion lists?  For example:
> 
> bayes_ignore_to users@spamassassin.apache.org
> 

No.. it should be from. I'm matching against the Return-Path header, not the To:
header.

Re: having trouble with SA

Posted by Bill Landry <bi...@pointshare.com>.

----- Original Message ----- 
From: "Matt Kettler" <mk...@evi-inc.com>

>> Final question for the moment... our old local.cf file had a lengthy
>> whitelist included.   Is there any reason necessarily to have a
>> whitelist?
>
> No.. the only thing I generally whitelist is spam discussion lists like
> this one (and I do bayes_ignore_from for them as well). It would be better
> to bypass SA entirely, but I don't have that option in my setup.

Matt, just for clarification, shouldn't that be bayes_ignore_to instead of 
"from" when talking about discussion lists?  For example:

bayes_ignore_to users@spamassassin.apache.org

Bill

Re: having trouble with SA

Posted by Matt Kettler <mk...@evi-inc.com>.

Jeff Portwine wrote:
>   Is there any good method
> for users to submit email as spam when spam gets through to help SA
> learn it as spam?

Yes, that can be a tough one. If you have a standardized mail client, you might
look and see if it has a reasonable "redirect" or "bounce" feature that
preserves the headers.

Another option would be to have them forward the original message as an
attachment and have a script strip the attachments and feed those to sa-learn.
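
A minimal Python sketch of the attachment approach described above, assuming the mail client forwards the original as a message/rfc822 part and that sa-learn is on the PATH. The function names are hypothetical; only the `sa-learn --spam <file>` invocation comes from SpamAssassin itself.

```python
import email
import subprocess
import tempfile


def extract_attached_messages(raw_bytes):
    """Return every message/rfc822 attachment of a forwarded mail as bytes."""
    found = []
    _collect(email.message_from_bytes(raw_bytes), found)
    return found


def _collect(part, found):
    if part.get_content_type() == "message/rfc822":
        # The parser stores the attached message as a one-element list,
        # with the original headers intact.
        payload = part.get_payload()
        if isinstance(payload, list) and payload:
            found.append(payload[0].as_bytes())
        return  # do not descend into the attached message itself
    if part.is_multipart():
        for sub in part.get_payload():
            _collect(sub, found)


def learn_as_spam(message_bytes):
    """Write one extracted message to a temp file and hand it to sa-learn."""
    with tempfile.NamedTemporaryFile(suffix=".eml") as handle:
        handle.write(message_bytes)
        handle.flush()
        subprocess.run(["sa-learn", "--spam", handle.name], check=True)
```

A delivery script for a dedicated spam-report mailbox could read each incoming mail, call extract_attached_messages on it, and pass every result to learn_as_spam. Attachments that a client re-encodes (for example as base64) would need extra handling that this sketch does not attempt.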

> 
> Final question for the moment... our old local.cf file had a lengthy
> whitelist included.   Is there any reason necessarily to have a
> whitelist? 

No.. the only thing I generally whitelist is spam discussion lists like this one
(and I do bayes_ignore_from for them as well). It would be better to bypass SA
entirely, but I don't have that option in my setup.

> Since I'm training SA with ham, most of which would be coming
> from  our servers,  that mail should be processed as ham anyway.   Or
> does the whitelist just help to give more security that mail isn't going
> to be marked as spam erroneously?

Well, it would add extra security against FPs (false positives). However, for
your internal mail you should also get ALL_TRUSTED firing. (Note: you may want
to configure trusted_networks manually; SA's guessing is not 100% reliable
here.)

> 
> Thanks again,
> Jeff
>

Re: having trouble with SA

Posted by Jeff Portwine <jd...@veritime.com>.

----- Original Message ----- 
From: "Matt Kettler" <mk...@evi-inc.com>
To: "Jeff Portwine" <jd...@veritime.com>
Cc: <us...@spamassassin.apache.org>
Sent: Tuesday, April 25, 2006 3:38 PM
Subject: Re: having trouble with SA

> Jeff Portwine wrote:
>
>> The spam levels are getting high again, users are complaining, and so
>> today I did an apt-get spamassassin to upgrade to version 3.1.0.      I
>> then used the configuration tool at
>> http://www.yrex.com/spam/spamconfig.php to create a new local.cf and
>> replaced the old one, which was outdated even for our previous
>> version.     Now however, when I try to start the spamassassin daemon I
>> get the message:   SpamAssassin Mail Filter Daemon: disabled, see
>> /etc/default/spamassassin   and I'm really not sure what's wrong there.
>
> So what does /etc/default/spamassassin look like? My guess is this file is a
> debian-specific file that configures the startup script, and it's probably set
> to disable spamd. However, I'm not a debian user, so it's a guess, but it would
> be helpful to see what's there.
>
> Also, have you run spamassassin --lint? This checks your config files for
> errors. It should run with no output at all, but if there are problems it will
> complain.
>

You were right about /etc/default/spamassassin.   I looked at it earlier but 
I guess my head was cloudy from the other stuff I'd been looking at because 
the answer to that particular problem was obvious and when I looked again it 
was clear why it was disabling spamd.

So now that I have that running, I'm currently digging through my exim
config to try to verify that SA is configured properly there.  Once I can
determine that, the next order of business is to rebuild my bayes database.
At that point I have some questions though.  Is there any good method for
users to submit email as spam when spam gets through, to help SA learn it as
spam?  Currently, mail is received by exim, passed through SA and either
tagged as spam or left alone, then placed in the user's mailbox, and users
retrieve their mail via POP3.  Having them forward spam doesn't work since
all the headers get rewritten.  I can't seem to come up with a good way to do
this other than asking them to manually copy spam into a folder or something
where I could have a script learn the spam, but getting our users to take the
time to do that would be a battle.

Final question for the moment: our old local.cf file had a lengthy
whitelist included.  Is there any reason to have a whitelist at all?
Since I'm training SA with ham, most of which would be coming from our
servers, that mail should be processed as ham anyway.  Or does the
whitelist just give extra assurance that mail isn't going to be marked
as spam erroneously?

Thanks again,
Jeff

Re: having trouble with SA

Posted by Stuart Johnston <st...@ebby.com>.

Matt Kettler wrote:
> Jeff Portwine wrote:
> 
>> The spam levels are getting high again, users are complaining, and so
>> today I did an apt-get spamassassin to upgrade to version 3.1.0.      I
>> then used the configuration tool at
>> http://www.yrex.com/spam/spamconfig.php to create a new local.cf and
>> replaced the old one, which was outdated even for our previous
>> version.     Now however, when I try to start the spamassassin daemon I
>> get the message:   SpamAssassin Mail Filter Daemon: disabled, see
>> /etc/default/spamassassin   and I'm really not sure what's wrong there.
> 
> So what does /etc/default/spamassassin look like? My guess is this file is a
> debian-specific file that configures the startup script, and it's probably set
> to disable spamd. However, I'm not a debian user, so it's a guess, but it would
> be helpful to see what's there.

Yes, Matt is right.  There is a line that says 'ENABLED=0'.  Change that 
0 to 1 and it will work.  You can also set options such as max-children 
in this file.
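
As a rough illustration of the check described above, this Python sketch reads the text of /etc/default/spamassassin and reports whether spamd is enabled. It assumes the file uses plain shell-style KEY=VALUE lines with # comments; the function name is hypothetical.

```python
def spamd_enabled(defaults_text):
    """Return True if the last ENABLED= assignment in the text is 1."""
    enabled = None
    for line in defaults_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if line.startswith("ENABLED="):
            # Later assignments override earlier ones, as in a shell script.
            enabled = line.split("=", 1)[1].strip().strip("'\"")
    return enabled == "1"
```

Reading the file with open("/etc/default/spamassassin").read() and passing the result to spamd_enabled gives a quick answer before restarting the daemon.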

-Stuart

Re: having trouble with SA

Posted by Matt Kettler <mk...@evi-inc.com>.

Jeff Portwine wrote:

> The spam levels are getting high again, users are complaining, and so
> today I did an apt-get spamassassin to upgrade to version 3.1.0.      I
> then used the configuration tool at
> http://www.yrex.com/spam/spamconfig.php to create a new local.cf and
> replaced the old one, which was outdated even for our previous
> version.     Now however, when I try to start the spamassassin daemon I
> get the message:   SpamAssassin Mail Filter Daemon: disabled, see
> /etc/default/spamassassin   and I'm really not sure what's wrong there.

So what does /etc/default/spamassassin look like? My guess is this file is a
debian-specific file that configures the startup script, and it's probably set
to disable spamd. However, I'm not a debian user, so it's a guess, but it would
be helpful to see what's there.

Also, have you run spamassassin --lint? This checks your config files for
errors. It should run with no output at all, but if there are problems it will
complain.
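
The lint check could be wrapped in a small script, for example before restarting spamd. This is a minimal Python sketch, assuming spamassassin is on the PATH and that any output at all indicates a problem, as described above; the function names are hypothetical.

```python
import subprocess


def lint_output(command=("spamassassin", "--lint")):
    """Run the lint command and return everything it printed."""
    result = subprocess.run(list(command), capture_output=True, text=True)
    # Warnings may appear on either stream, so combine both.
    return (result.stdout + result.stderr).strip()


def config_is_clean(output):
    """A clean configuration produces no output at all."""
    return output == ""
```

Calling config_is_clean(lint_output()) returns True only when the lint run stays silent; otherwise the captured text shows what spamassassin complained about.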

having trouble with SA

Posted by Jeff Portwine <jd...@veritime.com>.

I am running exim 3.35 on Debian.  We were using spamassassin 3.0, but we
have been having a lot of trouble with spam getting through.  Some gets
caught but a lot doesn't, and over time it gets worse and worse.  The
person who originally set up our mailserver and spamassassin left the
company a while ago and it's been nothing but trouble since then.  Part of
the reason, from what I've been able to gather, is that the bayes database
keeps breaking itself.  I cleared the database before and retrained it
with a bunch of spam and ham, and it seemed somewhat improved for a while.
However, the way our system was set up was that whenever anybody got spam
they would forward it to a spam email address and sa-learn would
automatically learn from the mail sent there.  I learned that this is an
ineffective way to handle this because the headers all get rewritten when
users forward their spam, and because over time the database gets very
little ham and tons of spam, so it becomes less and less effective.

The spam levels are getting high again, users are complaining, and so today
I did an apt-get spamassassin to upgrade to version 3.1.0.  I then used
the configuration tool at http://www.yrex.com/spam/spamconfig.php to create
a new local.cf and replaced the old one, which was outdated even for our
previous version.  Now, however, when I try to start the spamassassin
daemon I get the message:   SpamAssassin Mail Filter Daemon: disabled, see
/etc/default/spamassassin   and I'm really not sure what's wrong there.

As you can tell I'm a complete SA newbie, and my exim experience is somewhat
limited as well, so I'm pretty much starting at the bottom of the learning
curve.  I haven't been able to find any complete or concise information
about SA on the net; even the SA web page has a lot of scattered and
outdated information, so I'm not sure where to go from here to get this
working.  Any advice would be very much appreciated.

Thanks!
-Jeff

RE: more questions on training spamassassin

Posted by Webmaster <we...@hostation.com>.

 

> -----Original Message-----
> From: Matt Kettler [mailto:mkettler@evi-inc.com] 
> Sent: April 25, 2006 11:30 AM
> To: webmaster@hostation.com
> Cc: users@spamassassin.apache.org
> Subject: Re: more questions on training spamassassin
> 
> Webmaster wrote:
> > In my setup, the server running spamassassin is different than the
> > server delivering the final e-mail.  This means a few extra headers
> > will be added by the time the clients see the e-mail.  If I were to
> > take this e-mail and train spamassassin, it is no longer in the form
> > that spamassassin sees originally (in terms of headers).
> >
> > So my question is, is it even worthwhile to train spamassassin
> > manually in this scenario ?
>
> Depends on how much the "few extra headers" is.. First, any extra
> Received: headers are negligible.. You don't need to be that pristine.
>
> A few extra status headers won't hurt much, and you can use
> bayes_ignore_header to have SA ignore them.
>
> The big issues to look out for is mass-removal of headers and body
> formatting, like what happens when you forward a message. Also beware
> of server/client back-ends that rip out mime sections it feels are
> unimportant. (It's quite common for spammers to "hide" things in the
> text/plain of a multipart/alternative message).
> 
> 

ok, that's what I thought.
Can't help it. I am a pristine kind of guy :-)

Thanks.

Re: more questions on training spamassassin

Posted by Matt Kettler <mk...@evi-inc.com>.

Webmaster wrote:
> In my setup, the server running spamassassin is different than the server
> delivering the final e-mail.  This means a few extra headers will be added
> by the time the clients see the e-mail.  If I were to take this e-mail and
> train spamassassin, it is no longer in the form that spamassassin sees
> originally (in terms of headers). 
> 
> So my question is, is it even worthwhile to train spamassassin manually in
> this scenario ?

Depends on what the "few extra headers" are. First, any extra Received:
headers are negligible. You don't need to be that pristine.

A few extra status headers won't hurt much, and you can use bayes_ignore_header
to have SA ignore them.

The big issues to look out for are mass removal of headers and changes to body
formatting, like what happens when you forward a message. Also beware of
server/client back-ends that rip out MIME sections they feel are unimportant.
(It's quite common for spammers to "hide" things in the text/plain part of a
multipart/alternative message.)
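
To see which headers the delivery server actually adds, one could compare a copy of a message as SpamAssassin scanned it with the copy the client receives. The following is a minimal Python sketch of that comparison, assuming both copies are available as raw bytes; the function names are hypothetical.

```python
import email
from collections import Counter


def header_name_counts(raw_bytes):
    """Count how often each header name occurs (case-insensitively)."""
    message = email.message_from_bytes(raw_bytes)
    return Counter(name.lower() for name in message.keys())


def added_header_names(scanned_bytes, delivered_bytes):
    """Return header names that occur more often in the delivered copy."""
    extra = header_name_counts(delivered_bytes) - header_name_counts(scanned_bytes)
    return sorted(extra)
```

Extra Received: entries in the result can be left alone, as noted above. Any status headers that show up are candidates for a bayes_ignore_header line in local.cf.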

RE: more questions on training spamassassin

Posted by Webmaster <we...@hostation.com>.

 

> -----Original Message-----
> From: Theo Van Dinter [mailto:felicity@kluge.net] 
> Sent: April 25, 2006 10:36 AM
> To: Spamassassin Users List
> Subject: Re: more questions on training spamassassin
> 
> On Tue, Apr 25, 2006 at 10:30:47AM -0700, Webmaster wrote:
> > In my setup, the server running spamassassin is different than the 
> > server delivering the final e-mail.  This means a few extra headers 
> > will be added by the time the clients see the e-mail.
> > So my question is, is it even worthwhile to train spamassassin 
> > manually in this scenario ?
> 
> If you're just talking about adding in some Received headers 
> and the usual stuff associated with delivery, that's fine.  
> If something massive happens between receiving the mail and 
> delivering it, that's a different issue.
> 

Yep, just a few received headers.
Thanks.

Re: more questions on training spamassassin

Posted by Theo Van Dinter <fe...@kluge.net>.

On Tue, Apr 25, 2006 at 10:30:47AM -0700, Webmaster wrote:
> In my setup, the server running spamassassin is different than the server
> delivering the final e-mail.  This means a few extra headers will be added
> by the time the clients see the e-mail.
> So my question is, is it even worthwhile to train spamassassin manually in
> this scenario ?

If you're just talking about adding in some Received headers and the usual
stuff associated with delivery, that's fine.  If something massive happens
between receiving the mail and delivering it, that's a different issue.

-- 
Randomly Generated Tagline:
"And, although some really nasty mind-games were played, no entities were 
 physically harmed during the making of this interactive entertainment 
 (except for the botched special-effect on the bunny rabbit that went so 
 horribly wrong and really bummed everyone out, no thanks to Mr. Boomer)."
                      - From the 7th Guest