Posted to users@spamassassin.apache.org by Steve Dondley <s...@dondley.com> on 2004/12/12 16:59:05 UTC

Should tagged spam be fed back to server?

When training spamassassin, is it a good idea to feed spam already 
marked as spam back to SpamAssassin?  Will this help it or hinder it or 
do neither?

Re: Should tagged spam be fed back to server?

Posted by Robert Menschel <Ro...@Menschel.net>.
Hello Steve,

Sunday, December 12, 2004, 7:59:05 AM, you wrote:

SD> When training spamassassin, is it a good idea to feed spam already
SD> marked as spam back to SpamAssassin?  Will this help it or hinder it or
SD> do neither?

The simple answer:  If you can manually verify and sa-learn the emails
which pass through your system, do so. Doesn't matter whether they've
been already flagged -- feed them to Bayes.

The most important criterion is to feed Bayes correctly. That's why
manual verification is important. The fastest way to corrupt Bayes is
to feed it non-spam and call it spam, or vice versa.
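A minimal sketch of that workflow in Python, assuming you sort mail by eye into one folder of verified spam and one of verified ham (the folder paths below are hypothetical). It only uses sa-learn's `--spam`, `--ham` and `--mbox` options, and defaults to a dry run that prints the command instead of executing it:

```python
import subprocess
from typing import List


def sa_learn_command(verified_as: str, path: str, mbox: bool = False) -> List[str]:
    """Build an sa-learn command line for mail you have already checked by hand."""
    if verified_as not in ("spam", "ham"):
        raise ValueError("verified_as must be 'spam' or 'ham'")
    cmd = ["sa-learn", "--" + verified_as]  # --spam or --ham
    if mbox:
        cmd.append("--mbox")  # path is a single mbox file, not a directory of messages
    cmd.append(path)
    return cmd


def train(verified_as: str, path: str, mbox: bool = False, dry_run: bool = True) -> int:
    """Print the command (dry run) or actually run sa-learn and return its exit code."""
    cmd = sa_learn_command(verified_as, path, mbox)
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Hypothetical folder names: only mail you have verified by eye goes in each one.
    train("spam", "/home/user/Mail/verified-spam")
    train("ham", "/home/user/Mail/verified-ham")
```

The script can't verify anything for you: putting a message in the wrong folder is exactly the mislabelling that corrupts Bayes, so the manual check stays the important step.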

If email has already been correctly flagged, but not auto-learned,
then teaching it to Bayes will help Bayes correctly flag the next
email that comes in with those tokens.

If email has already been correctly flagged /and/ auto-learned, then
the only harm done is a few computer cycles, since sa-learn will
ignore emails it has already learned with the same classification.
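Why re-feeding costs only cycles: as far as I understand it, sa-learn keeps a record of which messages it has already learned and under which label, skips a repeat with the same label, and undoes the earlier learning if the label changes. The class below is a toy Python model of that bookkeeping, not SpamAssassin's code; the message ids and token lists are hypothetical stand-ins for what the real Bayes database stores:

```python
from collections import Counter
from typing import Dict, Iterable


class ToyBayesLearner:
    """Toy model of sa-learn's bookkeeping; NOT SpamAssassin's actual implementation."""

    def __init__(self) -> None:
        self.seen: Dict[str, str] = {}  # message id -> "spam" or "ham"
        self.token_counts: Dict[str, Counter] = {"spam": Counter(), "ham": Counter()}

    def learn(self, msg_id: str, tokens: Iterable[str], label: str) -> str:
        tokens = list(tokens)
        previous = self.seen.get(msg_id)
        if previous == label:
            return "skipped"  # same classification as before: only a few cycles wasted
        if previous is not None:
            # Learned under the other label earlier: undo that before relearning.
            self.token_counts[previous].subtract(tokens)
        self.token_counts[label].update(tokens)
        self.seen[msg_id] = label
        return "relearned" if previous is not None else "learned"


if __name__ == "__main__":
    learner = ToyBayesLearner()
    print(learner.learn("<abc@example.com>", ["cheap", "pills"], "spam"))  # learned
    print(learner.learn("<abc@example.com>", ["cheap", "pills"], "spam"))  # skipped
```

The model also shows the corruption risk: a ham message learned as spam pushes its tokens onto the spam side until someone relearns it correctly.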

Bob Menschel

Re: Should tagged spam be fed back to server?

Posted by Clarke Brunt <cl...@yeomannavigation.co.uk>.
> When training spamassassin, is it a good idea to feed spam already
> marked as spam back to SpamAssassin?  Will this help it or hinder it or
> do neither?

As ever with SpamAssassin, there are many options, and really only you can
decide how best to use them, after studying the documentation.

If you mean training with the 'sa-learn' command, then it depends on whether
the message has already been learned automatically. Check for autolearn= in
the message headers (if it isn't there, configure things so that it
is).
If the value is already 'spam', then it's been learned already (and will be
ignored if learned again). If the value is 'no' (or worse, 'ham'), then you
could usefully sa-learn it.
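That header check can be sketched in Python. This assumes the autolearn= value appears in the X-Spam-Status header, as in the default SpamAssassin 3.x header layout; if your configuration puts it in a different header, change the name. The function answers Clarke's question for a message you have already verified as spam:

```python
import email
import re
from typing import Optional

AUTOLEARN_RE = re.compile(r"\bautolearn=(\w+)")


def autolearn_value(raw_message: str) -> Optional[str]:
    """Return the autolearn= value from X-Spam-Status, or None if it isn't there."""
    msg = email.message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return None
    match = AUTOLEARN_RE.search(str(status))  # works on folded headers too
    return match.group(1) if match else None


def worth_sa_learning_as_spam(raw_message: str) -> bool:
    """For a message verified as spam: would a manual sa-learn --spam add anything?"""
    value = autolearn_value(raw_message)
    # 'spam' means it was already learned as spam, so sa-learn would just skip it.
    # 'no', 'ham', any other value, or a missing header: learning it may help
    # (and if it was learned already, sa-learn will skip it anyway).
    return value != "spam"
```

Messages where `autolearn_value` returns None are the ones whose setup Clarke suggests changing so the header gets added.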

On the other hand, there's reporting with 'spamassassin --report...' (which
learns the message just as sa-learn does, but also reports the spam to
external services such as Razor, SpamCop, etc.). My view is that if the
message was so spammy as to get a high score, then there's probably little
point (I reject these at SMTP time, so I don't get the chance to report them).
But if it's only slightly spammy, or isn't detected at all, then reporting it
seems a good idea.
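A small Python sketch of that reporting policy, assuming the score= and required= values are read from an X-Spam-Status header. The HIGH_SCORE cut-off is a hypothetical number; the post deliberately leaves "high score" to your own judgment:

```python
import re
from typing import Optional, Tuple

SCORE_RE = re.compile(r"\bscore=(-?\d+(?:\.\d+)?)\s+required=(-?\d+(?:\.\d+)?)")
HIGH_SCORE = 15.0  # hypothetical cut-off for "so spammy there's little point"


def parse_scores(status: str) -> Optional[Tuple[float, float]]:
    """Extract (score, required) from an X-Spam-Status value, if present."""
    m = SCORE_RE.search(status)
    if not m:
        return None
    return float(m.group(1)), float(m.group(2))


def classify(status: str, high_score: float = HIGH_SCORE) -> str:
    """Bucket a message you have verified as spam by how SpamAssassin scored it."""
    scores = parse_scores(status)
    if scores is None:
        return "unknown"     # no usable score in the header
    score, required = scores
    if score < required:
        return "missed"      # not tagged at all
    if score < high_score:
        return "borderline"  # tagged, but only slightly spammy
    return "high"            # scored so high that reporting adds little


def should_report(status: str, high_score: float = HIGH_SCORE) -> bool:
    return classify(status, high_score) != "high"
```

Messages for which `should_report` returns True would then be passed to `spamassassin --report` by hand.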

And if you're worried about the message already containing SpamAssassin
markup, SpamAssassin is supposed to strip it automatically.

Clarke Brunt