Posted to users@spamassassin.apache.org by Robert Menschel <Ro...@Menschel.net> on 2005/04/09 02:26:40 UTC

Re[2]: WHich is better

Hello Peter,

Friday, April 8, 2005, 4:24:50 AM, you wrote:

PM> Hi Robert,

PM> Thank you very much for your detailed reply.  It was very helpful.  I
PM> just have one question.  Why can you not run sa-learn on spam already
PM> flagged as spam.  ...

You can.  I do.

My email system captures almost all emails that pass through it, and I
store those in "confirmed ham", "confirmed spam", "likely ham",
"likely spam", and "undetermined" buckets. **ALL** emails in the two
confirmed buckets are manually fed to sa-learn, regardless of whether
they were auto-learned.
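
A minimal sketch of how the two confirmed buckets could be fed to
sa-learn. The directory names and the helper functions are hypothetical
(my illustration, not part of SpamAssassin); only the sa-learn --spam and
--ham options are real. It assumes each bucket is a directory holding one
message per file, and that a human has already sorted the messages.

```python
import subprocess
from pathlib import Path

# Hypothetical bucket names mapped to the sa-learn option for each.
# Only the human-confirmed buckets are listed; "likely" and
# "undetermined" mail is deliberately never learned from.
CONFIRMED = {"confirmed-spam": "--spam", "confirmed-ham": "--ham"}


def build_commands(root):
    """Return one sa-learn command per confirmed bucket under root."""
    root = Path(root)
    return [
        ["sa-learn", flag, str(root / bucket)]
        for bucket, flag in CONFIRMED.items()
    ]


def learn_confirmed(root, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for cmd in build_commands(root):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises if sa-learn exits with a non-zero status.
            subprocess.run(cmd, check=True)
```

Calling learn_confirmed("/path/to/buckets") only prints the commands;
pass dry_run=False to actually run sa-learn.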

PM> I thought spamassassin would rip out any headers it
PM> already added.  If that is the case then what is the harm in re learning
PM>   the spam as spam ...

You are right about that.  There's no harm, and indeed, usually no
re-learning (emails already known as spam will not be re-learned as
spam -- they'll be ignored rather than processed again).

My earlier caution wasn't about re-learning, though. It was to make
sure that everything that goes into sa-learn has been manually
determined to be either spam or not-spam by a human. Do not
automatically sa-learn anything -- have a human make that determination.

If you automatically sa-learn emails by any means other than the
conservative auto-learn built into SA, you very likely /will/ garbage
up your Bayes database, causing it to mis-classify emails.

Bob Menschel