Posted to users@spamassassin.apache.org by Robert Menschel <Ro...@Menschel.net> on 2005/04/09 02:26:40 UTC
Re[2]: WHich is better
Hello Peter,
Friday, April 8, 2005, 4:24:50 AM, you wrote:
PM> Hi Robert,
PM> Thank you very much for your detailed reply. It was very helpful. I
PM> just have one question. Why can you not run sa-learn on spam already
PM> flagged as spam? ...
You can. I do.
My email system captures almost all emails that pass through it, and I
store those in "confirmed ham", "confirmed spam", "likely ham",
"likely spam", and "undetermined" buckets. **ALL** emails in the two
confirmed buckets are manually fed to sa-learn, regardless of whether
they were auto-learned.
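
As a rough sketch of that routine (not my actual setup): the Python below
builds the sa-learn command lines for the two confirmed buckets only and skips
everything else. The bucket directory names, the archive root, and the
function name are hypothetical, and it assumes each bucket is a directory
holding one message per file. The --spam and --ham options are the standard
sa-learn switches.

```python
# Hypothetical bucket directory names; only human-confirmed mail is mapped
# to an sa-learn switch.
CONFIRMED = {"confirmed-spam": "--spam", "confirmed-ham": "--ham"}


def sa_learn_commands(root, buckets):
    """Return one sa-learn argv per human-confirmed bucket, skipping the rest."""
    commands = []
    for name in buckets:
        flag = CONFIRMED.get(name)
        if flag is None:
            # "likely" and "undetermined" mail has not been checked by a
            # human, so it is never handed to sa-learn.
            continue
        commands.append(["sa-learn", flag, f"{root}/{name}"])
    return commands


if __name__ == "__main__":
    all_buckets = ["confirmed-ham", "confirmed-spam",
                   "likely-ham", "likely-spam", "undetermined"]
    # /var/mail-archive is a placeholder path, not a real default.
    for argv in sa_learn_commands("/var/mail-archive", all_buckets):
        print(" ".join(argv))
```

The script only prints the commands, so you can review them before running
anything against a live Bayes database.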
PM> I thought spamassassin would rip out any headers it
PM> already added. If that is the case then what is the harm in re-learning
PM> the spam as spam ...
You are right about that. There's no harm, and indeed, usually no
re-learning (emails already known as spam will not be re-learned as
spam -- they'll be ignored rather than processed again).
Your question wasn't about re-learning. My caution was to make sure
that everything that went into sa-learn was manually determined to be
either spam or not-spam by some human. Do not automatically sa-learn
anything -- have a human make that determination.
If you automatically feed emails to sa-learn, beyond the conservative
auto-learning that SA does on its own, you very likely /will/ garbage up
your Bayes database, causing it to misclassify emails.
Bob Menschel