Posted to users@spamassassin.apache.org by ma...@animalhead.com on 2008/01/16 21:23:37 UTC
Training Q
Hi SA experts,
We have procmail filters that see emails before SA. They can:
1. whitelist emails directly to our Inbox,
2. send emails directly to the bit bucket (/dev/null),
3. send emails to the Junk folder for review, or
4. leave them for processing by SA.
So SA never sees the emails in categories 1-3.
SA can also send emails to /dev/null or send emails to the Junk
folder for review.
I've been saving up emails with which to train SA's Bayesian filters
for some time now, in 3 categories:
a. spam that was sent to the Junk folder by custom filters and SA,
b. spam that got through, and
c. ham for the same date range (last 6 months).
So, all 3 categories include emails that SA has already seen and
presumably included in its Bayesian filters, and emails that it has
never seen.
My question is, should I write a program to take out emails that SA
has already seen before I send them through Bayesian processing, or
is it smart enough not to process those again?
Best Regards,
Craig MacKenna
Re: Training Q
Posted by Loren Wilton <lw...@earthlink.net>.
> Some people advise not to relearn "old spam". What would you suggest:
> learn only the last 6 months, for example?
I'd suggest only the last 3 months or less of spam if you have enough. Old
ham should be fine though.
Loren
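As a rough illustration of the age cutoff suggested above, here is a minimal sketch that decides whether a saved message is recent enough to train on, judged by its Date header. The function name is_recent and the 90-day default are assumptions made for this example; they are not part of SpamAssassin.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_bytes
from email.utils import parsedate_to_datetime


def is_recent(raw_message: bytes, now: datetime, max_age_days: int = 90) -> bool:
    """Return True if the message's Date header is within max_age_days of now.

    Hypothetical helper: messages with a missing or unparseable Date
    header are treated as not recent.
    """
    msg = message_from_bytes(raw_message)
    date_header = msg.get("Date")
    if date_header is None:
        return False
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return False
    if sent.tzinfo is None:
        # A "-0000" zone yields a naive datetime; assume UTC for comparison.
        sent = sent.replace(tzinfo=timezone.utc)
    return now - sent <= timedelta(days=max_age_days)
```

Here now must be a timezone-aware datetime, for example datetime.now(timezone.utc). A filter like this would apply to the spam folders only, since old ham is described above as fine to keep.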
Re: Training Q
Posted by Matthias Haegele <mh...@linuxrocks.dyndns.org>.
Matthias Haegele wrote:
> Some people advise not to relearn "old spam". What would you suggest:
> learn only the last 6 months, for example?
I meant: if you had to relearn "from scratch", how far back would you go?
--
Gruesse/Greetings
MH
Dont send mail to: ubecatcher@linuxrocks.dyndns.org
--
Re: Training Q
Posted by Matthias Haegele <mh...@linuxrocks.dyndns.org>.
John D. Hardin wrote:
> On Wed, 16 Jan 2008 mackenna@animalhead.com wrote:
>
>> So, all 3 categories include emails that SA has already seen and
>> presumably included in its Bayesian filters,
>
> Only if you have autolearn enabled. Can we assume that you do from
> this question? You didn't explicitly say.
>
>> and emails that it has never seen.
>>
>> My question is, should I write a program to take out emails that
>> SA has already seen before I send them through Bayesian
>> processing, or is it smart enough not to process those again?
>
> sa-learn won't re-learn messages it has already seen unless you change
> their classification (e.g. was ham, re-learn as spam). Don't worry
> about it.
>
> In addition, keeping a full corpus around helps re-learning from
> scratch should you ever need to do so.
Some people advise not to relearn "old spam". What would you suggest:
learn only the last 6 months, for example?
--
Gruesse/Greetings
MH
Dont send mail to: ubecatcher@linuxrocks.dyndns.org
--
Re: Training Q
Posted by "John D. Hardin" <jh...@impsec.org>.
On Wed, 16 Jan 2008 mackenna@animalhead.com wrote:
> So, all 3 categories include emails that SA has already seen and
> presumably included in its Bayesian filters,
Only if you have autolearn enabled. Can we assume that you do from
this question? You didn't explicitly say.
> and emails that it has never seen.
>
> My question is, should I write a program to take out emails that
> SA has already seen before I send them through Bayesian
> processing, or is it smart enough not to process those again?
sa-learn won't re-learn messages it has already seen unless you change
their classification (e.g. was ham, re-learn as spam). Don't worry
about it.
In addition, keeping a full corpus around helps re-learning from
scratch should you ever need to do so.
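To make the "won't re-learn unless you change the classification" behavior concrete, here is a toy model of it. This is not SpamAssassin's code: the SeenTracker class, its SHA-1 key over the raw message, and the returned status strings are all assumptions made for illustration.

```python
import hashlib


class SeenTracker:
    """Toy model: learn each message once, re-learn only on reclassification."""

    def __init__(self) -> None:
        # Maps a message key to the label it was last learned as.
        self.seen: dict[str, str] = {}

    @staticmethod
    def key(raw_message: bytes) -> str:
        # Hypothetical key: a digest of the whole raw message.
        return hashlib.sha1(raw_message).hexdigest()

    def learn(self, raw_message: bytes, label: str) -> str:
        if label not in ("spam", "ham"):
            raise ValueError("label must be 'spam' or 'ham'")
        k = self.key(raw_message)
        previous = self.seen.get(k)
        if previous == label:
            # Already learned with this classification: nothing to do.
            return "skipped"
        self.seen[k] = label
        return "learned" if previous is None else "relearned"
```

Feeding the same message twice with the same label returns "skipped" the second time, while switching the label returns "relearned", which is why a mixed corpus of seen and unseen mail can be passed through training as-is.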
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Phobias should not be the basis for laws.
-----------------------------------------------------------------------
Tomorrow: Benjamin Franklin's 302nd Birthday