Posted to users@spamassassin.apache.org by ma...@animalhead.com on 2008/01/16 21:23:37 UTC

Training Q

Hi SA experts,

We have procmail filters that see emails before SA.  They can:

1. whitelist emails direct to our Inbox,
2. send emails directly to the bit bucket (/dev/null),
3. send emails to the Junk folder for review, or
4. leave them for processing by SA.

So SA never sees the emails in categories 1-3.

SA can also send emails to /dev/null or send emails to the Junk  
folder for review.

I've been saving up emails with which to train SA's Bayesian filters  
for some time now, in 3 categories:

a. spam that was sent to the Junk folder by custom filters and SA,
b. spam that got through, and
c. ham for the same date range (last 6 months).

So, all 3 categories include emails that SA has already seen and  
presumably included in its Bayesian filters, and emails that it has  
never seen.

My question is, should I write a program to take out emails that SA  
has already seen before I send them through Bayesian processing, or  
is it smart enough not to process those again?

Best Regards,
Craig MacKenna


Re: Training Q

Posted by Loren Wilton <lw...@earthlink.net>.
> Some people advise not to relearn "old spam". What would you suggest?
> Learn only the last 6 months, for example?

I'd suggest only the last 3 months or less of spam if you have enough.  Old 
ham should be fine though.
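
To act on the suggestion of keeping spam to the last 3 months while leaving ham alone, the spam corpus can be filtered by each message's Date header before training. A minimal sketch, assuming (hypothetically) one message per file in a directory; the names `is_recent` and `recent_files` and the 90-day cutoff are illustrative, and messages with a missing or unparseable Date header are simply skipped. Spam Date headers can be forged, so file modification times may be a more trustworthy signal in practice.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_bytes
from email.utils import parsedate_to_datetime
from pathlib import Path


def is_recent(raw: bytes, max_age_days: int, now: datetime) -> bool:
    """True if the message's Date header falls within max_age_days before now."""
    date_header = message_from_bytes(raw).get("Date")
    if date_header is None:
        return False  # assumption: undated messages are left out of training
    try:
        sent = parsedate_to_datetime(str(date_header))
    except (TypeError, ValueError):
        return False  # unparseable Date header
    if sent.tzinfo is None:
        sent = sent.replace(tzinfo=timezone.utc)  # assumption: naive dates are UTC
    return now - sent <= timedelta(days=max_age_days)


def recent_files(directory: Path, max_age_days: int = 90) -> list[Path]:
    """Hypothetical helper: message files in directory whose Date is recent enough."""
    now = datetime.now(timezone.utc)
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and is_recent(p.read_bytes(), max_age_days, now)
    )
```

Only the spam directories would go through `recent_files`; the ham directory can be trained in full.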

        Loren



Re: Training Q

Posted by Matthias Haegele <mh...@linuxrocks.dyndns.org>.
Matthias Haegele wrote:
> Some people advise not to relearn "old spam". What would you suggest?
> Learn only the last 6 months, for example?

I meant: if you had to relearn "from scratch", how far back would you go?


-- 
Gruesse/Greetings
MH


Don't send mail to: ubecatcher@linuxrocks.dyndns.org
--


Re: Training Q

Posted by Matthias Haegele <mh...@linuxrocks.dyndns.org>.
John D. Hardin wrote:
> On Wed, 16 Jan 2008 mackenna@animalhead.com wrote:
> 
>> So, all 3 categories include emails that SA has already seen and
>> presumably included in its Bayesian filters,
> 
> Only if you have autolearn enabled. Can we assume that you do from 
> this question? You didn't explicitly say.
> 
>> and emails that it has never seen.
>>
>> My question is, should I write a program to take out emails that
>> SA has already seen before I send them through Bayesian
>> processing, or is it smart enough not to process those again?
> 
> sa-learn won't re-learn messages it has already seen unless you change
> their classification (e.g. was ham, re-learn as spam). Don't worry
> about it.
> 
> In addition, keeping a full corpus around helps re-learning from
> scratch should you ever need to do so.

Some people advise not to relearn "old spam". What would you suggest?
Learn only the last 6 months, for example?

-- 
Gruesse/Greetings
MH


Don't send mail to: ubecatcher@linuxrocks.dyndns.org
--


Re: Training Q

Posted by "John D. Hardin" <jh...@impsec.org>.
On Wed, 16 Jan 2008 mackenna@animalhead.com wrote:

> So, all 3 categories include emails that SA has already seen and
> presumably included in its Bayesian filters,

Only if you have autolearn enabled. Can we assume that you do from 
this question? You didn't explicitly say.

> and emails that it has never seen.
> 
> My question is, should I write a program to take out emails that
> SA has already seen before I send them through Bayesian
> processing, or is it smart enough not to process those again?

sa-learn won't re-learn messages it has already seen unless you change
their classification (e.g. was ham, re-learn as spam). Don't worry
about it.
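
The behaviour described here can be pictured with a toy model: sa-learn remembers which messages it has learned and as what, skips a message already learned with the same class, and re-learns (forgetting the old class first) only when the classification changes. This is a sketch of that bookkeeping, not SpamAssassin's code; the class and method names are hypothetical, and the real tool derives its own message ID and keeps the record in its Bayes database rather than in a Python dict.

```python
from enum import Enum


class Label(Enum):
    HAM = "ham"
    SPAM = "spam"


class SeenTracker:
    """Toy model of sa-learn's seen-message bookkeeping (hypothetical names)."""

    def __init__(self) -> None:
        # message ID -> class it was last learned as
        self.seen: dict[str, Label] = {}

    def learn(self, msg_id: str, label: Label) -> str:
        previous = self.seen.get(msg_id)
        if previous == label:
            return "skipped"  # already learned with this class: no token changes
        # A changed class means the old contribution is forgotten, then relearned.
        action = "relearned" if previous is not None else "learned"
        self.seen[msg_id] = label
        return action
```

In this model, feeding the whole saved corpus through again costs only a lookup for each message already learned with the same class, which is why weeding them out beforehand is unnecessary.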

In addition, keeping a full corpus around helps re-learning from
scratch should you ever need to do so.
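
With the full corpus kept, a from-scratch rebuild becomes a short, repeatable sequence. The sketch below only builds the argument lists, mapping Craig's categories a and b to spam and c to ham. `sa-learn --clear`, `--ham` and `--spam` are real sa-learn options; the function name `rebuild_commands` and the directory names are hypothetical.

```python
def rebuild_commands(spam_dirs: list[str], ham_dirs: list[str]) -> list[list[str]]:
    """Build (but do not run) sa-learn invocations for a from-scratch retrain."""
    commands = [["sa-learn", "--clear"]]  # wipe the existing Bayes data first
    commands += [["sa-learn", "--ham", d] for d in ham_dirs]  # category c
    commands += [["sa-learn", "--spam", d] for d in spam_dirs]  # categories a and b
    return commands


if __name__ == "__main__":
    # Hypothetical corpus layout; adjust to wherever the saved mail lives.
    for cmd in rebuild_commands(["corpus/spam-junk", "corpus/spam-missed"], ["corpus/ham"]):
        print(" ".join(cmd))
```

Each list could then be run with `subprocess.run(cmd, check=True)`. Building the lists separately makes it easy to print and review the sequence before anything is wiped.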

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Phobias should not be the basis for laws.
-----------------------------------------------------------------------
 Tomorrow: Benjamin Franklin's 302nd Birthday