Posted to users@spamassassin.apache.org by Steve Rainwater <sr...@ncc.com> on 2015/05/20 18:29:26 UTC

ham source for site-wide bayes?

I've set up spamassassin with a site-wide bayes configuration. I have
some spamtrap email addresses that supply fresh spam into bayes for
training on a cron job. However, from what I've read, bayes needs to
have ongoing ham as well as spam for training in order to work well.
What's the usual method of supplying the ham? Does that have to be done
manually (how often?) or has anyone come up with a way to automatically
supply ham?

I have the spamtrap email boxes that receive spam-only but all the real
email addresses on the server receive a mix of ham and spam, which is
why I need spamassassin in the first place :)  I can't find anything in
spamassassin docs so far that explains a non-manual way of supplying
ham. Have I missed something? Is there some sort of service where I can
subscribe to an updated ham corpus automatically like with the clamav
database? 

-Steve



Re: ham source for site-wide bayes?

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 5/20/2015 12:29 PM, Steve Rainwater wrote:
> I've set up spamassassin with a site-wide bayes configuration. I have
> some spamtrap email addresses that supply fresh spam into bayes for
> training on a cron job. However, from what I've read, bayes needs to
> have ongoing ham as well as spam for training in order to work well.
> What's the usual method of supplying the ham? Does that have to be done
> manually (how often?) or has anyone come up with a way to automatically
> supply ham?
>
> I have the spamtrap email boxes that receive spam-only but all the real
> email addresses on the server receive a mix of ham and spam, which is
> why I need spamassassin in the first place :)  I can't find anything in
> spamassassin docs so far that explains a non-manual way of supplying
> ham. Have I missed something? Is there some sort of service where I can
> subscribe to an updated ham corpus automatically like with the clamav
> database?
One way people often supply ham is to use sent items from your legit users.

Regards,
KAM
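
A minimal sketch of the sent-items approach, assuming a hypothetical Maildir layout of <mail_root>/<user>/Maildir/.Sent/cur (the path and folder name are illustrative, not from the thread). It only builds the sa-learn command lines, using the real --ham, --no-sync and --sync options, so it can be inspected before anything is run:

```python
from pathlib import Path
from typing import List


def build_ham_commands(mail_root: Path, sent_folder: str = ".Sent") -> List[List[str]]:
    """Return one sa-learn command per user Sent maildir found under mail_root.

    Assumed (hypothetical) layout: <mail_root>/<user>/Maildir/<sent_folder>/cur
    """
    commands: List[List[str]] = []
    for user_dir in sorted(p for p in mail_root.iterdir() if p.is_dir()):
        sent_cur = user_dir / "Maildir" / sent_folder / "cur"
        if sent_cur.is_dir():
            # --ham trains the messages as non-spam; --no-sync defers the
            # journal sync so it happens only once, at the end.
            commands.append(["sa-learn", "--ham", "--no-sync", str(sent_cur)])
    if commands:
        # A single sync after all users have been processed.
        commands.append(["sa-learn", "--sync"])
    return commands
```

From a cron job, each returned list could be passed to subprocess.run(cmd, check=True), run as whichever account owns the site-wide bayes database. Sent mail is only a reasonable ham source if none of the accounts is compromised and sending spam.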

Re: ham source for site-wide bayes?

Posted by Axb <ax...@gmail.com>.
On 20.05.2015 18:29, Steve Rainwater wrote:
> I've set up spamassassin with a site-wide bayes configuration. I have
> some spamtrap email addresses that supply fresh spam into bayes for
> training on a cron job. However, from what I've read, bayes needs to
> have ongoing ham as well as spam for training in order to work well.
> What's the usual method of supplying the ham? Does that have to be done
> manually (how often?)

it doesn't have to be done - you *can* do it manually.

> or has anyone come up with a way to automatically supply ham?

it's called auto_learn [works for me]

you'll find all the details in

https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.txt

"LEARNING OPTIONS"
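
As a rough illustration of what auto-learning does, here is a simplified Python model of the threshold decision. The two defaults match the documented bayes_auto_learn_threshold_nonspam (0.1) and bayes_auto_learn_threshold_spam (12.0) options; the function name and the exact handling of boundary values are assumptions for this sketch. It also leaves out the extra conditions the real plugin applies, such as excluding the Bayes rules' own points from the score and requiring a minimum number of header and body points before learning as spam:

```python
from typing import Optional

# Documented defaults for bayes_auto_learn_threshold_nonspam / _spam.
THRESHOLD_NONSPAM = 0.1
THRESHOLD_SPAM = 12.0


def autolearn_decision(score: float,
                       nonspam_threshold: float = THRESHOLD_NONSPAM,
                       spam_threshold: float = THRESHOLD_SPAM) -> Optional[str]:
    """Simplified model: return "ham", "spam", or None (not learned)."""
    if score < nonspam_threshold:
        # Very low scoring mail is fed to bayes as ham.
        return "ham"
    if score >= spam_threshold:
        # Very high scoring mail is fed to bayes as spam.
        return "spam"
    # Anything in between is too uncertain and is not learned.
    return None
```

The point of the sketch is that only mail at the extremes is learned, so ham is supplied automatically from real traffic without any manual sorting.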

>
> I have the spamtrap email boxes that receive spam-only but all the real
> email addresses on the server receive a mix of ham and spam, which is
> why I need spamassassin in the first place :)  I can't find anything in
> spamassassin docs so far that explains a non-manual way of supplying
> ham. Have I missed something? Is there some sort of service where I can
> subscribe to an updated ham corpus automatically like with the clamav
> database?

your ham is specific to your traffic - you cannot inherit somebody
else's ham and expect it to work nicely with your traffic.

You'll soon read a dozen ways to do it.

I'll add mine: I use autolearn AND feed bayes trap data to a 6GB Redis
DB [works for me]

Axb