Posted to users@spamassassin.apache.org by escp <ho...@googlemail.com> on 2010/09/01 11:59:01 UTC

Sitewide learning: need further information

Hi,

I use SpamAssassin with a sitewide setup.

I want to know whether I can learn ONLY spam, without learning ham. I don't
want to get bad results.

The problem is that I get a lot of mail every day and can't look at all of it,
so I only want to train SpamAssassin with the mails my users want to get rid
of.

Thanks

P.S. Sorry for my bad English :-)
-- 
View this message in context: http://old.nabble.com/sidewide-learning-need-further-informations-tp29591873p29591873.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Sitewide learning: need further information

Posted by Bowie Bailey <Bo...@BUC.com>.
 On 9/1/2010 5:59 AM, escp wrote:
> Hi,
>
> I use SpamAssassin with a sitewide setup.
>
> I want to know whether I can learn ONLY spam, without learning ham. I don't
> want to get bad results.
>
> The problem is that I get a lot of mail every day and can't look at all of it,
> so I only want to train SpamAssassin with the mails my users want to get rid
> of.

The Bayes database (which is what you are teaching with sa-learn) works
with statistics.  It decides whether a message is spam by checking whether
the tokens (mostly words) in the message appear more frequently in ham
or in spam.  If you only teach it spam, then it will start thinking that
everything is spam.  You need to learn both ham and spam for the Bayes
engine to work right.  It doesn't have to be 50/50, but it does need a
decent amount of ham.
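
To illustrate the point about token frequencies, here is a minimal Python
sketch.  It is not SpamAssassin's actual Bayes implementation (which combines
token probabilities differently); the function names and the tiny sample
corpus are made up for illustration.  It only shows why a token seen
exclusively in spam training data ends up looking 100% spammy.

```python
from collections import Counter


def train(messages):
    """Count in how many messages each token appears (hypothetical helper)."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def spam_probability(token, spam_counts, n_spam, ham_counts, n_ham):
    """Rough estimate of P(spam | token) from per-class message frequencies."""
    spam_freq = spam_counts[token] / n_spam if n_spam else 0.0
    ham_freq = ham_counts[token] / n_ham if n_ham else 0.0
    if spam_freq + ham_freq == 0:
        return 0.5  # unseen token: no evidence either way
    return spam_freq / (spam_freq + ham_freq)


# Made-up sample corpus, for illustration only.
spam = ["cheap pills for the office", "win money at the meeting"]
ham = ["the meeting is at the office", "notes for the meeting"]

spam_counts = train(spam)
ham_counts = train(ham)

# Trained on spam only: any token that was seen at all looks fully spammy.
only_spam = spam_probability("meeting", spam_counts, len(spam), Counter(), 0)

# Trained on both: "meeting" is more common in ham, so it leans towards ham.
both = spam_probability("meeting", spam_counts, len(spam), ham_counts, len(ham))

print(f"spam-only training: {only_spam:.2f}")
print(f"spam and ham training: {both:.2f}")
```

With spam-only training, an ordinary word such as "meeting" is scored as a
pure spam indicator; once ham is learned as well, the same word leans towards
ham.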

-- 
Bowie

Re: Sitewide learning: need further information

Posted by escp <ho...@googlemail.com>.
Thanks for the information!

I made some adjustments based on your advice.
Thank you.

bb



Re: Sitewide learning: need further information

Posted by Neil Lazarow <nl...@naknan.com>.
escp wrote:
> Hi,
>
> I use SpamAssassin with a sitewide setup.
>
> I want to know whether I can learn ONLY spam, without learning ham. I don't
> want to get bad results.
>
> The problem is that I get a lot of mail every day and can't look at all of it,
> so I only want to train SpamAssassin with the mails my users want to get rid
> of.
>
> Thanks
>
> P.S. Sorry for my bad English :-)
>   
Training using only spam is not a good idea.  I would suggest using a
sitewide Bayes database, and using sa-update to download the rulesets from
SARE (saupdates.openprotect.com) in addition to the ones from
updates.spamassassin.org.  They are pretty good.

If you are using an SQL database server to run Bayesian filtering, be sure
to use the bayes_sql_override_username directive in your local.cf file.
Otherwise your token table will become very large, very fast.  I am running
on Solaris 10 with PostgreSQL as my database engine.  Here is the relevant
section of my local.cf file.


#   Use Bayesian classifier (default: 1)
use_bayes 1

# Bayes SQL interface parameters

bayes_store_module Mail::SpamAssassin::BayesStore::PgSQL
bayes_sql_dsn DBI:Pg:dbname=bayes_db;host=localhost;port=5432
bayes_sql_username nobody
bayes_sql_override_username nobody


#   Bayesian classifier auto-learning (default: 1)
#
bayes_auto_learn 1
bayes_auto_learn_threshold_spam 8
bayes_auto_learn_threshold_nonspam 0.1
bayes_min_ham_num 200
bayes_min_spam_num 200

#   Bayes database maintenance parameters
bayes_auto_expire 1
bayes_expiry_max_db_size 250000
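
The two auto-learn thresholds above can be read as a simple decision rule,
sketched here in Python.  This is a simplified illustration only: the function
name is hypothetical, the boundary handling (strictly below for ham, at or
above for spam) is an assumption, and the real auto-learn plugin applies
additional conditions that this sketch leaves out.

```python
def auto_learn_decision(score, threshold_spam=8.0, threshold_nonspam=0.1):
    """Return "spam", "ham", or None (do not learn) for a message score.

    The defaults mirror the local.cf values shown above; the exact
    comparison operators are an assumption made for this sketch.
    """
    if score >= threshold_spam:
        return "spam"   # confidently spam: feed to the Bayes spam side
    if score < threshold_nonspam:
        return "ham"    # confidently clean: feed to the Bayes ham side
    return None         # uncertain middle range: learn nothing


for score in (12.4, 3.0, -1.5):
    print(score, "->", auto_learn_decision(score))
```

Messages in the uncertain middle range are never learned, so only clear-cut
mail on either side feeds the database automatically.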

-- 

Neil Lazarow - IT Sales / Systems Engineering
Naknan, Inc.  IT Solutions
SDB, HMBC Member
Minority, Woman-Owned TX HUB
Cisco Authorized Reseller
NEC Authorized Reseller
Phone: 281-990-0030 ext 22
Fax: 281-990-0033
nlazarow@naknan.com