Posted to users@spamassassin.apache.org by escp <ho...@googlemail.com> on 2010/09/01 11:59:01 UTC
Sitewide learning: need further information
Hi,
I use SpamAssassin with a sitewide setup.
I want to know whether I can learn ONLY spam, without learning ham. I don't want to
get bad results.
The problem is that I get a lot of mail every day and can't look at all of it,
so I only want to train SpamAssassin with the mails my users want to get rid
of.
Thanks
P.S. Sorry for the bad English :-)
--
View this message in context: http://old.nabble.com/sidewide-learning-need-further-informations-tp29591873p29591873.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: Sitewide learning: need further information
Posted by Bowie Bailey <Bo...@BUC.com>.
On 9/1/2010 5:59 AM, escp wrote:
> Hi,
>
> I use SpamAssassin with a sitewide setup.
>
> I want to know whether I can learn ONLY spam, without learning ham. I don't want to
> get bad results.
>
> The problem is that I get a lot of mail every day and can't look at all of it,
> so I only want to train SpamAssassin with the mails my users want to get rid
> of.
The Bayes database (which is what you are teaching with sa-learn) works
with statistics. It decides whether a message is spam by checking whether
the tokens (mostly words) in the message appear more frequently in ham
or in spam. If you only teach it spam, it will start thinking that
everything is spam. You need to learn both ham and spam for the Bayes
engine to work right. It doesn't have to be 50/50, but it does need a
decent amount of ham.
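To make the point above concrete, here is a minimal Python sketch of the idea. It is a toy relative-frequency estimate, not SpamAssassin's actual Bayes formula, and the function names and sample messages are invented for illustration. It shows how a word that appears in both kinds of mail looks like pure spam when no ham has been learned.

```python
from collections import Counter


def train(messages):
    """Count in how many messages each token appears."""
    counts = Counter()
    for text in messages:
        # Use a set so a token is counted once per message.
        counts.update(set(text.lower().split()))
    return counts


def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Toy estimate of how 'spammy' a token is, from relative frequencies.

    Returns 0.5 (no evidence either way) for a token never seen in training.
    """
    spam_freq = spam_counts[token] / n_spam if n_spam else 0.0
    ham_freq = ham_counts[token] / n_ham if n_ham else 0.0
    if spam_freq + ham_freq == 0:
        return 0.5
    return spam_freq / (spam_freq + ham_freq)


# Hypothetical training data.
spam = ["cheap pills meeting offer", "cheap offer click now"]
ham = ["meeting agenda for monday", "lunch meeting moved"]

spam_counts = train(spam)
ham_counts = train(ham)

# Trained on spam only: every token ever seen in spam looks 100% spammy.
p_spam_only = token_spam_probability("meeting", spam_counts, Counter(), len(spam), 0)

# Trained on both: "meeting" is more common in ham, so it leans towards ham.
p_both = token_spam_probability("meeting", spam_counts, ham_counts, len(spam), len(ham))

print(p_spam_only, p_both)
```

With spam-only training the ordinary word "meeting" gets the maximum score, which is exactly the "everything is spam" effect. Once a few ham messages are learned, the same word drops below 0.5.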
--
Bowie
Re: Sitewide learning: need further information
Posted by escp <ho...@googlemail.com>.
Thanks for the information!
I made some adjustments based on your advice.
Thank you
bb
Re: Sitewide learning: need further information
Posted by Neil Lazarow <nl...@naknan.com>.
escp wrote:
> Hi,
>
> I use SpamAssassin with a sitewide setup.
>
> I want to know whether I can learn ONLY spam, without learning ham. I don't want to
> get bad results.
>
> The problem is that I get a lot of mail every day and can't look at all of it,
> so I only want to train SpamAssassin with the mails my users want to get rid
> of.
>
> Thanks
>
> P.S. Sorry for the bad English :-)
>
Training using only spam is not a good idea. I would suggest using a sitewide
Bayes database, and using sa-update to download the rulesets from SARE
(saupdates.openprotect.com) in addition to the ones from updates.spamassassin.org.
They are pretty good.
If you are using an SQL database server to run Bayesian filtering, be sure to use
the bayes_sql_override_username directive in your local.cf file. It makes all
users share one set of Bayes data. Otherwise tokens are stored separately for each
user, and your token table will become very large, very fast. I am running on
Solaris 10 with PostgreSQL as my database engine. Here is the relevant section of
my local.cf file.
# Use Bayesian classifier (default: 1)
use_bayes 1

# Bayes SQL interface parameters
bayes_store_module Mail::SpamAssassin::BayesStore::PgSQL
bayes_sql_dsn DBI:Pg:dbname=bayes_db;host=localhost;port=5432
bayes_sql_username nobody
# Store all users' tokens under this one name (sitewide database)
bayes_sql_override_username nobody

# Bayesian classifier auto-learning (default: 1)
bayes_auto_learn 1
# Auto-learn as spam above the spam score, as ham below the nonspam score
bayes_auto_learn_threshold_spam 8
bayes_auto_learn_threshold_nonspam 0.1
# Bayes is not used until this many ham and spam messages have been learned
bayes_min_ham_num 200
bayes_min_spam_num 200

# Bayes database maintenance parameters
bayes_auto_expire 1
bayes_expiry_max_db_size 250000
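As a rough illustration of what the auto-learn and minimum-count settings above mean, here is a small Python sketch. It is a simplified model, not SpamAssassin's code: the class and function names are hypothetical, the exact handling of scores that fall right on a threshold is an assumption, and the real auto-learner applies further conditions that are left out here.

```python
from dataclasses import dataclass


@dataclass
class BayesSettings:
    # Values copied from the local.cf section above.
    auto_learn_threshold_spam: float = 8.0
    auto_learn_threshold_nonspam: float = 0.1
    min_ham_num: int = 200
    min_spam_num: int = 200


def auto_learn_decision(score, settings):
    """Return "spam", "ham", or None (do not learn) for a message score.

    Simplified: only the two thresholds are checked. Boundary handling
    (>= versus >) is an assumption of this sketch.
    """
    if score >= settings.auto_learn_threshold_spam:
        return "spam"
    if score < settings.auto_learn_threshold_nonspam:
        return "ham"
    return None


def bayes_is_active(n_ham, n_spam, settings):
    """Bayes results are only used once enough ham AND spam have been learned."""
    return n_ham >= settings.min_ham_num and n_spam >= settings.min_spam_num


settings = BayesSettings()

for score in (9.5, 3.0, 0.0):
    print(score, auto_learn_decision(score, settings))

# Learning thousands of spam messages but no ham leaves Bayes switched off.
print(bayes_is_active(n_ham=0, n_spam=5000, settings=settings))
print(bayes_is_active(n_ham=250, n_spam=5000, settings=settings))
```

The last two lines tie back to the original question: with spam-only manual training the ham count never reaches bayes_min_ham_num, so the ham side has to come from somewhere, for example from auto-learning of very low-scoring mail.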
--
Neil Lazarow - IT Sales / Systems Engineering
Naknan, Inc. IT Solutions
SDB, HMBC Member
Minority, Woman-Owned TX HUB
Cisco Authorized Reseller
NEC Authorized Reseller
Phone: 281-990-0030 ext 22
Fax: 281-990-0033
nlazarow@naknan.com