Posted to users@spamassassin.apache.org by Adi <ad...@gmail.com> on 2014/07/24 09:32:35 UTC
Individual pre learning - Bayes in SQL
Hello
I have Bayes in SQL for each user (email address) on a test server.
SA is triggered by
/usr/local/bin/spamc -U /var/run/spamd/spamd.socket -u $local_part@$domain
I looked at the results in the database and have some doubts.
select * from bayes_vars;
 id | username | spam_count | ham_count | token_count
  1 | a@x.x    |          1 |         8 |        3937
 13 | t@x.x    |          0 |         1 |         356
 15 | i@x.x    |          0 |         1 |         360
Columns skipped:
last_expire | last_atime_delta | last_expire_reduce |
oldest_token_age | newest_token_age |
Account id 1 is the oldest, created a few days ago; I "trained" it myself.
Accounts 13 and 15 are new and have each received only one email.
Why do both accounts have a token_count of about 360,
and not 1? Are these tokens inherited?
sa-learn -ut@x.x --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 0 0 non-token data: nspam
0.000 0 1 0 non-token data: nham
0.000 0 356 0 non-token data: ntokens
0.000 0 1406154984 0 non-token data: oldest atime
0.000 0 1406154984 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count
For id 15:
sa-learn -ui@x.x --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 0 0 non-token data: nspam
0.000 0 1 0 non-token data: nham
0.000 0 360 0 non-token data: ntokens
0.000 0 1406159567 0 non-token data: oldest atime
0.000 0 1406159567 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count
I should probably run --sync.
Second question:
Does SA pay attention to a mail's To, Cc, etc. headers?
I want to do some pre-learning: collect dozens of "super" spam mails from
different accounts and have a script train every account in a loop:
sa-learn --spam --username=$account /spam/dir/*
Will mail addressed to another person be a problem in the learning
process?
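For reference, the loop described above could be sketched roughly as follows. This is only a sketch: the function name prelearn_spam, the accounts file (one address per line) and the SA_LEARN override are assumptions made for illustration, not part of the actual setup. It passes the spam directory itself to sa-learn instead of a shell glob.

```shell
#!/bin/sh
# Sketch only: train one shared spam corpus into each user's Bayes database.
# SA_LEARN is a hypothetical override so the loop can be dry-run (SA_LEARN=echo).
SA_LEARN=${SA_LEARN:-sa-learn}

# prelearn_spam ACCOUNTS_FILE SPAM_DIR
#   ACCOUNTS_FILE: one address per line; blank lines and lines starting with # are skipped
#   SPAM_DIR:      directory holding the collected spam messages
prelearn_spam() {
    accounts_file=$1
    spam_dir=$2
    while IFS= read -r account; do
        case $account in
            ''|'#'*) continue ;;
        esac
        # </dev/null keeps the command from consuming the accounts list on stdin
        "$SA_LEARN" --spam --username="$account" "$spam_dir" </dev/null
    done <"$accounts_file"
}

# Example (assumed paths):
#   prelearn_spam /etc/mail/bayes-accounts.txt /spam/dir
```

Running it once with SA_LEARN=echo prints the commands instead of executing them, which is a cheap way to check the account list before touching any Bayes database.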
Best Regards.
Re: Individual pre learning - Bayes in SQL
Posted by Adi <ad...@gmail.com>.
Hello
> OTOH if someone gets so little spam that they struggle to reach 200,
> does it matter?
I'm in the middle of moving the mail accounts from a
server that had a global Bayes database (with a lot of ham/spam tokens) to
individual userpref/Bayes.
Before Bayes reaches the 200-spam threshold it takes a lot more time, and the
user gets more than 200 "not super hard" spam mails (score below 12).
I want to speed up the process.
Of course the user can train Bayes through Roundcube (markasjunk2 plugin)
or my script (run by cron) after copying or moving mail to an IMAP folder
(LearnOK or LearnSPAM).
Best Regards.
Re: Individual pre learning - Bayes in SQL
Posted by RW <rw...@googlemail.com>.
On Fri, 25 Jul 2014 12:21:42 +0200
Adi wrote:
> I can change To/Cc in a loop to the address being trained in the "mega spam mails",
> or change To/Cc to example@example.com before running sa-learn.
Just delete those headers.
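One way to delete those headers before training is a small filter such as the one below. It is a sketch, not a tested production tool: strip_rcpt_headers is a made-up name, and it assumes one message per input with LF line endings. It drops To: and Cc: (plus their folded continuation lines) from the header block only and leaves the body alone.

```shell
#!/bin/sh
# Sketch only: remove To: and Cc: headers from a single mail message on stdin.
# Folded continuation lines (starting with space or tab) are dropped together
# with the header they belong to. Everything after the first blank line is
# treated as body and copied unchanged.
strip_rcpt_headers() {
    awk '
        in_body  { print; next }                    # body: copy verbatim
        /^$/     { in_body = 1; print; next }       # blank line ends the headers
        /^[ \t]/ { if (!skip) print; next }         # continuation follows its header
        {
            skip = (tolower($0) ~ /^(to|cc):/)      # header names are case-insensitive
            if (!skip) print
        }
    '
}

# Example (assumed file name):
#   strip_rcpt_headers <spam-message.eml >spam-message.stripped.eml
```

The filtered message can then be handed to sa-learn, for example by piping it in, assuming sa-learn reads a single message from standard input when no file is given.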
> I want pre-learning because in the beginning it would be hard for people to
> get 200 trained spam mails so that Bayes starts working.
OTOH if someone gets so little spam that they struggle to reach 200,
does it matter?
Re: Individual pre learning - Bayes in SQL
Posted by Adi <ad...@gmail.com>.
Hello
> A token is a word or some piece of derived data. It just means
> that email contained 360 of them.
Thanks for clarifying.
>> Will mail addressed to another person be a problem in the learning
>> process?
>
> Probably not. It won't make any difference in most cases, but if
> one of those addresses is in To/Cc, and the recipient hasn't yet
> trained it as ham, there's a small chance it might.
I can change To/Cc in a loop to the address being trained in the "mega spam mails",
or change To/Cc to example@example.com before running sa-learn.
I want pre-learning because in the beginning it would be hard for people to
get 200 trained spam mails so that Bayes starts working.
I don't know whether it is a good idea. Normally, when Bayes was working
globally, I trained it a lot.
Best Regards
Re: Individual pre learning - Bayes in SQL
Posted by RW <rw...@googlemail.com>.
On Thu, 24 Jul 2014 09:32:35 +0200
Adi wrote:
> Hello
>
> Accounts 13 and 15 are new and have each received only one email.
>
> Why do both accounts have a token_count of about 360,
> and not 1? Are these tokens inherited?
A token is a word or some piece of derived data. It just means that the
email contained 360 of them.
> Second question:
> Does SA pay attention to a mail's To, Cc, etc. headers?
Yes.
> I want to do some pre-learning: collect dozens of "super" spam mails from
> different accounts and have a script train every account in a loop:
> sa-learn --spam --username=$account /spam/dir/*
>
> Will mail addressed to another person be a problem in the learning
> process?
Probably not. It won't make any difference in most cases, but if one of
those addresses is in To/Cc, and the recipient hasn't yet trained it
as ham, there's a small chance it might.
Re: Individual pre learning - Bayes in SQL
Posted by Adi <ad...@gmail.com>.
Hello
I have Bayes in SQL for each user (email address) on a test server.
SA is triggered by
/usr/local/bin/spamc -U /var/run/spamd/spamd.socket -u $local_part@$domain
My Bayes doesn't auto-learn spam, only ham.
Some email users have 38 ham messages learned but 0 spam ;/
Some settings from the userpref table:
| $GLOBAL | use_bayes | 1
| $GLOBAL | required_score | 6
| $GLOBAL | use_bayes | 1
| $GLOBAL | bayes_auto_learn | 1
| $GLOBAL | skip_rbl_checks | 0
| $GLOBAL | bayes_auto_learn_threshold_nonspam | 0.1
| $GLOBAL | bayes_auto_learn_threshold_spam | 12
I know that the minimum for auto-learning as spam is 3 points from body
rules plus 3 points from header rules.
It is strange that not a single spam has been auto-learned as spam.
A few example X-Spam-Status headers:
X-Spam-Status: Yes, score=15.2 required=6.0
tests=DCC_CHECK,DIGEST_MULTIPLE,
DKIM_SIGNED,DKIM_VALID,FUZZY_CREDIT,HEADER_FROM_DIFFERENT_DOMAINS,
RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,RCVD_IN_PSBL,
RP_MATCHES_RCVD,SPF_PASS,URIBL_DBL_SPAM,URIBL_JP_SURBL,URIBL_SC_SURBL,
URIBL_WS_SURBL autolearn=no autolearn_force=no version=3.4.0
X-Spam-Flag: YES
X-Spam-Status: Yes, score=15.4 required=6.0
tests=DCC_CHECK,DIGEST_MULTIPLE,
DKIM_ADSP_CUSTOM_MED,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
FREEMAIL_REPLYTO_END_DIGIT,HEADER_FROM_DIFFERENT_DOMAINS,HTML_IMAGE_RATIO_02,
HTML_MESSAGE,NML_ADSP_CUSTOM_MED,RAZOR2_CF_RANGE_51_100,
RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,RDNS_NONE,T_DKIM_INVALID,
URIBL_DBL_SPAM,URIBL_JP_SURBL,URIBL_SC_SURBL,URIBL_WS_SURBL
autolearn=no autolearn_force=no version=3.4.0
X-Spam-Status: Yes, score=13.1 required=6.0
tests=DCC_CHECK,DIGEST_MULTIPLE,
DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,HTML_IMAGE_RATIO_06,HTML_MESSAGE,
RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,RDNS_NONE,
T_DKIM_INVALID,URIBL_DBL_SPAM,URIBL_JP_SURBL,URIBL_SC_SURBL,URIBL_WS_SURBL
autolearn=no autolearn_force=no version=3.4.0
X-Spam-Flag: YES
X-Spam-Status: Yes, score=14.6 required=6.0
tests=DCC_CHECK,DIGEST_MULTIPLE,
DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,HTML_IMAGE_RATIO_02,
HTML_MESSAGE,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,
RCVD_IN_PSBL,RP_MATCHES_RCVD,SPF_PASS,URIBL_DBL_SPAM,URIBL_JP_SURBL,
URIBL_SC_SURBL,URIBL_WS_SURBL autolearn=no autolearn_force=no
version=3.4.0
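To see how often each autolearn result occurs across many messages, the status headers can be tallied with a small filter. This is a sketch under assumptions: autolearn_summary is a made-up name, and it simply counts the first autolearn=... token on each input line without parsing headers properly, so a matching string in a message body would be counted too.

```shell
#!/bin/sh
# Sketch only: count autolearn=<value> tokens read on stdin and print
# "autolearn=<value> <count>" lines, sorted by value.
# autolearn_force=... is not counted because the pattern requires "autolearn=".
autolearn_summary() {
    awk '
        match($0, /autolearn=[a-z]+/) {
            count[substr($0, RSTART, RLENGTH)]++
        }
        END {
            for (key in count) print key, count[key]
        }
    ' | sort
}

# Example (assumed Maildir path):
#   cat /path/to/Maildir/cur/* | autolearn_summary
```

If every line of the summary reads autolearn=no, the messages are being scanned but none of them meets the conditions for automatic training.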
Do you have any ideas?
Best Regards;