Posted to dev@spamassassin.apache.org by Ivan Pantovic <iv...@yu.net> on 2007/06/28 16:35:04 UTC

sa-learn --backup and bayes tokens depend on mysql version?

Hi all, I tried to track this one down myself, but it is taking much more time
than I thought it would.

The problem is:

I have a setup with a couple of mail machines running SA, all querying the same
MySQL database for Bayes.
Just recently I noticed that some of the newly installed SA machines
give a different Bayes score than the older ones.

I did some debugging, and the conclusion is that although both machines are indeed
looking at the same database, they get different results depending
on which MySQL version is installed. How is this possible?

The working combination is SA 3.1.7, MySQL 4.1.14 and Perl 5.8.8; the one that
does not work properly has MySQL 4.1.22.
The Perl modules are the same.

Their configurations are the same, and even some of the results are the same.
Then I noticed the following.

I looked at what sa-learn --backup dumps on the working machine, and it
looks something like this:

v       3       db_version # this must be the first line!!!
v       3171    num_spam
v       1740    num_nonspam
t       10      0       1182854858      0000c354f7
t       0       1       1182599601      0002c9c814
t       1       0       1182847851      0003961cfc
t       1       0       1163020787      000413ba19
t       0       1       1182863710      00046b538f
t       0       1       1182864981      00048eb213
t       1       0       1182792337      00052e9096

But sa-learn on the non-working machine, which sees the same database, dumps it
like this:

v       3       db_version # this must be the first line!!!
v       3171    num_spam
v       1740    num_nonspam
t       10      0       1182854858      0000c38354c3b7
t       0       1       1182599601      0002c389c38814
t       1       0       1182847851      0003e280931cc3bc
t       1       0       1163020787      000413c2ba19
t       0       1       1182863710      00046b53c28f
t       0       1       1182864981      0004c5bdc2b213
t       1       0       1182792337      00052ec290e28093

It is obviously the same database: the message counts are the same and even the
atime values match. But what is this extra information I am getting when
dumping the Bayes data from the machine where Bayes is not working properly?
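
Comparing the two dumps row by row suggests one possible reading: each longer
token is what you would get if the raw 5-byte token were treated as latin1
(cp1252) text and re-encoded as UTF-8 somewhere between the server and the
client. This is only an inference from the seven sample rows above, not a
confirmed cause. The Python sketch below checks that reading against those rows.
The function names are hypothetical, and the handling of bytes that cp1252
leaves undefined is an assumption.

```python
# Sketch only: checks whether the "broken" token hex is what results when the
# raw token bytes are read as latin1 (cp1252) text and re-encoded as UTF-8.
# Function names are hypothetical; nothing here is part of SpamAssassin.

def latin1_to_utf8(raw: bytes) -> bytes:
    chars = []
    for b in raw:
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            # Assumption: bytes undefined in cp1252 (such as 0x8f and 0x90)
            # map to the code point with the same numeric value.
            chars.append(chr(b))
    return "".join(chars).encode("utf-8")

# (token from the working dump, token from the non-working dump)
PAIRS = [
    ("0000c354f7", "0000c38354c3b7"),
    ("0002c9c814", "0002c389c38814"),
    ("0003961cfc", "0003e280931cc3bc"),
    ("000413ba19", "000413c2ba19"),
    ("00046b538f", "00046b53c28f"),
    ("00048eb213", "0004c5bdc2b213"),
    ("00052e9096", "00052ec290e28093"),
]

def all_rows_match() -> bool:
    # True if every working token converts to its non-working counterpart.
    return all(latin1_to_utf8(bytes.fromhex(good)).hex() == bad
               for good, bad in PAIRS)

if __name__ == "__main__":
    for good, bad in PAIRS:
        converted = latin1_to_utf8(bytes.fromhex(good)).hex()
        print(good, "->", converted, "| dumped:", bad)
```

If this reading holds, lookups on the affected machine would be comparing
altered byte strings against the stored tokens, which would fit with most
tokens not being found. Whether the connection character set actually differs
between the two machines still needs to be checked.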

I should add that some Bayes tokens do get hit when checking a message on that
machine, but far fewer than I should get.

working bayes debug:
[11508] dbg: bayes: tok_get_all: token count: 307
[11508] dbg: bayes: token 'H*r:mail.yu.net' => 0.999939851581825
[11508] dbg: bayes: token 'H*r:ip*194.247.192.231' => 0.99958038147139
[11508] dbg: bayes: token 'H*r:8.13.6' => 0.997298245614035
[11508] dbg: bayes: token 'Vam' => 0.995425742574258
[11508] dbg: bayes: token 'HX-Library:Indy' => 0.994923076923077
[11508] dbg: bayes: token 'HX-Library:8.0.25' => 0.994296296296296
[11508] dbg: bayes: token 'H*p:D*gmail.com' => 0.994296296296296
[11508] dbg: bayes: token 'nudimo' => 0.994296296296296
[11508] dbg: bayes: token 'H*F:D*gmail.com' => 0.994296296296296
[11508] dbg: bayes: token 'informacija' => 0.00644100651702229
[11508] dbg: bayes: token 'H*MI:smtpclu' => 0.993509555934965
[11508] dbg: bayes: token 'H*m:smtpclu' => 0.993509555934965
[11508] dbg: bayes: token 'mogucnost' => 0.993492957746479
[11508] dbg: bayes: token 'posaljite' => 0.992426229508197
[11508] dbg: bayes: token 'H*RT:mail.yu.net' => 0.991800275823836
[11508] dbg: bayes: token 'H*RT:sk:smtpclu' => 0.991800275823836
[11508] dbg: bayes: token 'srbije' => 0.0090549961351273
[11508] dbg: bayes: token 'obavestite' => 0.990941176470588
[11508] dbg: bayes: token 'preduzeca' => 0.990941176470588
[11508] dbg: bayes: token 'VAM' => 0.990941176470588
[11508] dbg: bayes: token 'DOSTAVITE' => 0.990941176470588
[11508] dbg: bayes: token 'saznajte' => 0.990941176470588
[11508] dbg: bayes: token 'Srbije' => 0.00931763810770952
[11508] dbg: bayes: token 'Vasa' => 0.988731707317073
[11508] dbg: bayes: token 'vasa' => 0.988731707317073
[11508] dbg: bayes: token 'cetiri' => 0.988731707317073
[11508] dbg: bayes: token 'narucivanje' => 0.988731707317073
[11508] dbg: bayes: token 'Vase' => 0.988731707317073
[11508] dbg: bayes: token 'kaze' => 0.014453270710345
[11508] dbg: bayes: token 'dlanu' => 0.985096774193548
[11508] dbg: bayes: token 'vasih' => 0.985096774193548
[11508] dbg: bayes: token 'proizvoda' => 0.985096774193548
[11508] dbg: bayes: token 'preduzecu' => 0.985096774193548
[11508] dbg: bayes: token 'adresar' => 0.985096774193548
[11508] dbg: bayes: token 'proizvod' => 0.985096774193548
[11508] dbg: bayes: token '990000' => 0.985096774193548
[11508] dbg: bayes: token 'postanski' => 0.985096774193548
[11508] dbg: bayes: token 'Vasih' => 0.985096774193548
[11508] dbg: bayes: token 'E-mail' => 0.985096774193548
[11508] dbg: bayes: token 'nekoliko' => 0.0156882143902964
[11508] dbg: bayes: token 'vrednost' => 0.0184329504297786
[11508] dbg: bayes: token 'H*RT:194.247.192.231' => 0.979696201682168
[11508] dbg: bayes: token 'Internetu' => 0.0215768525398059
[11508] dbg: bayes: token 'praznike' => 0.978
[11508] dbg: bayes: token 'Vasu' => 0.978
...
...
[11508] dbg: bayes: token 'novim' => 0.0438056888210111
[11508] dbg: bayes: token 'naziv' => 0.0444866337211565
[11508] dbg: bayes: token 'domena' => 0.0449915067301374
[11508] dbg: bayes: token 'koji' => 0.0452773463381254
[11508] dbg: bayes: token 'internetu' => 0.0453527611321541
[11508] dbg: bayes: token 'banke' => 0.0455685505490914
[11508] dbg: bayes: score = 0.994045530451266
[11508] dbg: bayes: DB expiry: tokens in DB: 121728, Expiry max size: 150000, Oldest atime: 1130853389, Newest atime: 1183041130, Last expire: 0, Current time: 1183041132

not properly working sa bayes debug:
[3145] dbg: bayes: tok_get_all: token count: 308
[3145] dbg: bayes: token 'TIM' => 0.993172413793104
[3145] dbg: bayes: token 'H*RU:sk:postpai' => 0.986543689320388
[3145] dbg: bayes: token 'HX-Spam-Relays-External:sk:postpai' => 0.986543689320388
[3145] dbg: bayes: token 'sk:wwwkon' => 0.986543689320388
[3145] dbg: bayes: token 'delatnost' => 0.0387147883969348
[3145] dbg: bayes: score = 0.781653777640385
[3145] dbg: bayes: DB expiry: tokens in DB: 121728, Expiry max size: 150000, Oldest atime: 1130853389, Newest atime: 1183041077, Last expire: 0, Current time: 1183041078

Any ideas?
-- 
View this message in context: http://www.nabble.com/sa-learn---backup-and-bayes-tokens-depend-on-mysql-version--tf3994597.html#a11343786
Sent from the SpamAssassin - Dev mailing list archive at Nabble.com.