Posted to users@spamassassin.apache.org by Adam Denenberg <ad...@sa.dberg.org> on 2004/02/10 15:31:34 UTC

still cant expire bayes tokens

I have been trying for well over a month now to expire tokens, and it
still just won't happen.  The expiry seems to think that most of my tokens
go way back in time, which just isn't true.  Any advice here would be
great.

[root@nydb1 adam]# sa-learn --dump magic
0.000          0          2          0  non-token data: bayes db version
0.000          0     115103          0  non-token data: nspam
0.000          0      38288          0  non-token data: nham
0.000          0    2588992          0  non-token data: ntokens
0.000          0          0          0  non-token data: oldest atime
0.000          0 1134906269          0  non-token data: newest atime
0.000          0 1076421666          0  non-token data: last journal sync atime
0.000          0 1076423258          0  non-token data: last expiry atime
0.000          0    1382400          0  non-token data: last expire atime delta
0.000          0       1019          0  non-token data: last expire reduction count

-----
and the expire looks like this.

debug: bayes: 9510 tie-ing to DB file R/W /share/spam/bayes_toks
debug: bayes: 9510 tie-ing to DB file R/W /share/spam/bayes_seen
debug: bayes: found bayes db version 2
.................
synced Bayes databases from journal in 87 seconds: 53467 unique entries (83965 total entries)
debug: bayes: expiry check keep size, 75% of max: 750000
debug: bayes: token count: 2588992, final goal reduction size: 1838992
debug: bayes: First pass?  Current: 1076421753, Last: 1076394691, atime: 1382400, count: 1019, newdelta: 765, ratio: 1804.70264965653
debug: bayes: Can't use estimation method for expiry, something fishy, calculating optimal atime delta (first pass)
debug: bayes: atime    token reduction
debug: bayes: ========  ===============
debug: bayes: 43200     2595384
debug: bayes: 86400     2595384
debug: bayes: 172800    2595384
debug: bayes: 345600    2595384
debug: bayes: 691200    2595384
debug: bayes: 1382400   2595384
debug: bayes: 2764800   2595384
debug: bayes: 5529600   2595384
debug: bayes: 11059200  2595384
debug: bayes: 22118400  2595384
debug: bayes: couldn't find a good delta atime, need more token difference, skipping expire.
debug: Syncing complete.
debug: bayes: 9510 untie-ing
debug: bayes: 9510 untie-ing db_toks
debug: bayes: 9510 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
unlock: 9510 unlink failed: /share/spam/bayes.lock
debug: unlock: 9510 unlink /share/spam/bayes.lock



Re: still cant expire bayes tokens

Posted by Theo Van Dinter <fe...@kluge.net>.
On Thu, Feb 12, 2004 at 11:32:05PM +0100, Kai Schaetzl wrote:
> > The tokens in your DB are all
> > over 256 days old.
> This is simply "impossible" because auto-learned items are added daily and 

I should have been a little clearer...  The # of tokens listed in the
atime output are older than 256 days, as calculated from the atime of the
newest token.  If you have the same issue as the other fellow (Adam?),
then you had an erroneous message get learned with an atime in the future.

> I have "-17982" on three machines, always the same value. This db started 
> out as Bayes DB version 1 (or 0?) with SA 2.43 possibly, then was carried 
> over to two other machines and they also got upgraded to 2.5x and 2.6x 
> versions consecutively.

Well, it would have started out as DBv0 in 2.5x (2.4x had no bayes code).
If you skipped development versions of 2.60, you would have gone to DBv2
(DBv1 was a short-lived version in about 2 weeks of dev code).

> There's also no "oldest atime". Wouldn't that suggest that possibly all 
> dates are in the future?

Hmm?  Is there actually no oldest atime set, or is the value 0?  There's a
big difference.

> When I do a sa-learn --dump data, what do I need to look for? Everything 
> over 1076607731 (= last expiry atime, so near current date)?
> 
> 0.958          1          0 1051805273  low_interest
> 0.206         17         12 1075495400  HX-MIMETrack:Release

Pretty much.  I'd look for atime values (4th column) either < 100000000
or > time() (aka: perl -e 'print time(),"\n"')

Judging by the rest of this conversation, I'm going to guess you'll
find tokens with an atime of 0, and some > time(), probably by more than
256 days.
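
The check described above can be sketched in Python for anyone who would rather not use a Perl one-liner. This is only an illustration, not part of SpamAssassin: the function name `suspicious_tokens` is made up here, and it assumes the five-column layout of `sa-learn --dump data` shown in this thread (probability, spam count, ham count, atime, token).

```python
import time

def suspicious_tokens(lines, now=None, floor=100000000):
    """Yield (atime, token) for dump lines whose atime is < floor or > now."""
    now = int(time.time()) if now is None else now
    for line in lines:
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # not a token line
        token = fields[4].rstrip("\n")
        if token.startswith("non-token data:"):
            continue  # magic entries from "--dump magic", not real tokens
        try:
            atime = int(fields[3])
        except ValueError:
            continue  # fourth column is not a number
        if atime < floor or atime > now:
            yield atime, token

sample = [
    "0.958          1          0 1051805273  low_interest",
    "0.518        219         37 1128239545  review",
    "0.958          1          0     -17982  bgiek",
]
for atime, token in suspicious_tokens(sample, now=1076607731):
    print(atime, token)
```

With `now` pinned to the expiry time quoted above, the first sample line passes and the other two are reported, one for being in the future and one for being negative.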


> f.i., the above, are these valid records/dates? If so, then I'm wondering 
> why it can't display an oldest atime (if I understand correctly what atime 
> means). What's the exact meaning of "atime"? Is this the time when the 
> token was added to the db? I think the times above are in the past, so it 
> should be able to show an oldest atime, shouldn't it?

"atime" is short for "access time", and is the number of seconds since
the epoch (1/1/1970) that the message was received (or sent if received
can't be determined).  In theory, the atime values should all be <=
current time(), although I allow for <= time()+86400 in case you need to
use sent time and the sender is on the other side of the planet somewhere.

The atime values are set when you learn the message or when the token
is seen in a new message -- ie: the last time the token was "accessed".

> I'm sure there is a command which converts that Unix Timestamp (assuming 
> it is one) to something human-readable, but I don't know it.

Yeah, there's a bunch.  You could probably get date to do it, but I just use:

#!/usr/bin/perl
print scalar localtime($ARGV[0]),"\n";
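
A rough Python equivalent, for illustration only. Unlike the Perl snippet, which prints local time, this sketch prints UTC so the result does not depend on the machine's timezone; the function name `atime_to_utc` is made up for the example.

```python
from datetime import datetime, timezone

def atime_to_utc(atime):
    """Format a Unix timestamp (seconds since 1970-01-01) as a UTC string."""
    stamp = datetime.fromtimestamp(int(atime), tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S UTC")

print(atime_to_utc(1128239545))  # 2005-10-02 07:52:25 UTC
```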

> Most of these records seem to be way in the future:
> 0.518        219         37 1128239545  review
> 0.978          2          0 1104581966  8:Ñ£
> 0.958          1          0 1128052147  lkalowhbrd
> 0.994          8          0 1093712392  WEST
> 0.942         90          1 1128239545  REQUIRED

Yep.

1128239545 = Sun Oct  2 03:52:25 2005 EDT
1093712392 = Sat Aug 28 12:59:52 2004 EDT
...

> Couldn't I simply remove these from bayes_toks or "out"? I'm not keen on 
> fixing them. It's only about 50 KB.
> So remove the token and any lines until the next token? Is that the 

You could do that, but then you'll have to edit more magic tokens to
change # of toks in DB, you'll still need to know the new newest atime,
etc.

> straight.php
>  \e0\c4\97\cf>
>  \db\d5\d4\cb\c9
>  \f0\91\05\dc>
>  
> f.i. remove that completely?

well, the format is:

  token
    value
  token
    value

etc.

> What is CVVV / CV?

It's the perl pack() format code ...  Basically C=unsigned char,
V=unsigned long (32 bits) in little endian format.

-- 
Randomly Generated Tagline:
"J: Do YOU know who the Spin Doctors are?
  P: Maybe your mother does..."            - John West and a Pizza Delivery Guy

Re: still cant expire bayes tokens

Posted by Kai Schaetzl <ma...@conactive.com>.
Theo Van Dinter wrote on Thu, 12 Feb 2004 13:11:55 -0500:

> > What does this mean? That most tokens are within the same time range or that 
> > most tokens are way too old ??? How can I figure this out?
> 
> Well, the data listing there tells you.

Tells you, not me ;-) I can read that stuff to a certain extent, but only  
understand portions of it.

> The tokens in your DB are all
> over 256 days old.

This is simply "impossible" because auto-learned items are added daily and 
I also have it learn from a spam mailbox sometimes. However, it's possible that a 
great portion of the db is quite old considering the fact that it didn't 
expire for a while and we learned several thousand spam and ham mails at 
the beginning.

> 
> > 0.000          0     -17982          0  non-token data: newest atime
> 
> That's not possible.

I have "-17982" on three machines, always the same value. This db started 
out as Bayes DB version 1 (or 0?) with SA 2.43 possibly, then was carried 
over to two other machines and they also got upgraded to 2.5x and 2.6x 
versions consecutively.
There's also no "oldest atime". Wouldn't that suggest that possibly all 
dates are in the future?

When I do a sa-learn --dump data, what do I need to look for? Everything 
over 1076607731 (= last expiry atime, so near current date)?

0.958          1          0 1051805273  low_interest
0.206         17         12 1075495400  HX-MIMETrack:Release

f.i., the above, are these valid records/dates? If so, then I'm wondering 
why it can't display an oldest atime (if I understand correctly what atime 
means). What's the exact meaning of "atime"? Is this the time when the 
token was added to the db? I think the times above are in the past, so it 
should be able to show an oldest atime, shouldn't it?

I'm sure there is a command which converts that Unix Timestamp (assuming 
it is one) to something human-readable, but I don't know it.

I read the dumping instructions etc. in
Message-ID: <20...@kluge.net>
Didn't understand everything, though.
I now have a readable dump of the incorrect records (at least I hope I 
have).
Most of these records seem to be way in the future:
0.518        219         37 1128239545  review
0.978          2          0 1104581966  8:Ñ£
0.958          1          0 1128052147  lkalowhbrd
0.994          8          0 1093712392  WEST
0.942         90          1 1128239545  REQUIRED

Couldn't I simply remove these from bayes_toks or "out"? I'm not keen on 
fixing them. It's only about 50 KB.
So remove the token and any lines until the next token? Is that the 
correct thing to do? (Next thing then: learn how to convert this back to 
bayes_toks.)

straight.php
 \e0\c4\97\cf>
 \db\d5\d4\cb\c9
 \f0\91\05\dc>
 
f.i. remove that completely?
What is CVVV / CV?

Thanks,

Kai

-- 

Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org




Re: still cant expire bayes tokens

Posted by Theo Van Dinter <fe...@kluge.net>.
On Thu, Feb 12, 2004 at 05:41:07PM +0100, Kai Schaetzl wrote:
> debug: bayes: 43200     637929
> debug: bayes: 22118400  637929
> 
> so, it would expire almost everything.

yep.

> What does this mean? That most tokens are within the same time range or that 
> most tokens are way too old ??? How can I figure this out?

Well, the data listing there tells you.  The tokens in your DB are all over 256 days old.

> 0.000          0     -17982          0  non-token data: newest atime

That's not possible.

> Is there a way I can sanitize the db? I don't really want to throw it away.

You'd have to figure out what the problem is first.  The above indicates
something is really messed up for you -- you can't have a negative
newest value.

-- 
Randomly Generated Tagline:
#define SIGILL 6         /* blech */
              -- Larry Wall in perl.c from the perl source code

Re: still cant expire bayes tokens

Posted by Kai Schaetzl <ma...@conactive.com>.
Theo Van Dinter wrote on Tue, 10 Feb 2004 11:44:24 -0500:

> FYI: For 3.0.0, I just put in some code that stops this kind of thing from
> happening (if the calculated message atime is determined to be more than
> 1 day in the future, it just uses the current time() value instead).
> If a 2.64 release happens, the fix will probably go in there too:
> http://bugzilla.spamassassin.org/show_bug.cgi?id=3025
>

I think I'm hitting the same problem:

debug: bayes: found bayes db version 2
debug: bayes: expiry check keep size, 75% of max: 112500
debug: bayes: token count: 638040, final goal reduction size: 525540
debug: bayes: First pass?  Current: 1076602270, Last: 1076601983, atime: 0, count: 0, newdelta: 0, ratio: 0
debug: bayes: Can't use estimation method for expiry, something fishy, calculating optimal atime delta (first pass)

If I understand correctly, the database should hold only 112500 tokens (must 
be the 2.63 default), so expiry has been failing for quite some time if it's 
now at over 600,000.

The token reduction count stays at

debug: bayes: 43200     637929
debug: bayes: 22118400  637929

so, it would expire almost everything.
What does this mean? That most tokens are within the same time range or that 
most tokens are way too old ??? How can I figure this out?
This is a db which started around summer/autumn last year with some learning 
and has been growing continually since then, with around 17,000 spam and 3,000 ham 
at the moment. I'm not sure what the following means; does it help to better 
understand the above?

0.000          0     -17982          0  non-token data: newest atime
0.000          0 1076601982          0  non-token data: last journal sync atime
0.000          0 1076602431          0  non-token data: last expiry atime

I "fixed" this now by setting
bayes_expiry_max_db_size 1000000

Is there a way I can sanitize the db? I don't really want to throw it away.

The interesting thing is that I have this problem on two machines but it was 
detectable only on one of them. We use a milter (MailCorral) which hands the 
mail over to spamd. The timeout for that is 60 seconds. I didn't note any 
increase in spam or other problems on that machine. Since MailCorral isn't 
actively developed anymore I'm looking for alternatives and set up 
MailScanner + SA on another machine, copied the old Bayes and other SA stuff 
over and keep sending a small portion of the spamtrap spam we get directly 
to that machine. Almost immediately I had a lot of SA time-outs and 
searching the list I finally found the articles about the "fishy" atime 
delta. MailScanner uses a smaller time-out by default, I think 20 seconds or 
so, that's still unchanged yet. So, one could imagine that the problem 
wasn't detected because the longer time-out allowed for finishing the 
hanging expiry. However, this doesn't seem to be the case. Most of the time 
the spamd result comes after a few seconds. I'm not seeing much if any spamd 
time-outs in the logs of the first machine. Is there something different 
between spamd and sa, so that the problem would exist but only visually 
emerge with SA but not with spamd? Like that spamd isn't trying the 
auto-expire with every message but just once a day while it happens with 
each invocation of spamassassin?


Kai





Re: still cant expire bayes tokens

Posted by Theo Van Dinter <fe...@kluge.net>.
On Thu, Feb 12, 2004 at 10:32:44AM -0800, Justin Mason wrote:
> (cough) wiki.SpamAssassin.org ;)

(cough)  I hope someone else posts it for me ... ;)

-- 
Randomly Generated Tagline:
And in the limiting case where the optimizer is completely broken because
 it's not implemented yet, we get to work around that too.  Optionally...
              -- Larry Wall in <20...@wall.org>

Re: still cant expire bayes tokens

Posted by Kai Schaetzl <ma...@conactive.com>.
Theo Van Dinter wrote on Thu, 12 Feb 2004 10:54:07 -0500:

> $ db_dump -p -f out .spamassassin/bayes_toks
> $ sa-learn --dump data | perl -nle 'print if ( (split)[3] > time )' > out2

I overlooked something here at first. At a quick glance it looked like this 
was a sequence, so that line 2 depends on line 1, but it doesn't. I think just 
doing an
sa-learn --dump data > dump.file
is what I need. I then get everything neatly arranged in columns and just 
need to strip away all the lines with the negative value.
0.958          1          0     -17982  bgiek
Interestingly, all of them seem to be spam tokens and all have -17982.

And then rebuild the database from that. Michael sent me a script which is 
supposed to do that, and the interesting thing is that it *seems* to create a 
valid db of exactly the same size as before, but the file is not byte-identical. 
(For testing purposes I dumped from a *valid* non-corrupted db and then 
recreated it with his tool, so there aren't any mistakes I could have introduced 
by editing.) sa-learn identifies it as a v0 database and does not show any 
tokens or other data in it with "--dump magic". When I run --force-expire 
over it, it starts converting the db to v2 and after that still lists no 
tokens, and all four atime values show the current time. No errors whatsoever 
are shown. Michael says his tool creates a v2 database, but sa-learn identifies 
it as v0 and converts it to v2 without an error. Weird.
I'm going to post his code here once he gives his OK.

Kai





Re: still cant expire bayes tokens

Posted by Theo Van Dinter <fe...@kluge.net>.
On Tue, Feb 10, 2004 at 12:08:56PM -0500, Adam Denenberg wrote:
> Thanks, Theo. I would love to send my bayes_toks through db_dump and fix the
> "broken" records.  However, I am not familiar with the format. Is there
> an existing script, or a site, that will allow me to properly remove
> entries with bad atime values?

Not that I know of.  If you're really keen on trying this, here's the
basics...  Some of this probably should be documented somewhere besides
the code anyway ...:

# stop spamassassin ...
# make a backup copy of bayes_toks!

$ db_dump -p -f out .spamassassin/bayes_toks
$ sa-learn --dump data | perl -nle 'print if ( (split)[3] > time )' > out2

out2 now contains the list of tokens you need to fix.  go through each
one in "out" and fix it.  for instance, assume "anticipate" was a token
that needed fixing, in "out" you'd see something like:

 anticipate
 \00\fa\00\00\00\e0\00\00\00l\87*@

That's 13 bytes, which means it's the CVVV format.  If it was 5 bytes,
it's CV format, fyi.  Now you want to throw the data through unpack to
get the actual values out:

$ perl -e 'print join("\n", unpack("CVVV", "\x00\xfa\x00\x00\x00\xe0\x00\x00\x00l\x87*@"),"")'
0
250
224
1076529004

There's probably an easier way to do that, but ...  Perl expects hex
values in "\x##" format, but db_dump outputs in "\##" format, so you
have to put the "x" in appropriately there.
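
The escape conversion and the unpack step can also be sketched in Python, which avoids retyping the "x" by hand. This is an illustration, not a SpamAssassin tool: `dbdump_to_bytes` is a made-up helper, and it assumes the value line uses only literal ASCII characters, "\hh" hex escapes, and "\\" for a literal backslash. Python's struct format "<BIII" plays the role of Perl's "CVVV".

```python
import struct

def dbdump_to_bytes(text):
    """Convert a db_dump -p value (literal chars plus \\hh escapes) to bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\" and text[i + 1:i + 2] == "\\":
            out.append(0x5C)                       # escaped backslash
            i += 2
        elif text[i] == "\\":
            out.append(int(text[i + 1:i + 3], 16))  # two hex digits
            i += 3
        else:
            out.append(ord(text[i]))               # printable ASCII as-is
            i += 1
    return bytes(out)

raw = dbdump_to_bytes(r"\00\fa\00\00\00\e0\00\00\00l\87*@")
print(len(raw))                    # 13 bytes, so the CVVV layout
print(struct.unpack("<BIII", raw))  # (0, 250, 224, 1076529004)
```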

The first 3 numbers you don't care about, but they're packing format
(0 for CVVV, or 192 for CV), # of spam matches, # of ham matches.
The fourth number is atime.  Change the atime to whatever you want, I'd
choose the current time() value (use the same one for all of the ones
you want to fix...)  In my case, I'm going to use 1076685969 just for example.
Now you get to put it back in the right format ...

$ perl -e 'print map { sprintf "\\%02x",$_ } unpack("C13", pack("CVVV", 0, 250, 224, 1076685969));print "\n"'
\00\fa\00\00\00\e0\00\00\00\91\ec\2c\40
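
The same repacking step as a Python sketch, for comparison. The helper name `pack_cvvv` is invented for this example; it mirrors the Perl one-liner by packing a format byte of 0 followed by three little-endian 32-bit values, then printing every byte as a "\hh" escape.

```python
import struct

def pack_cvvv(spam, ham, atime):
    """Pack counts and atime in the 13-byte layout and escape every byte."""
    raw = struct.pack("<BIII", 0, spam, ham, atime)
    return "".join("\\%02x" % byte for byte in raw)

print(pack_cvvv(250, 224, 1076685969))
# \00\fa\00\00\00\e0\00\00\00\91\ec\2c\40
```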

Take that and put it in the "out" file appropriately.  Now repeat
for the other tokens.  At the end, find the newest atime magic token
"\0d\01\07\09\03NEWESTAGE", and change the value (it's just a string)
from the current one to whatever atime you used, 1076685969 in this case.

$ db_load -f out .spamassassin/bayes_toks

You can now do a "sa-learn --dump" to make sure it all looks right...:

[...]
0.000          0 1076685969          0  non-token data: newest atime
[...]
0.158        250        224 1076685969  anticipate
[...]


Now, here's the fun part -- if you have tokens in CV format (which is
very likely in your case since the ham/spam counts are very likely to be
both < 8), this whole thing becomes a lot more complicated to do by hand...
So, let's switch to the simpler, but uglier, way of doing things:

$ perl -MMail::SpamAssassin::BayesStore -e 'print join("\n", \
Mail::SpamAssassin::BayesStore::tok_unpack({db_version => 2}, \
"\x00\xfa\x00\x00\x00\xe0\x00\x00\x00l\x87*@"),"")'
250
224
1076529004

$ perl -MMail::SpamAssassin::BayesStore -e 'print map { sprintf "\\%02x",$_ } \
unpack("C*", Mail::SpamAssassin::BayesStore->tok_pack(250, 224, 1076529004));print "\n"'
\00\fa\00\00\00\e0\00\00\00\6c\87\2a\40

This code has the benefit of working for both CVVV and CV formats...

For example: "\xd0\x1fU(@"
2
0
1076385055

[...]

\d0\1f\55\28\40
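
For illustration, here is a Python sketch that handles both layouts. It is not the BayesStore code. The 5-byte bit layout used below (top two bits set as the 192 marker, spam count in the next three bits, ham count in the low three bits) is an assumption inferred from the example above, where 0xd0 decodes to 2 spam and 0 ham, and from the remark that both counts must be < 8. Check it against BayesStore.pm before relying on it.

```python
import struct

ONE_BYTE_FLAG = 0xC0  # 192, the CV marker mentioned earlier in this message

def tok_unpack(raw):
    """Return (spam, ham, atime) from a 5-byte or 13-byte token value."""
    if len(raw) == 5:
        packed, atime = struct.unpack("<BI", raw)
        if packed & 0xC0 != ONE_BYTE_FLAG:
            raise ValueError("5-byte value without the 192 marker")
        return (packed >> 3) & 0x07, packed & 0x07, atime
    if len(raw) == 13:
        _fmt, spam, ham, atime = struct.unpack("<BIII", raw)
        return spam, ham, atime
    raise ValueError("unexpected value length %d" % len(raw))

def tok_pack(spam, ham, atime):
    """Pack into 5 bytes when both counts fit in 3 bits, else 13 bytes."""
    if spam < 8 and ham < 8:
        return struct.pack("<BI", ONE_BYTE_FLAG | (spam << 3) | ham, atime)
    return struct.pack("<BIII", 0, spam, ham, atime)

print(tok_unpack(b"\xd0\x1fU(@"))  # (2, 0, 1076385055)
```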



Please note that by editing your DB by hand, any future issues that
arise will be blamed on the editing.  aka: no support.

-- 
Randomly Generated Tagline:
"The programmer needs the machine to run long enough to destroy it."
                      - Prof. Michaelson

Re: still cant expire bayes tokens

Posted by Adam Denenberg <ad...@sa.dberg.org>.
Sorry for the repost, but this seems like my only chance at a working
expiry.  Does anybody have a link to information on how to
properly manipulate the bayes_toks file from a db_dump to remove the
bogus atime entries, so I can feed it back into db_load for a fixed Bayes DB?

thanks again.
adam



Re: still cant expire bayes tokens

Posted by Adam Denenberg <ad...@sa.dberg.org>.
Thanks, Theo. I would love to send my bayes_toks through db_dump and fix the
"broken" records.  However, I am not familiar with the format. Is there
an existing script, or a site, that will allow me to properly remove
entries with bad atime values?

thanks
adam

On Tue, 2004-02-10 at 11:44, Theo Van Dinter wrote:
> On Tue, Feb 10, 2004 at 09:31:34AM -0500, Adam Denenberg wrote:
> > debug: bayes: expiry check keep size, 75% of max: 750000
> 
> Ok, so your max size is 1_000_000 tokens.
> 
> > debug: bayes: token count: 2588992, final goal reduction size: 1838992
> 
> Your DB says you have ~2.6m tokens, so to get to the goal of 750k tokens,
> you need to remove ~1.8m tokens.
> 
> > debug: bayes: First pass?  Current: 1076421753, Last: 1076394691, atime:
> > 1382400, count: 1019, newdelta: 765, ratio: 1804.70264965653
> 
> Not looking at the other things, the ratio is way off, so expiry isn't going to work.
> 
> >  debug: bayes: atime    token reduction
> > debug: bayes: ========  ===============
> > debug: bayes: 43200     2595384
> > debug: bayes: 86400     2595384
> > debug: bayes: 172800    2595384
> > debug: bayes: 345600    2595384
> > debug: bayes: 691200    2595384
> > debug: bayes: 1382400   2595384
> > debug: bayes: 2764800   2595384
> > debug: bayes: 5529600   2595384
> > debug: bayes: 11059200  2595384
> > debug: bayes: 22118400  2595384
> 
> The interesting thing here is that you only have 2588992 tokens in the DB
> (magic token), but the atime/reduction chart shows 2595384 being removed
> (actual loop through DB tokens)...  What's up with that?
> 
> What the above chart says is that no matter what atime delta you use, you'll
> be expiring too many tokens.  The atime deltas here are computed as
> newest_atime - token_atime.  Since your newest atime is far in the future,
> as Matt already pointed out (1134906269 == Sun Dec
> 18 06:44:29 2005 EST), all of your tokens are "older" than 256 days
> (last line in the chart).
> 
> So ...  I would do 2 things.  1) fix the db.  unless you're _very sure_
> about the internal db format, "rm bayes_*".  if you are used to the
> format, do a db_dump, edit the output and modify the "future" token
> atimes to be something more reasonable, modify the newest atime magic
> token, do a db_load.  2) if you save your messages, find the one that
> caused the problem and attach it to the ticket specified below...
> 
> FYI: For 3.0.0, I just put in some code that stops this kind of thing from
> happening (if the calculated message atime is determined to be more than
> 1 day in the future, it just uses the current time() value instead).
> If a 2.64 release happens, the fix will probably go in there too:
> http://bugzilla.spamassassin.org/show_bug.cgi?id=3025


Re: still cant expire bayes tokens

Posted by Theo Van Dinter <fe...@kluge.net>.
On Tue, Feb 10, 2004 at 09:31:34AM -0500, Adam Denenberg wrote:
> debug: bayes: expiry check keep size, 75% of max: 750000

Ok, so your max size is 1_000_000 tokens.

> debug: bayes: token count: 2588992, final goal reduction size: 1838992

Your DB says you have ~2.6m tokens, so to get to the goal of 750k tokens,
you need to remove ~1.8m tokens.

> debug: bayes: First pass?  Current: 1076421753, Last: 1076394691, atime:
> 1382400, count: 1019, newdelta: 765, ratio: 1804.70264965653

Not looking at the other things, the ratio is way off, so expiry isn't going to work.
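
The numbers in those debug lines can be reproduced with a little arithmetic. The Python sketch below is inferred purely from the logged values (keep size = 75% of max, goal = token count - keep size, ratio = goal / last reduction count, newdelta = last atime delta / ratio). It is not taken from the SpamAssassin source, the function name is made up, and the thresholds SpamAssassin uses to decide the ratio is "way off" are not stated in this thread.

```python
def expiry_estimate(max_db_size, token_count, last_atime_delta, last_reduction):
    """Reproduce keep size, goal reduction, ratio and newdelta from the log."""
    keep = int(max_db_size * 0.75)
    goal = token_count - keep
    if last_reduction == 0 or goal <= 0:
        return keep, goal, 0.0, 0   # nothing to base an estimate on
    ratio = goal / last_reduction
    newdelta = int(last_atime_delta / ratio)
    return keep, goal, ratio, newdelta

keep, goal, ratio, newdelta = expiry_estimate(1000000, 2588992, 1382400, 1019)
print(keep, goal, newdelta)  # 750000 1838992 765
```

A ratio of about 1804 means the goal is roughly 1800 times larger than what the previous expiry run removed, so scaling the old delta by it gives a delta of only 765 seconds.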

>  debug: bayes: atime    token reduction
> debug: bayes: ========  ===============
> debug: bayes: 43200     2595384
> debug: bayes: 86400     2595384
> debug: bayes: 172800    2595384
> debug: bayes: 345600    2595384
> debug: bayes: 691200    2595384
> debug: bayes: 1382400   2595384
> debug: bayes: 2764800   2595384
> debug: bayes: 5529600   2595384
> debug: bayes: 11059200  2595384
> debug: bayes: 22118400  2595384

The interesting thing here is that you only have 2588992 tokens in the DB
(magic token), but the atime/reduction chart shows 2595384 being removed
(actual loop through DB tokens)...  What's up with that?

What the above chart says is that no matter what atime delta you use, you'll
be expiring too many tokens.  The atime deltas here are computed as
newest_atime - token_atime.  Since your newest atime is far in the future,
as Matt already pointed out (1134906269 == Sun Dec
18 06:44:29 2005 EST), all of your tokens are "older" than 256 days
(last line in the chart).
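
To see why one bogus newest atime flattens the whole chart, here is a small Python sketch of the same calculation. It is an illustration under assumptions: the function name is invented, the deltas simply double from 43200 seconds up to 22118400 (256 days) as in the chart, and whether SpamAssassin compares with > or >= is not stated in this thread.

```python
def reduction_chart(token_atimes, newest_atime, start=43200, steps=10):
    """For each doubling delta, count tokens older than newest_atime - delta."""
    chart = []
    delta = start
    for _ in range(steps):
        count = sum(1 for atime in token_atimes if newest_atime - atime > delta)
        chart.append((delta, count))
        delta *= 2
    return chart

now = 1076421753
# Four made-up tokens: just accessed, 50000 s, 100000 s and 30000000 s old.
atimes = [now - age for age in (0, 50000, 100000, 30000000)]

for delta, count in reduction_chart(atimes, newest_atime=1134906269):
    print(delta, count)   # every row reports all 4 tokens
```

With the newest atime set to the real current time instead, the counts fall off as the delta grows, which is what expiry needs in order to pick a delta.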

So ...  I would do 2 things.  1) fix the db.  unless you're _very sure_
about the internal db format, "rm bayes_*".  if you are used to the
format, do a db_dump, edit the output and modify the "future" token
atimes to be something more reasonable, modify the newest atime magic
token, do a db_load.  2) if you save your messages, find the one that
caused the problem and attach it to the ticket specified below...

FYI: For 3.0.0, I just put in some code that stops this kind of thing from
happening (if the calculated message atime is determined to be more than
1 day in the future, it just uses the current time() value instead).
If a 2.64 release happens, the fix will probably go in there too:
http://bugzilla.spamassassin.org/show_bug.cgi?id=3025
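
The guard described there amounts to a simple clamp. A Python sketch of the idea follows; it is not the actual 3.0.0 patch, and the function name and parameters are made up for illustration.

```python
import time

def sane_atime(message_time, now=None, max_future=86400):
    """Use the current time if the message time is > 1 day in the future."""
    now = int(time.time()) if now is None else now
    if message_time > now + max_future:
        return now
    return message_time

# A message dated December 2005, learned in February 2004, gets clamped.
print(sane_atime(1134906269, now=1076421753))  # 1076421753
```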

-- 
Randomly Generated Tagline:
"If you think nobody cares if you're alive, try missing a couple of car
 payments." - Zen Musings

Re: still cant expire bayes tokens

Posted by Theo Van Dinter <fe...@kluge.net>.
On Tue, Feb 10, 2004 at 10:53:15AM -0500, Matt Kettler wrote:
> The first thing that REALLY jumps out at me, is that your newest token 
> atime is ahead of the current atime... did you have some kind of massive 
> clock messup on this system? Theoretically you shouldn't ever have 
> futuristic tokens.. SpamAssassin isn't a psychic (yet).

atimes in the bayes DB are based on the dates found in the message.
It should use the date from the top Received header, but could fall
through to other headers...  fyi. :)

-- 
Randomly Generated Tagline:
"> I'm an idiot.. At least this [bug] took about 5 minutes to find..
  We need to find some new terms to describe the rest of us mere mortals
  then." - Craig Schlenter in response to Linus Torvalds about a kernel bug.

Re: still cant expire bayes tokens

Posted by Adam Denenberg <ad...@sa.dberg.org>.
No clock problems that I know of. I run ntp on all my servers and just double
checked to confirm that. All my boxes are in sync to the second.

adam

On Tue, 2004-02-10 at 10:53, Matt Kettler wrote:
> At 09:31 AM 2/10/2004, Adam Denenberg wrote:
> 
> <snip>
> 
> >[root@nydb1 adam]# sa-learn --dump magic
> >0.000          0          0          0  non-token data: oldest atime
> >0.000          0 1134906269          0  non-token data: newest atime
> 
> 
> <snip>
> 
> 
> >debug: bayes: First pass?  Current: 1076421753, Last: 1076394691, atime:
> >1382400, count: 1019, newdelta: 765, ratio: 1804.70264965653
> >debug: bayes: Can't use estimation method for expiry, something fishy,
> >calculating optimal atime delta (first pass)
> 
> The first thing that REALLY jumps out at me, is that your newest token 
> atime is ahead of the current atime... did you have some kind of massive 
> clock messup on this system? Theoretically you shouldn't ever have 
> futuristic tokens.. SpamAssassin isn't a psychic (yet).
> 
> 
> 


Re: still cant expire bayes tokens

Posted by Matt Kettler <mk...@evi-inc.com>.
At 09:31 AM 2/10/2004, Adam Denenberg wrote:

<snip>

>[root@nydb1 adam]# sa-learn --dump magic
>0.000          0          0          0  non-token data: oldest atime
>0.000          0 1134906269          0  non-token data: newest atime


<snip>


>debug: bayes: First pass?  Current: 1076421753, Last: 1076394691, atime:
>1382400, count: 1019, newdelta: 765, ratio: 1804.70264965653
>debug: bayes: Can't use estimation method for expiry, something fishy,
>calculating optimal atime delta (first pass)

The first thing that REALLY jumps out at me, is that your newest token 
atime is ahead of the current atime... did you have some kind of massive 
clock messup on this system? Theoretically you shouldn't ever have 
futuristic tokens.. SpamAssassin isn't a psychic (yet).