Posted to users@spamassassin.apache.org by "Jeremy M. Dolan" <jm...@pobox.com> on 2004/09/25 19:59:57 UTC

Bayes DB seemingly corrupted during v2 to v3 upgrade

Hi all. Hoping someone might be able to help me out here. Just
upgraded from 2.6x to 3.0.0 this morning, and, though I followed the
Bayes DB upgrade steps in the UPGRADE file to a T, my token names all
seem to be garbage now.

Here's a few lines of the output from "sa-learn --dump all":

0.560         21          3 1094789733  dc60473720
0.992          6          0 1090849205  20d2b3d689
0.958          1          0 1092129562  23c375c031
0.998         20          0 1095699812  cc75bc02df

That fifth field, which I remember as being the token name in 2.6x, is
ostensibly junk data now. I could be wrong, maybe it's supposed to now
look like that, but as the documentation says to check --dump output
and "make sure the data looks valid", that seems unlikely.

What happened? I don't see any similar reports in the list archives.
Where could I have gone wrong? I have backups of bayes_(toks|seen) if
you can suggest anything to try. I did run the --sync in 3.0.0 with
the -D flag though, and aside from the only slightly suspicious

  debug: refresh: 22434 refresh /home/jmd/.spamassassin/bayes.lock

being printed about 150 times, everything seemed like your typical
debug mode output.

As I get a few hundred spams a day, I'm terrified of starting up
fetchmail again without SpamAssassin back and fully operational.
Help! :)

/jmd

PS: Great job on 3.0 folks--it looks great on paper/the web site, at
least. Hoping it will cut into the 2-3% of spam that was slipping by
2.6x. The new tests look promising.

-- 
Jeremy M. Dolan <ma...@pobox.com> <http://jmd.us/>
PGP: 1024D/3C68A1BA 9470 210C A476 FFBB 6D11  0223 0D1C ABFC 3C68 A1BA

Re: Bayes DB seemingly corrupted during v2 to v3 upgrade

Posted by Michael Parker <pa...@pobox.com>.
On Sat, Sep 25, 2004 at 12:59:57PM -0500, Jeremy M. Dolan wrote:
> Hi all. Hoping someone might be able to help me out here. Just
> upgraded from 2.6x to 3.0.0 this morning, and, though I followed the
> Bayes DB upgrade steps in the UPGRADE file to a T, my token names all
> seem to be garbage now.
> 
> Here's a few lines of the output from "sa-learn --dump all":
> 
> 0.560         21          3 1094789733  dc60473720
> 0.992          6          0 1090849205  20d2b3d689
> 0.958          1          0 1092129562  23c375c031
> 0.998         20          0 1095699812  cc75bc02df
> 

We no longer store the raw token value in the database; instead, it is
stored as a hashed value.  There is a small blurb about this in UPGRADE.

The values in the dump are actually hex representations of the binary
values stored in the database.
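
For illustration only, here is a rough Python sketch of the idea. It
assumes the stored value is the last 5 bytes of the token's SHA-1
digest, which would match the 10 hex characters in the dump above;
check the Bayes source for your version before relying on that. The
function names are made up for this example and are not part of
SpamAssassin.

```python
import hashlib
import string


def hashed_token_hex(token: str, keep_bytes: int = 5) -> str:
    """Hex form of a truncated SHA-1 digest of a token.

    Assumption: the last `keep_bytes` bytes of the digest are kept.
    This mirrors what the dump appears to show; it is not taken from
    the SpamAssassin documentation.
    """
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    return digest[-keep_bytes:].hex()


def looks_like_hashed_token(field: str, keep_bytes: int = 5) -> bool:
    """True if a dump field has the shape of a hex-encoded hashed token."""
    return len(field) == 2 * keep_bytes and all(
        c in string.hexdigits for c in field
    )


if __name__ == "__main__":
    # Any token maps to a fixed-length hex string; the original text
    # cannot be read back out of it.
    print(hashed_token_hex("example"))

    # The token fields quoted earlier in this thread all have that shape.
    for field in ("dc60473720", "20d2b3d689", "23c375c031", "cc75bc02df"):
        print(field, looks_like_hashed_token(field))
```

The point is just that a 10-character hex string in the last column is
what a hashed token is expected to look like, so it is not a sign of
corruption.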

So, relax, your database is fine.

Michael