Posted to users@spamassassin.apache.org by "C. Bensend" <be...@bennyvision.com> on 2006/03/24 15:26:33 UTC

sa-learn --backup and --restore issue: duplicate key violations

Hey folks,

   I'm going to be upgrading my mailserver in a month or two,
so I'm running through some different configurations for
SpamAssassin, IMAP, and anti-virus.  I'm working on testing the
SQL stuff for user configs and Bayes right now.

   So, here are the stats:


Old mailserver                  New mailserver
=============================   ============================
OpenBSD 3.6 on AMD64            OpenBSD 3.9 on AMD64
SpamAssassin 3.0.4 using files  SpamAssassin 3.1.1 using SQL


   To begin testing, I ran 'sa-learn --backup > outfile' on
the existing mailserver, and 'sa-learn --restore outfile' on
a POS test box I have installed with a recent snapshot of
OpenBSD 3.9.  On each box I used that box's own version of
sa-learn (i.e., 3.0.4's sa-learn on the old box, and 3.1.1's
sa-learn on the new).

   The dump is significant - over half a million lines and
around 28MB.  I should mention that I believe I have
SpamAssassin properly configured to talk to the database on
the POS testing box; everything seems fine there.

   The restore starts fine, and runs and runs.  I see the
tokens being stuffed into the bytea columns, and finally when
it comes to the bayes_seen stuff, I see the INSERTs flying
past.  Yay!

   But after a while (I know it got over 270,000 rows INSERTed,
but I don't know how many more after that), it starts throwing
unique constraint violations:

[21458] dbg: bayes: error inserting msgid in seen table for line:
s_h_2f1e7a2bb5590e61c30502a11a32dc071b85a685@sa_generated
[21458] dbg: bayes: seen_put: SQL error: ERROR:  duplicate key violates
unique constraint "bayes_seen_pkey"
[21458] dbg: bayes: error inserting msgid in seen table for line:
s_h_4de1a35a361628a269b4dc65076b9a80f6ac8383@sa_generated
[21458] dbg: bayes: seen_put: SQL error: ERROR:  duplicate key violates
unique constraint "bayes_seen_pkey"

   After a number of these, it dies with:

bayes: encountered too many errors (20) while parsing seen lines,
reverting to empty database and exiting

ERROR: Bayes restore returned an error, please re-run with -D for more
information

   .. which makes me sad.  So, my question - is there a way to
fix this?  Or will I have to end up dumping my Bayes and starting
over?  I really hope I don't have to do that, because my Bayes
database is huge and really quite accurate.
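
   One way to check whether the dump file itself contains repeated
message IDs (as opposed to a problem on the database side) is to
count them before restoring.  The following is only a minimal
sketch: it assumes that "seen" lines in the backup file look like
s<TAB>flag<TAB>msgid, which is not shown in this thread, and the
function name find_duplicate_seen_ids is made up for illustration.

```python
from collections import Counter


def find_duplicate_seen_ids(lines):
    """Return {msgid: count} for message IDs that occur more than once.

    ASSUMPTION: a "seen" line in the backup file has the tab-separated
    layout  s <TAB> flag <TAB> msgid.  Lines that do not match this
    layout (version lines, token lines, anything else) are ignored.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] == "s":
            counts[fields[2]] += 1
    # Keep only the IDs that would collide on a unique key.
    return {msgid: n for msgid, n in counts.items() if n > 1}


# Small in-memory example; a real run would pass an open file object.
sample = [
    "s\ts\tfirst@sa_generated\n",
    "s\ts\tfirst@sa_generated\n",
    "s\th\tsecond@sa_generated\n",
]
print(find_duplicate_seen_ids(sample))
```

   If this reports any IDs for the real dump, the duplicates are
already present in the backup file, so each repeat would be a
second INSERT of the same key into bayes_seen.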

Thanks, folks!

Benny


-- 
"A computer lets you make more mistakes faster than any invention
in human history, with the possible exceptions of handguns and
tequila."                                          -- Found on usenet


Re: sa-learn --backup and --restore issue: duplicate key violations

Posted by "C. Bensend" <be...@bennyvision.com>.
>    After a number of these, it dies with:
>
> bayes: encountered too many errors (20) while parsing seen lines,
> reverting to empty database and exiting
>
> ERROR: Bayes restore returned an error, please re-run with -D for more
> information
>
>    .. which makes me sad.  So, my question - is there a way to
> fix this?  Or will I have to end up dumping my Bayes and starting
> over?  I really hope I don't have to do that, because my Bayes
> database is huge and really quite accurate.

   I hadn't seen any responses to my question as of yet, so I
decided to do some more experimenting.

   I ran the backup file through sort and uniq, moved the version
line back to the top, and ran it through sa-learn again.  This
time, it completed successfully:

[17678] dbg: bayes: parsed 522507 lines
[17678] dbg: bayes: created database with 117864 tokens based on 249654
spam messages and 155005 ham messages
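
   The same clean-up can be sketched without reordering the file
at all.  This is a hedged alternative to the sort and uniq step
above, not what I actually ran: it keeps the first line where it
is (assuming that first line is the version line) and drops any
later line that is an exact repeat of an earlier one.  The name
dedupe_backup is made up for illustration.

```python
def dedupe_backup(lines):
    """Drop exact duplicate lines while preserving the original order.

    ASSUMPTION: the first line of the backup is the version line and
    must stay first.  Only byte-for-byte identical lines are treated
    as duplicates; two lines with the same message ID but a different
    flag would both be kept.
    """
    seen = set()
    result = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


# Small in-memory example; a real run would read the dump file and
# write the result to a new file before running the restore.
sample = [
    "v\t3\tdb_version\n",
    "s\ts\tfirst@sa_generated\n",
    "s\ts\tfirst@sa_generated\n",
]
for line in dedupe_backup(sample):
    print(line, end="")
```

   Because the order is untouched, there is no need to move the
version line back to the top afterwards.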

   So, is this an OK thing to have done?  Due to the lack of
a single error, I'm guessing that changing the order of the
backup file (other than the version line) doesn't hurt anything.
Is this correct?

   Also, any ideas how my Bayes database got duplicate tokens
in the first place?

Thanks,

Benny

