Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/08/04 22:57:02 UTC

[Bug 3331] Bayes option to keep original token as db data (not key).

http://bugzilla.spamassassin.org/show_bug.cgi?id=3331





------- Additional Comments From parkerm@pobox.com  2004-08-04 13:57 -------
An update on this bug: I've got about 90% of my proposed implementation done.

Here are the basics:

1) Add a new config param, bayes_ignore_raw_token, which defaults to 0

2) New db format version 4:
  DBM - this adds an optional A* to the token's packed value
  SQL - this adds a raw_token column; you can omit the column if you set
        bayes_ignore_raw_token to 1

3) In theory you can switch back and forth between ignoring and not ignoring the
raw token value and everything will work.  If you ignore for a period of time,
then when you stop ignoring, the code will fill in any blank values as it updates
a token (NOTE: it will only update when the spam/ham count changes, not when the
token is touched).  Starting to ignore a previously unignored raw token won't
remove it from the database; you would have to --backup/--restore to remove it
totally.

4) --backup/--restore will behave correctly.  The backup format will include the
raw token value if you are not ignoring.  Restore will parse the raw token value
if available and insert it into the database if you are not ignoring.
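
As a rough illustration of items 2 and 3 for the DBM case (this is not the
actual patch), here is a small Python sketch.  The three-integer header layout,
the function names, and the use of Python's struct module in place of Perl's
pack are all assumptions made for the example; the real version 4 layout is not
spelled out in this comment.

```python
import struct

# Hypothetical fixed header: spam count, ham count, last-access time.
# The real packed layout is an assumption here, not taken from the patch.
HEADER = struct.Struct("<III")

def pack_token(spam, ham, atime, raw_token=None, ignore_raw_token=False):
    """Pack the counts, appending the raw token only when it is kept."""
    packed = HEADER.pack(spam, ham, atime)
    if raw_token is not None and not ignore_raw_token:
        packed += raw_token.encode("utf-8")  # optional trailing field
    return packed

def unpack_token(packed):
    """Return (spam, ham, atime, raw_token); raw_token is None if absent."""
    spam, ham, atime = HEADER.unpack_from(packed)
    tail = packed[HEADER.size:]
    return spam, ham, atime, (tail.decode("utf-8") if tail else None)

def update_counts(packed, raw_token, spam_delta, ham_delta, atime,
                  ignore_raw_token):
    """Apply a spam/ham count change to an existing packed value.

    A blank raw token is filled in when not ignoring; a raw token that is
    already stored is left in place even when ignoring.
    """
    spam, ham, _, stored = unpack_token(packed)
    if stored is not None:
        keep = stored
    else:
        keep = None if ignore_raw_token else raw_token
    return pack_token(spam + spam_delta, ham + ham_delta, atime, keep)
```

In this sketch a value written while ignoring unpacks with no raw token, and the
next count update made while not ignoring fills it in, which is the behaviour
described in item 3.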

For SQL, advanced users can opt to not include the raw_token column in their
database schema.  With bayes_ignore_raw_token set to 1 the code will never refer
to that column, so there should never be any sort of SQL error.  However, if the
user decides to stop ignoring, then they will need to add that column back to the
schema.
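
To make the SQL point concrete, here is a hedged Python/sqlite3 sketch of
building a statement that only names raw_token when it is being kept.  The
simplified bayes_token schema and the helper names are hypothetical and exist
only for this example; the real SQL storage code may be structured differently.

```python
import sqlite3

def token_insert_sql(ignore_raw_token):
    """Build an INSERT that names raw_token only when it is being kept."""
    columns = ["token", "spam_count", "ham_count"]
    if not ignore_raw_token:
        columns.append("raw_token")
    placeholders = ", ".join("?" for _ in columns)
    return (f"INSERT INTO bayes_token ({', '.join(columns)}) "
            f"VALUES ({placeholders})")

def store_token(conn, token, spam, ham, raw_token, ignore_raw_token):
    """Insert one token row, passing raw_token only when the column is used."""
    params = [token, spam, ham]
    if not ignore_raw_token:
        params.append(raw_token)
    conn.execute(token_insert_sql(ignore_raw_token), params)

# Hypothetical schema created WITHOUT the raw_token column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bayes_token ("
    "token TEXT PRIMARY KEY, spam_count INTEGER, ham_count INTEGER)"
)

# Works while ignoring, because the statement never mentions raw_token.
store_token(conn, "a1b2c3", 1, 0, "example", ignore_raw_token=True)
```

Calling store_token with ignore_raw_token=False against this schema fails with
an SQL error, which is why the column has to be added back before the user
stops ignoring.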

Anyone game for trying to get this into 3.0.0?  It would mean that we could
possibly get away with not bumping the db version.  It would also reduce the
number of questions from folks about not seeing the actual token value via dump
and whatnot.


