Posted to users@spamassassin.apache.org by Amir 'CG' Caspi <ce...@3phase.com> on 2014/02/05 19:15:39 UTC

Bayes ID depends on mailbox format?

Hi all,

Occasionally, I will receive an FN that is autolearned as ham.  Normally,
I dump it into my spam folder and sa-learn that as spam, so it should be
forgotten as ham and relearned as spam, and all is well with the world
(except for actually getting the spam, of course).
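For readers unfamiliar with the relearn behaviour described above, here is a toy Python model of a Bayes "seen" table keyed by message ID. It is a simplified sketch under stated assumptions, not SpamAssassin's code. The class SeenDB, its methods and the message IDs are hypothetical, and token counts are not modelled. The point it illustrates is that forgetting, including the implicit forget before relearning, only works when the ID computed for the message matches the one stored at learn time.

```python
from typing import Dict


class SeenDB:
    """Hypothetical toy model of a Bayes 'seen' table: message ID -> learned class.

    Simplified for illustration only; this is not SpamAssassin's implementation.
    """

    def __init__(self) -> None:
        self.seen: Dict[str, str] = {}

    def learn(self, msgid: str, label: str) -> bool:
        """Learn msgid as 'spam' or 'ham'. Returns True if the table changed."""
        previous = self.seen.get(msgid)
        if previous == label:
            return False  # already learned with this label; nothing to do
        if previous is not None:
            self.forget(msgid)  # undo the earlier (wrong) classification first
        self.seen[msgid] = label
        return True

    def forget(self, msgid: str) -> int:
        """Return the number of messages forgotten (0 or 1)."""
        return 1 if self.seen.pop(msgid, None) is not None else 0


db = SeenDB()
db.learn("msg-1", "ham")    # autolearned as ham (the FN)
db.learn("msg-1", "spam")   # manual relearn as spam: ham entry forgotten, then relearned
print(db.seen)              # {'msg-1': 'spam'}
print(db.forget("msg-2"))   # 0: an ID the table has never seen cannot be forgotten
```

If the ID computed at forget time differs from the stored one, forget() reports 0, which matches the "Forgot tokens from 0 message(s)" symptom below.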

Today, I was running some manual tests, and tried to manually forget an FN
that was autolearned as ham.  In doing so, I realized that SA apparently
cares what type of mailbox (MBOX, MBX, Maildir) the message is stored in! 
Specifically, identical messages stored in MBOX and MBX folders will be
treated as different messages by sa-learn.

I realized this because I transferred the FN from my INBOX (MBOX format)
to a new SpamNotHam folder (MBX format), and tried to sa-learn --forget
this message... sa-learn reported: "Forgot tokens from 0 message(s) (1
message(s) examined)" !  Even though SA had autolearned this message, it
did not recognize it as one that it had previously learned.

At first, I thought maybe the message had gotten changed somehow between
SA categorization (done via spamc/spamd) and delivery, but there's nothing
in the chain that would do that; SA is the last link in the chain just
prior to delivery, so the output of spamc/spamd goes directly into the
INBOX (or Spam folder, when appropriate).  So, on a hunch, I transferred
the message again into a new SpamNotHam_mbox folder (MBOX format) and
re-ran sa-learn --forget... this time, "Forgot tokens from 1 message(s) (1
message(s) examined)" !

So... for whatever reason, SA doesn't see messages as identical when they
are in MBX versus MBOX format. (And yes, I was using the --mbox and --mbx
flags appropriately.)  If a message is autolearned in MBOX format,
apparently it can only be forgotten from an MBOX mailbox.
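One plausible explanation, offered as an assumption rather than a confirmed diagnosis, is that the ID used for the lookup is derived from the message bytes, and the two mailbox formats hand those bytes over slightly differently. The sketch below uses a hypothetical raw_id() (a SHA-1 of the bytes, not SpamAssassin's actual message-ID scheme). Purely for illustration, it assumes that the MBX copy has CRLF line endings and that the mbox copy carries a "From " envelope line. Identical content then hashes differently, while a hypothetical normalize() step makes the two IDs agree.

```python
import hashlib


def raw_id(message: bytes) -> str:
    """Hypothetical message ID: SHA-1 of the bytes exactly as read from storage."""
    return hashlib.sha1(message).hexdigest()


def normalize(message: bytes) -> bytes:
    """Hypothetical cleanup of two assumed storage differences:
    CRLF line endings and a leading mbox 'From ' envelope line."""
    text = message.replace(b"\r\n", b"\n")
    if text.startswith(b"From "):
        # Drop the envelope line; keep everything after the first newline.
        text = text.split(b"\n", 1)[1] if b"\n" in text else b""
    return text


# The same message as it might look in each store (illustrative data only).
content = b"Subject: test\nDate: Wed, 5 Feb 2014 10:00:00 -0700\n\nHello\n"
mbox_copy = b"From sender@example.com Wed Feb  5 10:00:00 2014\n" + content
mbx_copy = content.replace(b"\n", b"\r\n")

print(raw_id(mbox_copy) == raw_id(mbx_copy))                        # False
print(raw_id(normalize(mbox_copy)) == raw_id(normalize(mbx_copy)))  # True
```

Whether SA already normalizes these differences when reading each format is exactly what the question below is asking. The sketch only shows why byte-level differences would be enough to break the match.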

Is this a bug, or a feature?  (I vote bug.)

This is SA 3.3.2 running on CentOS 5.10.

Thanks.

						--- Amir