Posted to users@spamassassin.apache.org by Don Levey <sp...@the-leveys.us> on 2005/04/01 18:00:52 UTC

RE: Autolearn=failed when BAYES_00 is only rule hit

Don Levey wrote:
> Please forgive me if this is in the archives; I'm having trouble
> finding it.
>
> I've just finished training my Bayes DB using sa-learn (perversely,
> when I was trying to collect 200 spam messages, the spammers decided
> to stop sending to me).  Now that the DB is usable, it's interesting
> that while most ham messages produce at least one small rule hit and
> a negative Bayes score that results in "Autolearn=no", when BAYES_00
> is the ONLY rule that hits I get "Autolearn=failed".
>
> Two quick questions:
> 1) What should I do about this, and
> 2) Should I worry, or just ignore it?
>
> TIA,
>  -Don

I may have found part of the problem, at least as far as the
"autolearn=no" portion of the question goes.  Running a message through
"spamassassin -D --mbox < msgfile" gives me the following last few lines:

debug: running body-text per-line regexp tests; score so far=8.886
debug: running uri tests; score so far=8.886
debug: running raw-body-text per-line regexp tests; score so far=8.886
debug: running full-text regexp tests; score so far=8.886
debug: auto-learn: currently using scoreset 3, recomputing score based on
scoreset 1.
debug: auto-learn: message score: 8.886, computed score for autolearn: 7.223
debug: auto-learn? ham=0.1, spam=12, body-points=3.1, head-points=3.64,
learned-points=-1.096
debug: auto-learn? no: inside auto-learn thresholds, not considered ham or
spam
debug: is spam? score=8.886 required=5
debug: tests=BAYES_40,DATE_IN_FUTURE_03_06,FORGED_YAHOO_RCVD,
MIME_HEADER_CTYPE_ONLY,NO_OBLIGATION,SUBJ_LIFE_INSURANCE,URIBL_OB_SURBL,
URIBL_WS_SURBL
debug: subtests=__BAT_BOUNDARY,__CT,__CTYPE_HAS_BOUNDARY,__HAS_MSGID,
__HAS_SUBJECT,__MSGID_OK_DIGITS,__MSGID_OK_HEX,__MSGID_OK_HOST,
__RCVD_IN_NJABL,__RCVD_IN_SORBS,__RFC_IGNORANT_ENVFROM,__SANE_MSGID


So somewhere I have it set that a message needs a score of 12 to be
autolearned as spam, and a score below 0.1 to be autolearned as ham.  This
particular message had an autolearn score of 7.223 (8.886 overall), so it
fell between the two thresholds.
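
The threshold check that the debug lines describe can be sketched roughly
as follows.  This is a simplified illustration in Python, not
SpamAssassin's actual code: the function name autolearn_decision is made
up, the handling of a score exactly equal to a threshold is an assumption,
and the real autolearner applies further conditions (the body-points and
head-points figures in the debug line hint at them) that are left out here.

```python
def autolearn_decision(autolearn_score, ham_threshold=0.1, spam_threshold=12.0):
    """Classify a message for autolearning from its recomputed score.

    Hypothetical helper: the defaults mirror the "ham=0.1, spam=12"
    values printed in the debug output.
    """
    if autolearn_score < ham_threshold:
        return "ham"
    if autolearn_score > spam_threshold:
        return "spam"
    # Between the two thresholds: not learned either way.
    return "no"


# The two computed autolearn scores from the debug runs in this message.
print(autolearn_decision(7.223))   # falls between the thresholds
print(autolearn_decision(13.387))  # exceeds the spam threshold
```

Note that the comparison uses the recomputed autolearn score, not the
overall message score shown in the "is spam?" line.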

The next step was to try a message that had a score greater than 12.  The
example I chose also showed "autolearn=failed" in its header.  Running the
same debug command line, I got:

debug: running body-text per-line regexp tests; score so far=15.837
debug: running uri tests; score so far=15.837
debug: running raw-body-text per-line regexp tests; score so far=15.837
debug: running full-text regexp tests; score so far=15.837
debug: auto-learn: currently using scoreset 3, recomputing score based on
scoreset 1.
debug: auto-learn: message score: 15.837, computed score for autolearn:
13.387
debug: auto-learn? ham=0.1, spam=12, body-points=11.404, head-points=5.843,
learned-points=0.001
debug: auto-learn? yes, spam (13.387 > 12)
debug: Learning Spam
<debug tokenizing messages removed for brevity>
debug: bayes: 20664 untie-ing
debug: bayes: 20664 untie-ing db_toks
debug: bayes: 20664 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 20664 unlink /etc/mail/spamassassin/bayes_db.lock
debug: is spam? score=15.837 required=5
debug: tests=BAYES_50,FORGED_YAHOO_RCVD,MIME_HEADER_CTYPE_ONLY,
RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_XBL,URIBL_OB_SURBL,URIBL_SBL,URIBL_SC_SURBL,
URIBL_WS_SURBL
debug: subtests=__BAT_BOUNDARY,__CT,__CTYPE_HAS_BOUNDARY,__HAS_MSGID,
__HAS_SUBJECT,__MSGID_OK_HOST,__RCVD_IN_SBL_XBL,__RFC_IGNORANT_ENVFROM,
__SANE_MSGID

As should be clear here, the debug output says that the message WAS
autolearned, and the message headers generated from this run do show
"autolearn=spam".  I am running this as the same user that runs spamd
(the platform is Fedora, where the spamassassin "service" actually runs
spamd).

I had been hoping the debug run above would show an error, but everything
looked fine.  Checking my maillog, however, I hit a bit of paydirt:

Apr  1 09:40:01 davinci spamd[9864]: connection from davinci.example.com
[127.0.0.1] at port 41609
Apr  1 09:40:01 davinci spamd[9864]: info: setuid to root succeeded
Apr  1 09:40:01 davinci spamd[9864]: Still running as root: user not
specified with -u, not found, or set to root.  Fall back to nobody.
Apr  1 09:40:01 davinci spamd[9864]: processing message
<11...@localhost.localdomain> for root:99.
Apr  1 09:40:01 davinci spamd[9864]: bayes expire_old_tokens: lock: 9864
cannot create tmp lockfile
/etc/mail/spamassassin/bayes_db.lock.davinci.example.com.9864 for
/etc/mail/spamassassin/bayes_db.lock: Permission denied
Apr  1 09:40:01 davinci spamd[9864]: cannot write to
/etc/mail/spamassassin/bayes_db_journal, Bayes db update ignored: Permission
denied
Apr  1 09:40:07 davinci spamd[9864]: clean message (-4.9/5.0) for root:99 in
6.1 seconds, 3079 bytes.
Apr  1 09:40:07 davinci spamd[9864]: result: . -4 - BAYES_00
scantime=6.1,size=3079,mid=<11...@localhost.localdomain>,
bayes=0,autolearn=failed
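
A minimal sketch of how log lines like these could be matched up by spamd
process ID, so that an "autolearn=failed" result is shown next to any
"Permission denied" errors from the same child.  It assumes each log entry
has been rejoined onto a single line (the excerpt above is wrapped by the
mail client); the function name and the way the sample lines are joined
are illustrative only.

```python
import re
from collections import defaultdict

# Matches the "spamd[9864]:" part of a syslog line and captures the PID.
PID_RE = re.compile(r"spamd\[(\d+)\]:")


def failed_autolearn_errors(lines):
    """Map each spamd PID that logged autolearn=failed to the
    "Permission denied" lines logged by that same PID."""
    errors = defaultdict(list)
    failed = set()
    for line in lines:
        match = PID_RE.search(line)
        if not match:
            continue
        pid = match.group(1)
        if "Permission denied" in line:
            errors[pid].append(line)
        if "autolearn=failed" in line:
            failed.add(pid)
    return {pid: errors[pid] for pid in failed}


# Two entries from the excerpt above, rejoined onto single lines.
SAMPLE = [
    "Apr  1 09:40:01 davinci spamd[9864]: cannot write to "
    "/etc/mail/spamassassin/bayes_db_journal, Bayes db update ignored: "
    "Permission denied",
    "Apr  1 09:40:07 davinci spamd[9864]: result: . -4 - BAYES_00 "
    "scantime=6.1,size=3079,mid=<11...@localhost.localdomain>,"
    "bayes=0,autolearn=failed",
]

for pid, errs in failed_autolearn_errors(SAMPLE).items():
    print(pid, len(errs), "permission error(s)")
```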


Note that I am getting a permissions error creating the lock file.  This
seems to be because the permissions on the /etc/mail/spamassassin directory
do not permit the user spamd is running as (the log shows it falling back
to "nobody") to write the lock file.  I've at least temporarily fixed this
while I sort out the user ID situation, and now I'm autolearning.
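
For anyone wanting to check the same thing on their own system, here is a
small sketch in Python that reports whether the current process could
create files (such as the lock file) in a directory.  The helper name
can_create_files_in is made up for this example, and the path is simply
the one from my log; adjust it to wherever your Bayes files live.

```python
import os


def can_create_files_in(directory):
    """Return True if the current process may create files in `directory`.

    Creating a lock file needs write and search (execute) permission on
    the directory itself.  os.access() checks against the real uid/gid
    of the process running this script.
    """
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Path taken from the log above.
    path = "/etc/mail/spamassassin"
    print(path, "writable:", can_create_files_in(path))
```

Run it as the account spamd ends up running as; run as root, it will
normally report the directory as writable whatever its mode is.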

Why am I telling you all of this?  Because someone you know may be in a
similar situation, or *you* may be in a similar situation.  This at least
gets the info in the archives (perhaps again) so that it may be found.

Thanks for your time,
 -Don

Re: Autolearn=failed when BAYES_00 is only rule hit

Posted by Andy Jezierski <aj...@stepan.com>.
"Jean Caron" <ca...@norac.net> wrote on 04/01/2005 10:19:13 AM:

> You're not alone Don ! hehe... 
> 
> I was waiting for replies after your first post. Now with this one, I
> should be able to make some sense of this autolearn feature within my
> setup. Already something is clearer (clear as mud), since I'm trashing
> msgs with a score greater than 10, I probably will never see
> autolearn=spam in my headers... But I should for ham, if I understand
> the theory correctly.
> 
> John 

You might not see them in your delivered messages, but SA should still be 
learning them since SA itself doesn't do any trashing. If you want to, you 
could also add the following to your SA local.cf file to adjust the 
threshold up or down.

 bayes_auto_learn_threshold_nonspam n.nn (default: 0.1)
 bayes_auto_learn_threshold_spam n.nn    (default: 12.0)


For more info: man Mail::SpamAssassin::Conf

Andy

Re: Autolearn=failed when BAYES_00 is only rule hit

Posted by Jean Caron <ca...@norac.net>.
You're not alone Don ! hehe... 

I was waiting for replies after your first post. Now with this one, I should 
be able to make some sense of this autolearn feature within my setup. 
Already something is clearer (clear as mud), since I'm trashing msgs with a 
score greater than 10, I probably will never see autolearn=spam in my 
headers... But I should for ham, if I understand the theory correctly. 

John 
