Posted to users@spamassassin.apache.org by Juliano Simões <ju...@axios.com.br> on 2004/11/01 00:38:09 UTC

Bayes sometimes not used

Hi Guys,

I have noticed a strange behavior of SA, after upgrading from
version 2.64 to 3.0.1. Sometimes, the same message is scored
with bayes, sometimes not.

See below sample outputs from subsequent executions of
"/usr/bin/spamassassin -tLD < spam_msg_file":

************************************************************
Content analysis details: (0.7 points, 6.6 required)
 pts rule name            description
 0.4 LC_NUA               BODY: Contém NUA
 0.2 LC_FOTOS             BODY: Contém FOTOS
 0.0 HTML_80_90           BODY: Message is 80% to 90% HTML
 0.0 HTML_MESSAGE         BODY: HTML included in message
 0.0 UPPERCASE_25_50      message body is 25-50% uppercase
************************************************************
Content analysis details: (4.3 points, 6.6 required)
 pts rule name            description
 0.4 LC_NUA               BODY: Contém NUA
 0.2 LC_FOTOS             BODY: Contém FOTOS
 0.0 HTML_80_90           BODY: Message is 80% to 90% HTML
 3.6 BAYES_80             BODY: Bayesian spam probability is 80 to 95%
                          [score: 0.8361]
 0.0 HTML_MESSAGE         BODY: HTML included in message
 0.0 UPPERCASE_25_50      message body is 25-50% uppercase
************************************************************
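
A minimal sketch, assuming the two reports above have been saved as plain
text, of how the rule hits could be compared mechanically. The helper names
rule_hits and only_in_one are hypothetical and not part of SpamAssassin; the
regular expression only assumes the "pts rule name description" layout shown
above.

```python
import re

# Matches report rows such as " 3.6 BAYES_80             BODY: ..."
# Header rows and the "[score: ...]" continuation line do not match.
RULE_ROW = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]*)(?:\s|$)")


def rule_hits(report):
    """Map each rule name in a 'Content analysis details' block to its points."""
    hits = {}
    for line in report.splitlines():
        match = RULE_ROW.match(line)
        if match:
            hits[match.group(2)] = float(match.group(1))
    return hits


def only_in_one(first_report, second_report):
    """Return the rules (with points) that appear in exactly one report."""
    first, second = rule_hits(first_report), rule_hits(second_report)
    merged = {**first, **second}
    # Symmetric difference of the key sets: rules hit in one run but not the other.
    return {name: merged[name] for name in first.keys() ^ second.keys()}
```

Applied to the two reports above, only_in_one would return just the
BAYES_80 entry, which accounts for the 3.6 point difference between them.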

This seems to happen regardless of message content or SA score.

ENVIRONMENT:

- running spamd with options: "-L -x -u spamd -H /home/spamd
  --socketpath=/tmp/spamd.sock --max-children=7 --daemonize"

- only use sitewide setup

- ran "sa-learn --sync" right after upgrade from 2.64 to 3.0.1

- local.cf includes:

    required_score 6.6
    report_safe 0
    use_dcc 0
    use_pyzor 0
    use_razor2 0
    skip_rbl_checks 1
    rbl_timeout 1
    score RCVD_IN_BL_SPAMCOP_NET  0
    score RCVD_IN_OSIRUSOFT_COM   0
    score X_OSIRU_DUL             0
    score X_OSIRU_DUL_FH          0
    score X_OSIRU_OPEN_RELAY      0
    score X_OSIRU_SPAM_SRC        0
    score X_OSIRU_SPAMWARE_SITE   0
    use_bayes 1
    use_bayes_rules 1
    bayes_auto_learn_threshold_nonspam -2
    bayes_auto_learn_threshold_spam 15
    bayes_auto_learn 1

Version 2.64 was very consistent when it comes to using bayes.
Any clues on what may be causing this problem on 3.0.1?

Regards,

Juliano Simões
Gerente de Tecnologia
Axios Tecnologia e Serviços
http://www.axios.com.br
juliano@axios.com.br
+55 41 324-1993 


Re: Bayes sometimes not used

Posted by Juliano Simões <ju...@axios.com.br>.
----- Original Message ----- 
From: "Theo Van Dinter" <fe...@kluge.net>
To: <us...@spamassassin.apache.org>
Sent: Sunday, October 31, 2004 9:42 PM
Subject: Re: Bayes sometimes not used

> On Sun, Oct 31, 2004 at 09:38:09PM -0200, Juliano Simões wrote:
> > I have noticed a strange behavior of SA, after upgrading from
> > version 2.64 to 3.0.1. Sometimes, the same message is scored
> > with bayes, sometimes not.
>
> Yes, Bayes can't always give an answer.  You can run with -D and see what's
> going on.
>
> > See below sample outputs from subsequent executions of
> > "/usr/bin/spamassassin -tLD < spam_msg_file":
>
> Bayes probably learned enough tokens from the first to score on the second.
>
> > Version 2.64 was very consistent when it comes to using bayes.
> > Any clues on what may be causing this problem on 3.0.1?
>
> Well, 2.[56]x always gave a BAYES_* hit, even if it wasn't usable (aka
> BAYES_50).  3.0 only gives you a result when there's a result.

Theo, thanks for clarifying. What still troubles me is how SA can
show a bayes hit for a given message once and then skip bayes completely
for the same message a few seconds later.

I will go over the debug records to try to figure it out.

Regards,

Juliano Simões
Gerente de Tecnologia
Central Server
http://www.centralserver.com.br
juliano@centralserver.com.br
+55 41 324-1993




Re: Bayes sometimes not used

Posted by Theo Van Dinter <fe...@kluge.net>.
On Sun, Oct 31, 2004 at 09:38:09PM -0200, Juliano Simões wrote:
> I have noticed a strange behavior of SA, after upgrading from
> version 2.64 to 3.0.1. Sometimes, the same message is scored
> with bayes, sometimes not.

Yes, Bayes can't always give an answer.  You can run with -D and see what's
going on.

> See below sample outputs from subsequent executions of
> "/usr/bin/spamassassin -tLD < spam_msg_file":

Bayes probably learned enough tokens from the first to score on the second.

> Version 2.64 was very consistent when it comes to using bayes.
> Any clues on what may be causing this problem on 3.0.1?

Well, 2.[56]x always gave a BAYES_* hit, even if it wasn't usable (aka
BAYES_50).  3.0 only gives you a result when there's a result.
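
The difference described above could be pictured roughly as follows. This is
an illustrative sketch only, not SpamAssassin's actual code: the function
names are hypothetical, the spam minimum of 200 is taken from the debug line
"only 0 spam(s) in Bayes DB < 200" quoted elsewhere in this thread, and the
matching ham minimum and the neutral value of 0.5 are assumptions.

```python
from typing import Optional

MIN_SPAM = 200  # from the debug line "only 0 spam(s) in Bayes DB < 200"
MIN_HAM = 200   # assumption: the ham minimum mirrors the spam minimum


def bayes_3x(nspam: int, nham: int, probability: float) -> Optional[float]:
    """3.0-style behaviour as described: no result unless the DB is usable."""
    if nspam < MIN_SPAM or nham < MIN_HAM:
        return None  # no BAYES_* rule appears in the report
    return probability


def bayes_2x(nspam: int, nham: int, probability: float) -> float:
    """2.5x/2.6x-style behaviour as described: always produce some BAYES_* hit."""
    result = bayes_3x(nspam, nham, probability)
    # Assumption: an unusable result is reported as a neutral 0.5 (BAYES_50).
    return 0.5 if result is None else result
```

Under these assumptions, a database that momentarily reports 0 learned spam
yields no Bayes rule at all in the 3.0-style function, while the 2.x-style
function would still show a neutral hit.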

-- 
Randomly Generated Tagline:
"It is far more impressive when others discover your good qualities
 without your help." - Zen Musings

Re: Bayes sometimes not used

Posted by Juliano Simões <ju...@axios.com.br>.
----- Original Message ----- 
> From: "Matt Kettler" <mk...@evi-inc.com>
> To: "Juliano Simões" <ju...@axios.com.br>; <us...@spamassassin.apache.org>
> Sent: Monday, November 01, 2004 12:58 PM
> Subject: Re: Bayes sometimes not used
>

> At 06:38 PM 10/31/2004, Juliano Simões wrote:
> >See below sample outputs from subsequent executions of
> >"/usr/bin/spamassassin -tLD < spam_msg_file":
>
> since you're using -D for debug output, is there anything in the debug that
> might give some clues?
>
> Do both use the same score set?
Yes.

> Any complaints about lock failures?
No, locks and unlocks look good.

> Did either trigger autolearning?
Nope.

> Did either trigger a bayes sync (can cause a dramatic change in the bayes
> DB as the journal is integrated)?
Yes, it seems they did. Please take a look at the following
debug logs from SA testing the same message:

** 1. Have bayes hits **
...
debug: bayes: opportunistic call found expiry due
debug: Syncing Bayes and expiring old tokens...
debug: lock: 32684 created /home/spamd/.spamassassin/bayes.mutex
debug: lock: 32684 trying to get lock on /home/spamd/.spamassassin/bayes with 10 timeout
debug: lock: 32684 link to /home/spamd/.spamassassin/bayes.mutex: link ok
debug: bayes: 32684 tie-ing to DB file R/W /home/spamd/.spamassassin/bayes_toks
debug: bayes: 32684 tie-ing to DB file R/W /home/spamd/.spamassassin/bayes_seen
debug: bayes: found bayes db version 3
debug: refresh: 32684 refresh /home/spamd/.spamassassin/bayes.mutex
debug: Syncing complete.
debug: bayes: 32684 untie-ing
debug: bayes: 32684 untie-ing db_toks
debug: bayes: 32684 untie-ing db_seen
debug: bayes: files locked, now unlocking lock

** 2. No bayes hits **
...
debug: bayes: opportunistic call found journal sync due
debug: Syncing Bayes and expiring old tokens...
debug: lock: 4276 created /home/spamd/.spamassassin/bayes.mutex
debug: lock: 4276 trying to get lock on /home/spamd/.spamassassin/bayes with 10 timeout
debug: lock: 4276 link to /home/spamd/.spamassassin/bayes.mutex: link ok
debug: bayes: 4276 tie-ing to DB file R/W /home/spamd/.spamassassin/bayes_toks
debug: bayes: 4276 tie-ing to DB file R/W /home/spamd/.spamassassin/bayes_seen
debug: bayes: found bayes db version 3
debug: refresh: 4276 refresh /home/spamd/.spamassassin/bayes.mutex
debug: Syncing complete.
debug: bayes: Not available for scanning, only 0 spam(s) in Bayes DB < 200
debug: bayes: not scoring message, returning undef
debug: bayes: 4276 untie-ing
debug: bayes: 4276 untie-ing db_toks
debug: bayes: 4276 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
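
For comparing many runs, the relevant lines of the two excerpts above could
be pulled out with a small script. This is a rough sketch: the function name
summarize_bayes_debug is hypothetical, the patterns are based only on the
debug lines quoted above, and the "ham(s)" variant of the message is an
assumption.

```python
import re

# "debug: bayes: opportunistic call found expiry due" -> "expiry"
SYNC_DUE = re.compile(r"bayes: opportunistic call found (.+?) due\b")

# "debug: bayes: Not available for scanning, only 0 spam(s) in Bayes DB < 200"
# Assumption: a matching "ham(s)" message exists with the same wording.
NOT_AVAILABLE = re.compile(
    r"bayes: Not available for scanning, only (\d+) (spam|ham)\(s\) in Bayes DB < (\d+)"
)


def summarize_bayes_debug(debug_output):
    """Collect sync triggers and any 'not available' reason from -D output."""
    summary = {"sync_reasons": [], "unavailable": None}
    for line in debug_output.splitlines():
        sync = SYNC_DUE.search(line)
        if sync:
            summary["sync_reasons"].append(sync.group(1))
            continue
        blocked = NOT_AVAILABLE.search(line)
        if blocked:
            summary["unavailable"] = {
                "count": int(blocked.group(1)),
                "kind": blocked.group(2),
                "minimum": int(blocked.group(3)),
            }
    return summary
```

Feeding it the saved output of "spamassassin -tLD" for each run would show
whether a run without a Bayes hit also reported too few learned messages.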

So, if bayes sync is the problem, why does this happen so often?
I run "sa-learn --sync" many times per day, after training ham
and spam. Is there a way to prevent spamassassin from triggering
a sync every time?

Regards,

Juliano Simões
Gerente de Tecnologia
Axios Tecnologia e Serviços
http://www.axios.com.br
juliano@axios.com.br
+55 41 324-1993


Re: Bayes sometimes not used

Posted by Matt Kettler <mk...@evi-inc.com>.
At 06:38 PM 10/31/2004, Juliano Simões wrote:
>See below sample outputs from subsequent executions of
>"/usr/bin/spamassassin -tLD < spam_msg_file":

since you're using -D for debug output, is there anything in the debug that 
might give some clues?

Do both use the same score set?
Any complaints about lock failures?
Did either trigger autolearning?
Did either trigger a bayes sync (can cause a dramatic change in the bayes 
DB as the journal is integrated)?