Posted to users@spamassassin.apache.org by Maarten de Boer <md...@iua.upf.es> on 2004/12/16 17:13:50 UTC

SA does bayes every 1 of 2 mails only

Hello,

(I am reposting this message because I have not been able to solve
the problem, and it is becoming very noticeable that more spam
is entering our inboxes...)

I have been running SA successfully for quite some time now, but
lately I am experiencing a strange problem: only every second
mail that is checked by SpamAssassin is being scored by the bayes
rules.

I started noticing this when I saw that there was no BAYES_XX score in
the SpamCheck header for some unmarked Spam. When I run
  spamassassin -D --lint
everything seems fine. So I set 
Debug = Yes
Debug SpamAssassin = Yes
in my MailScanner configuration (I run SA from MailScanner), and it
confirmed my suspicion. You can look at the output of a batch run at
http://iua-mail.upf.es/mailscanner.txt

If you look at the "debug: tests=" lines, you can see that only every second
mail is bayes-checked:

$ grep "debug: tests=" mailscanner.txt
debug: tests=ALL_TRUSTED,MISSING_HEADERS,MISSING_SUBJECT,NO_REAL_NAME
debug: tests=ALL_TRUSTED,AWL,BAYES_00
debug: tests=ALL_TRUSTED,AWL
debug: tests=ALL_TRUSTED,AWL,BAYES_00
debug: tests=ALL_TRUSTED,AWL
debug: tests=ALL_TRUSTED,AWL,BAYES_00
debug: tests=ALL_TRUSTED,AWL
debug: tests=ALL_TRUSTED,AWL,BAYES_00
debug: tests=AWL,HTML_80_90,HTML_MESSAGE,HTML_TAG_EXIST_TBODY
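
To put a number on the pattern above rather than eyeballing it, the same debug output can be tallied. This is only a sketch: the function name `count_bayes_hits` is made up for illustration, and it assumes every scanned message produces exactly one "debug: tests=" line.

```shell
#!/bin/sh
# Hypothetical helper: count scanned messages with and without a BAYES_* hit.
# Reads MailScanner/SpamAssassin debug output on stdin.
count_bayes_hits() {
    awk '
        /debug: tests=/ {
            total++
            # any BAYES_NN rule name counts as "bayes was applied"
            if ($0 ~ /BAYES_[0-9]+/) with_bayes++
        }
        END {
            printf "total=%d with_bayes=%d without_bayes=%d\n",
                   total, with_bayes, total - with_bayes
        }
    '
}
```

Run as `count_bayes_hits < mailscanner.txt`. For the nine lines shown above it prints `total=9 with_bayes=4 without_bayes=5`.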

And what is very strange as well is that it says both:

debug: bayes: Not available for scanning, only 0 spam(s) in Bayes DB < 200

and

debug: bayes corpus size: nspam = 6040, nham = 20334

Obviously, the "only 0 spam(s)" line is wrong. Note that it always comes
in combination with a database sync.

debug: refresh: 10537 refresh /var/lib/MailScanner/bayes.mutex
debug: synced Bayes databases from journal in 0 seconds: 74 unique entries (74 total entries)
debug: Syncing complete.
debug: bayes: Not available for scanning, only 0 spam(s) in Bayes DB < 200
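
The claim that the "Not available" message always follows a journal sync can be checked against the whole debug file. A rough sketch, with assumptions: `sync_correlation` is a made-up name, and the 5-line window is an arbitrary guess based on the excerpt above, not something SpamAssassin defines.

```shell
#!/bin/sh
# Hypothetical helper: for each "Not available for scanning" line, check
# whether a journal sync was logged within the previous N lines.
# $1 = window size in lines (assumed default: 5).
sync_correlation() {
    awk -v window="${1:-5}" '
        /synced Bayes databases from journal/ { last_sync = NR }
        /bayes: Not available for scanning/ {
            unavailable++
            if (last_sync && NR - last_sync <= window) after_sync++
        }
        END {
            printf "unavailable=%d after_sync=%d\n", unavailable, after_sync
        }
    '
}
```

Run as `sync_correlation < mailscanner.txt`. If the two numbers are equal, every "Not available" message came shortly after a sync.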

I am not sure when this problem started, but I don't think it was like
this from the beginning. I updated to 3.0.1, but this did not help.

I hope somebody has an idea of what is happening... I can provide you with
more information if needed.

Kind regards,

Maarten

Re: SA does bayes every 1 of 2 mails only

Posted by Morris Jones <mo...@whiteoaks.com>.
Forgive me, I completely misread your post.  You're right, you should be 
getting bayes tests.  I wonder if it's picking up the wrong bayes 
database ...

Mojo

Morris Jones wrote:
>[ the wrong answer ]

-- 
Morris Jones
Monrovia, CA
http://www.whiteoaks.com
Old Town Astronomers: http://www.otastro.org

Re: SA does bayes every 1 of 2 mails only

Posted by Morris Jones <mo...@whiteoaks.com>.
Maarten de Boer wrote:
> And what is very strange as well is that it says both:
> 
> debug: bayes: Not available for scanning, only 0 spam(s) in Bayes DB < 200

Maarten, it appears that you trained your bayes database with a corpus 
of email and told it that it was all "ham", that is, not spam.

Assuming you have two mbox files, one full of spam, one full of ham, did 
you start out training it like so?:
	sa-learn --spam --mbox spamfile
	sa-learn --ham --mbox hamfile
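
To see what the database itself reports after training, the counters from `sa-learn --dump magic` can be compared with the minimum of 200 mentioned in the debug message. A sketch only: `bayes_counts_ok` is a made-up name, and it assumes the usual dump layout in which the count is the third column of the "non-token data: nspam" and "non-token data: nham" lines.

```shell
#!/bin/sh
# Hypothetical helper: read `sa-learn --dump magic` output on stdin, print the
# spam/ham message counts, and exit non-zero if either is below the minimum.
# $1 = minimum count (200, as in the "< 200" debug message).
bayes_counts_ok() {
    awk -v min="${1:-200}" '
        /non-token data: nspam/ { nspam = $3 }
        /non-token data: nham/  { nham = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            # "+ 0" forces a numeric comparison
            exit !(nspam + 0 >= min + 0 && nham + 0 >= min + 0)
        }
    '
}
```

Run as `sa-learn --dump magic | bayes_counts_ok`, making sure sa-learn is reading the same database that the scanner uses.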

Best regards,
Mojo
-- 
Morris Jones
Monrovia, CA
http://www.whiteoaks.com
Old Town Astronomers: http://www.otastro.org

Re: SA does bayes every 1 of 2 mails only

Posted by Maarten de Boer <md...@iua.upf.es>.
Hello,

I just solved the problem by doing a sa-learn --backup, deleting
the bayes_ db files, and a sa-learn --restore. I guess somehow the
db got corrupted.
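
For anyone who hits the same thing, the steps were roughly as sketched below. Treat it as an outline rather than the exact commands I typed: the function name, the `SA_LEARN` override and both arguments are made up for illustration, and it assumes sa-learn is already pointed at the same bayes database that MailScanner uses. It is probably wise to pause mail scanning while it runs.

```shell
#!/bin/sh
# Hypothetical helper: dump the bayes data, remove the db files, reload the dump.
# $1 = directory holding the bayes_* files (assumption: adjust to your setup)
# $2 = file to write the dump to (keep it outside names matching bayes_*)
rebuild_bayes_db() {
    bayes_dir=$1
    backup_file=$2
    sa_learn=${SA_LEARN:-sa-learn}   # override only for testing

    # 1. dump the current contents to a text file
    "$sa_learn" --backup > "$backup_file" || return 1

    # refuse to delete anything if the dump came out empty
    [ -s "$backup_file" ] || return 1

    # 2. remove the existing db files (bayes_toks, bayes_seen, ...);
    #    bayes.mutex does not match this pattern and is left alone
    rm -f "$bayes_dir"/bayes_*

    # 3. rebuild the database from the dump
    "$sa_learn" --restore "$backup_file"
}
```

Example call, with the directory guessed from the mutex path in the debug output: `rebuild_bayes_db /var/lib/MailScanner /root/bayes-backup.txt`.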

maarten

