Posted to users@spamassassin.apache.org by Brian Eliassen <br...@eliassen.org> on 2014/04/27 04:41:39 UTC

BAYES_00 Query

Hello Keepers of SpamAssassin Knowledge,

I've been lurking on this list for years and never had a question pop  
up until today.  About a week ago I said, "enough is enough" regarding  
the amount of spam I've been receiving so I've been doing some  
upgrades.  As such, I recently upgraded to SA 3.4 and did the  
recommended "sa-learn --clear" to clean out the database.  I had a  
huge pile of recent spam and ham so I repopulated the database with  
those.  Afterwards, here is what my "sa-learn --dump magic" looked like:

0.000          0          3          0  non-token data: bayes db version
0.000          0      35575          0  non-token data: nspam
0.000          0       1870          0  non-token data: nham
0.000          0     180984          0  non-token data: ntokens
0.000          0 1314919780          0  non-token data: oldest atime
0.000          0 1398209850          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1398228671          0  non-token data: last expiry atime
0.000          0     691200          0  non-token data: last expire atime delta
0.000          0    2166321          0  non-token data: last expire reduction count
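
For readers who want to interpret a dump like the one above, here is a minimal Python sketch. It assumes the third column holds the value for each "non-token data" row and that the atime values are Unix epoch seconds; the helper names `parse_magic` and `as_utc` are hypothetical and not part of SpamAssassin.

```python
from datetime import datetime, timezone

# Sample rows copied from the dump above.
DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0      35575          0  non-token data: nspam
0.000          0       1870          0  non-token data: nham
0.000          0     180984          0  non-token data: ntokens
0.000          0 1314919780          0  non-token data: oldest atime
0.000          0 1398209850          0  non-token data: newest atime
"""

def parse_magic(text):
    """Map each 'non-token data' label to its integer value (third column)."""
    values = {}
    for line in text.splitlines():
        head, sep, label = line.partition("non-token data:")
        fields = head.split()
        if sep and len(fields) >= 3:
            values[label.strip()] = int(fields[2])
    return values

def as_utc(epoch):
    """Render a Unix timestamp as a UTC date string (assumes epoch seconds)."""
    stamp = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S")

magic = parse_magic(DUMP)
print("spam/ham messages:", magic["nspam"], "/", magic["nham"])
print("oldest atime:", as_utc(magic["oldest atime"]))
print("newest atime:", as_utc(magic["newest atime"]))
```

Read this way, the counters make the spam-to-ham imbalance and the age of the oldest token easy to see at a glance.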

Yes, I had that much spam stored up.  That sa-learn took several  
hours.  But on to my question; I have been extra careful to note what  
has been slipping by the filter and here is what I've seen over the  
past two days:

3.299 (***) BAYES_00,FORGED_RELAY_MUA_TO_MX
3.92 (***) BAYES_00,FREEMAIL_FROM,RDNS_NONE,TBIRD_SUSP_MIME_BDRY,T_HTML_ATTACH,T_OBFU_HTML_ATTACH
-1 () BAYES_00
0.279 () BAD_CREDIT,BAYES_00
-0.988 () BAYES_00,HTML_EXTRA_CLOSE,HTML_MESSAGE,T_REMOTE_IMAGE
3.299 (***) BAYES_00,FORGED_RELAY_MUA_TO_MX
-0.988 () BAYES_00,HTML_EXTRA_CLOSE,HTML_MESSAGE,T_REMOTE_IMAGE
-0.979 () BAYES_00,FREEMAIL_FROM,T_HTML_ATTACH,T_OBFU_HTML_ATTACH
0.436 () BAYES_00,DIET_1,HELO_MISC_IP,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE
0.436 () BAYES_00,DIET_1,HELO_MISC_IP,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE

The thing that is common is BAYES_00 on all of these.  It's the  
standard -1 score.  Did I do something horrible with my installation  
to allow this sort of crud to slip through?  Isn't that when Bayes  
thinks that the mail isn't spam?  Look at some of the other rules that  
are hitting.  I cannot figure out why BAYES_00 would hit on these.

Thanks in advance.

Oh, this is a sendmail -> mimedefang -> spamassassin/clamav/razor  
installation.  Any recommendations on additional plugins to consider  
and/or SARE-like channels to subscribe to would be greatly appreciated.

Brian


Re: BAYES_00 Query

Posted by John Hardin <jh...@impsec.org>.
On Sun, 27 Apr 2014, Axb wrote:

> On 04/27/2014 06:02 PM, John Hardin wrote:
>>  Then wipe and retrain again.
>
> I'd definitely go for that
>
> oldest spam in bayes is from Thu, 01 Sep 2011 23:29:40 GMT
> 0.000          0 1314919780          0  non-token data: oldest atime
>
> The DB just hasn't enough spam to make a  difference.

Ah, I didn't notice that bit. Let me amend my advice some:

Disable automatic bayes expiry too. Don't run a manual bayes expiration 
until and unless you decide to enable autolearn, and don't run an 
expiration until autolearn has collected sufficient *recent* messages.
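
As a rough illustration, the two settings discussed here correspond to the SpamAssassin options `bayes_auto_expire` and `bayes_auto_learn`. The fragment below is a sketch only, assuming they go in whichever configuration file the filter actually reads (often local.cf; a MIMEDefang setup may use a different file).

```
# Sketch: keep Bayes static while the initial training is being verified.
# Assumption: this is the config file your filter actually loads.
bayes_auto_learn   0
bayes_auto_expire  0
```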

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   I would buy a Mac today if I was not working at Microsoft.
                           -- James Allchin, Microsoft VP of Platforms
-----------------------------------------------------------------------
  696 days since the first successful private support mission to ISS (SpaceX)

Re: BAYES_00 Query

Posted by Axb <ax...@gmail.com>.
On 04/27/2014 06:02 PM, John Hardin wrote:
> Then wipe and retrain again.

I'd definitely go for that

oldest spam in bayes is from Thu, 01 Sep 2011 23:29:40 GMT
0.000          0 1314919780          0  non-token data: oldest atime

The DB just hasn't enough spam to make a  difference.



Re: BAYES_00 Query

Posted by John Hardin <jh...@impsec.org>.
On Sat, 26 Apr 2014, Brian Eliassen wrote:

> Yes, I had that much spam stored up.

Good.

> I cannot figure out why BAYES_00 would hit on these.

First, do you have autolearn enabled? If so, I would turn it off until the 
basic initial Bayes training is proven.

Second, if spams are hitting BAYES_00 that means they "look hammy" based 
on how Bayes has been trained. Take a look, manually, at *every* message 
in your ham corpus and verify that it indeed is purely ham.

You can also add some of the recent misclassified spams to your spam 
training corpus.

Then wipe and retrain again.
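
The sequence above could be scripted roughly as follows. This is a sketch, not a tested procedure: the corpus paths are hypothetical placeholders, the helper names are made up for illustration, and the script only prints the `sa-learn` commands unless `run_all` is called explicitly.

```python
import subprocess

def retrain_commands(ham_dir, spam_dir):
    """Return the sa-learn invocations for a wipe-and-retrain, in order."""
    return [
        ["sa-learn", "--clear"],           # wipe the existing Bayes database
        ["sa-learn", "--ham", ham_dir],    # learn the hand-verified ham
        ["sa-learn", "--spam", spam_dir],  # learn spam, including recent misses
        ["sa-learn", "--dump", "magic"],   # show the resulting counters
    ]

def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)

if __name__ == "__main__":
    # Hypothetical corpus paths; this only prints the commands for review.
    for argv in retrain_commands("/path/to/ham", "/path/to/spam"):
        print(" ".join(argv))
```

Whatever form the commands take, they need to run as the user whose Bayes database the filter consults, otherwise the training lands in a database the filter never reads.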


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   ...to announce there must be no criticism of the President or to
   stand by the President right or wrong is not only unpatriotic and
   servile, but is morally treasonous to the American public.
                                           -- Theodore Roosevelt, 1918
-----------------------------------------------------------------------
  696 days since the first successful private support mission to ISS (SpaceX)

Re: BAYES_00 Query

Posted by Benny Pedersen <me...@junc.eu>.
Check your Bayes settings. If you are not using the SQL Bayes backend, did you train as the same user that mimedefang runs as? Is your setup a global Bayes database or a per-user setup?
-- 
Sent from my Android phone with K-9 Mail. Apologies if I am a little brief.