You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@spamassassin.apache.org by David Jones <dj...@ena.com> on 2018/01/02 00:18:15 UTC

Question about BAYES_999

I just had a spam message hit BAYES_999 but not BAYES_99.  Based on 
BAYES_999 default score of 0.2, I thought that it was always supposed to 
complement the BAYES_99 rule and both would trigger when BAYES_999 hit.

https://pastebin.com/QsVgXwdC

If they are independent, then it would seem logical to bump up the 
default score higher than BAYES_99.

-- 
David Jones


Re: Question about BAYES_999

Posted by David Jones <dj...@ena.com>.
On 01/04/2018 11:20 AM, RW wrote:
> On Thu, 4 Jan 2018 10:40:49 -0600
> David Jones wrote:
> 
>> On 01/04/2018 10:04 AM, RW wrote:
> 
>>> Are you sure that's right? It's a radically different frequency from
>>> 0.5% and 0.8%. IIWY I'd look at the 4 and check they are what you
>>> think they are and not something like
>>>    
>>
>> My production MailScanner instance has a highly tuned MTA in front of
>> it so SA doesn't see as much spam.  The amavis instance is
>> intentionally open to more spam to collect for the nightly masscheck
>> processing.
> 
> That's not obviously relevant since I was referring to the
> frequency of mails missing BAYES_99 within emails hitting BAYES_999.
> 
> 
>>> ... rules: meta test FOO has dependency 'BAYES_999' with a zero
>>> score
>>
>> If I had BAYES_99 set to a zero score, it would never show up in my
>> logs.
> 
>   
> As I said, they are bogus warnings. I think it's a known issue.
> 
> I have BAYES_999 scored and  I get 3 such matches per spamd restart
> using your grep patterns. Your 4 seem highly suspicious.
> 
> Until you manually check those 4, or retry with better grep patterns,
> you don't really know what's happening.
> 

I understand what you are saying now.  I will track down those 4 and 
report back.

-- 
David Jones

Re: Question about BAYES_999

Posted by RW <rw...@googlemail.com>.
On Thu, 4 Jan 2018 10:40:49 -0600
David Jones wrote:

> On 01/04/2018 10:04 AM, RW wrote:

> > Are you sure that's right? It's a radically different frequency from
> > 0.5% and 0.8%. IIWY I'd look at the 4 and check they are what you
> > think they are and not something like
> >   
> 
> My production MailScanner instance has a highly tuned MTA in front of
> it so SA doesn't see as much spam.  The amavis instance is
> intentionally open to more spam to collect for the nightly masscheck
> processing.

That's not obviously relevant since I was referring to the
frequency of mails missing BAYES_99 within emails hitting BAYES_999.


> > ... rules: meta test FOO has dependency 'BAYES_999' with a zero
> > score 
> 
> If I had BAYES_99 set to a zero score, it would never show up in my
> logs.

 
As I said, they are bogus warnings. I think it's a known issue.

I have BAYES_999 scored and  I get 3 such matches per spamd restart
using your grep patterns. Your 4 seem highly suspicious.

Until you manually check those 4, or retry with better grep patterns,
you don't really know what's happening.






Re: Question about BAYES_999

Posted by Alex <my...@gmail.com>.
>> # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|wc
>>     6508  247134 16925929
>> # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep
>> BAYES_999|grep BAYES_99\"|wc
>>     6508  247134 16925929
>>
>
> You need that last grep for BAYES_99 to be a "grep -v" and it needs some
> delimiter after the "99" to disinguish it from "999" like an equals sign
> since that is how amavis outputs it's rule hits and score.
>
> Jan  4 06:41:59 mail02 amavis[15124]: (15124-14) Passed SPAM
> {RelayedTaggedInbound}, [203.246.167.14]:63669 [203.246.167.14]
> <ze...@zeilcar.net> -> <po...@sa.ena.net>, Queue-ID: C193E4A5F78C,
> Message-ID: <9D...@zeilcar.net>, mail_id:
> pDEMud2MEZKg, Hits: 55.731, size: 9691, queued_as: 9C5CD4A5F795, 1328 ms,
> Tests: [BAYES_999=0.2,...
>
> Note the "BAYES_999=0.2" above would make your grep look like this:
>
> # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|grep -v
> BAYES_99=|wc

Ugh, yes, sorry. This was the result of pasting the wrong line while
experimenting. My separator is a quote. This is actually more precise
now, as the logging separates the rules into tests, tests_ham and
tests_spam:

# cat /var/log/maillog|grep timestamp|grep BAYES_99|perl -p -e
's|.*tests\":\[(.*)\],\"tests_ham.*|$1|'|grep BAYES_999\"|grep -v
BAYES_99\"

results with nothing printed.

...
ing in your area","subject_rot13":"EBPXL, n ybfg QBT, vf zvffvat va
lbhe nern","tests":["BAYES_99","BAYES_999","DCC_CHECK","DKIM_SIGNED","DKIM_VALID","DKIM_VALID_AU","HTML_IMAGE_RATIO_04","HTML_MESSAGE","MIME_HTML_ONLY","RCVD_IN_DNSWL_NONE","RCVD_IN_HOSTKARMA_W","RCVD_IN_SENDERSCORE_90_100","RELAYCOUNTRY_US","SPF_HELO_PASS","SPF_PASS","TXREP","T_DMARC_TESTS_PASS","T_REMOTE_IMAGE","T_RP_MATCHES_RCVD"],"tests_ham":["RCVD_IN_HOSTKARMA_W","RCVD_IN_SENDERSCORE_90_100","DKIM_VALID_AU","DKIM_VALID","T_RP_MATCHES_RCVD","TXREP","SPF_HELO_PASS","SPF_PASS","RCVD_IN_DNSWL_NONE"],"tests_spam":["BAYES_99","MIME_HTML_ONLY","HTML_IMAGE_RATIO_04","DCC_CHECK","BAYES_999","DKIM_SIGNED","T_DMARC_TESTS_PASS","RELAYCOUNTRY_US","T_REMOTE_IMAGE","HTML_MESSAGE"],"time_iso_week_date":"2018-W01-4","time_unix":1515042197.974,"to_addr":["MORENO@example.com"],"type":"amavis"}

Re: Question about BAYES_999

Posted by David Jones <dj...@ena.com>.
On 01/04/2018 10:58 AM, Alex wrote:
> Hi,
> 
>>>> I am seeing this problem on my MailScanner filters as well:
>>>>
>>>> # grep BAYES_999 maillog-20171231 | wc -l
>>>> 9172
>>>> # grep BAYES_999 maillog-20171231 | grep -v "BAYES_99 " | wc -l
>>>> 4
> 
> # rpm -q amavisd-new
> amavisd-new-2.11.0-4.fc25.noarch
> # rpm -q perl
> perl-5.24.3-389.fc25.x86_64
> 
> This is with the JSON logging enabled so my grep is a bit different.
> This is also with an SVN spamassassin snapshot from about two weeks
> ago. This is also with bayes stored in mysql.
> 
> # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|wc
>     6508  247134 16925929
> # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep
> BAYES_999|grep BAYES_99\"|wc
>     6508  247134 16925929
> 

You need that last grep for BAYES_99 to be a "grep -v" and it needs some 
delimiter after the "99" to disinguish it from "999" like an equals sign 
since that is how amavis outputs it's rule hits and score.

Jan  4 06:41:59 mail02 amavis[15124]: (15124-14) Passed SPAM 
{RelayedTaggedInbound}, [203.246.167.14]:63669 [203.246.167.14] 
<ze...@zeilcar.net> -> <po...@sa.ena.net>, Queue-ID: 
C193E4A5F78C, Message-ID: 
<9D...@zeilcar.net>, mail_id: 
pDEMud2MEZKg, Hits: 55.731, size: 9691, queued_as: 9C5CD4A5F795, 1328 
ms, Tests: [BAYES_999=0.2,...

Note the "BAYES_999=0.2" above would make your grep look like this:

# bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|grep 
-v BAYES_99=|wc

> Please let me know if there's anything further I can do to help.
> 

-- 
David Jones

Re: Question about BAYES_999

Posted by Alex <my...@gmail.com>.
Hi,

>>> I am seeing this problem on my MailScanner filters as well:
>>>
>>> # grep BAYES_999 maillog-20171231 | wc -l
>>> 9172
>>> # grep BAYES_999 maillog-20171231 | grep -v "BAYES_99 " | wc -l
>>> 4

# rpm -q amavisd-new
amavisd-new-2.11.0-4.fc25.noarch
# rpm -q perl
perl-5.24.3-389.fc25.x86_64

This is with the JSON logging enabled so my grep is a bit different.
This is also with an SVN spamassassin snapshot from about two weeks
ago. This is also with bayes stored in mysql.

# bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|wc
   6508  247134 16925929
# bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep
BAYES_999|grep BAYES_99\"|wc
   6508  247134 16925929

Please let me know if there's anything further I can do to help.

Re: Question about BAYES_999

Posted by David Jones <dj...@ena.com>.
On 01/04/2018 10:04 AM, RW wrote:
> On Thu, 4 Jan 2018 08:02:48 -0600
> David Jones wrote:
> 
>> On 01/04/2018 04:46 AM, Matus UHLAR - fantomas wrote:
>>>> On 2 Jan 2018, at 07:17, David Jones djones@ena.com> wrote:
>>>>> I haven't redefined these rules from what I can tell by searching
>>>>> my local rules.  I would think that if I had done this, then
>>>>> there would be consistent non-hits of BAYES_99 with BAYES_999 all
>>>>> of the time.  This is only happening a small percentage of the
>>>>> time.
>>>
>>> On 02.01.18 15:39, @lbutlr wrote:
>>>> Checking my mail I see an incidence rate of this of about 0.5%,
>>>> which matches the rate you posted earlier.
>>>
>>> amavis?
>>>    
>>
>> I am seeing this problem on my MailScanner filters as well:
>>
>> # grep BAYES_999 maillog-20171231 | wc -l
>> 9172
>> # grep BAYES_999 maillog-20171231 | grep -v "BAYES_99 " | wc -l
>> 4
> 
> Are you sure that's right? It's a radically different frequency from
> 0.5% and 0.8%. IIWY I'd look at the 4 and check they are what you think
> they are and not something like
> 

My production MailScanner instance has a highly tuned MTA in front of it 
so SA doesn't see as much spam.  The amavis instance is intentionally 
open to more spam to collect for the nightly masscheck processing.

> ... rules: meta test FOO has dependency 'BAYES_999' with a zero score
> 

If I had BAYES_99 set to a zero score, it would never show up in my logs.

> I get some bogus warnings like this.
> 
> You need something to make sure it's a result line and some  boundary
> checks like \bBAYES_99\b might help too.
> 

MailScanner log output looks like this:

Dec 31 07:30:46 smtp2i.ena.net MailScanner[26137]: Message 
8902A148068E.ACC23 from 106.10.241.143 (novak5450@att.net) to k12tn.net 
is spam, SpamAssassin (not cached, score=35.679, required 4, 
autolearn=spam, BAYES_99 5.20, BAYES_999 0.20, DCC_CHECK 2.20, 
DIGEST_MULTIPLE 0.29, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU 
-0.10, DMARC_NONE 0.01, ...

Pretty sure my grep'ing above is good.

> 
> I can't reproduce the problem using spamc/spamd  from 3.4.1 with perl
> 5.24.3 on FreeBSD 11.1 with  Berkeley DB.
> 
> I don't have any missing BAYES_99 hits in my corpus headers and to
> check it's not a recent bug I rescanned ~5k spams and checked the logs.
> 

I appreciate you checking to help us determine how widespread this issue 
is and possibly narrow it down.

Here is my SA setup that is common to both my amavis and MailScanner:

SA 3.4.1
Shared Bayes DB in Redis
Share the same local configs for custom rules

Since we have at least one other person on this list reporting the same 
problem, I don't think my local custom rules are the problem.

-- 
David Jones

Re: Question about BAYES_999

Posted by RW <rw...@googlemail.com>.
On Thu, 4 Jan 2018 08:02:48 -0600
David Jones wrote:

> On 01/04/2018 04:46 AM, Matus UHLAR - fantomas wrote:
> >> On 2 Jan 2018, at 07:17, David Jones djones@ena.com> wrote:  
> >>> I haven't redefined these rules from what I can tell by searching
> >>> my local rules.  I would think that if I had done this, then
> >>> there would be consistent non-hits of BAYES_99 with BAYES_999 all
> >>> of the time.  This is only happening a small percentage of the
> >>> time.  
> > 
> > On 02.01.18 15:39, @lbutlr wrote:  
> >> Checking my mail I see an incidence rate of this of about 0.5%,
> >> which matches the rate you posted earlier.  
> > 
> > amavis?
> >   
> 
> I am seeing this problem on my MailScanner filters as well:
> 
> # grep BAYES_999 maillog-20171231 | wc -l
> 9172
> # grep BAYES_999 maillog-20171231 | grep -v "BAYES_99 " | wc -l
> 4

Are you sure that's right? It's a radically different frequency from
0.5% and 0.8%. IIWY I'd look at the 4 and check they are what you think
they are and not something like 

... rules: meta test FOO has dependency 'BAYES_999' with a zero score

I get some bogus warnings like this.

You need something to make sure it's a result line and some  boundary
checks like \bBAYES_99\b might help too.


I can't reproduce the problem using spamc/spamd  from 3.4.1 with perl
5.24.3 on FreeBSD 11.1 with  Berkeley DB. 

I don't have any missing BAYES_99 hits in my corpus headers and to
check it's not a recent bug I rescanned ~5k spams and checked the logs.

Re: Question about BAYES_999

Posted by David Jones <dj...@ena.com>.
On 01/04/2018 04:46 AM, Matus UHLAR - fantomas wrote:
>> On 2 Jan 2018, at 07:17, David Jones djones@ena.com> wrote:
>>> I haven't redefined these rules from what I can tell by searching my
>>> local rules.  I would think that if I had done this, then there would be
>>> consistent non-hits of BAYES_99 with BAYES_999 all of the time.  This is
>>> only happening a small percentage of the time.
> 
> On 02.01.18 15:39, @lbutlr wrote:
>> Checking my mail I see an incidence rate of this of about 0.5%, which 
>> matches the rate you posted earlier.
> 
> amavis?
> 

I am seeing this problem on my MailScanner filters as well:

# grep BAYES_999 maillog-20171231 | wc -l
9172
# grep BAYES_999 maillog-20171231 | grep -v "BAYES_99 " | wc -l
4

I have a temporary fix in place with a meta rule:

meta            MISSING_BAYES_99    BAYES_999 && !BAYES_99
describe        MISSING_BAYES_99    BAYES_99 should always hit with 
BAYES_999 but sometimes it doesn't.
score           MISSING_BAYES_99    4.2

-- 
David Jones

Re: Question about BAYES_999

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
>On 2 Jan 2018, at 07:17, David Jones djones@ena.com> wrote:
>> I haven't redefined these rules from what I can tell by searching my
>> local rules.  I would think that if I had done this, then there would be
>> consistent non-hits of BAYES_99 with BAYES_999 all of the time.  This is
>> only happening a small percentage of the time.

On 02.01.18 15:39, @lbutlr wrote:
>Checking my mail I see an incidence rate of this of about 0.5%, which matches the rate you posted earlier.

amavis?

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
The early bird may get the worm, but the second mouse gets the cheese. 

Re: Question about BAYES_999

Posted by "@lbutlr" <kr...@kreme.com>.
On 2 Jan 2018, at 07:17, David Jones djones@ena.com> wrote:
> I haven't redefined these rules from what I can tell by searching my local rules.  I would think that if I had done this, then there would be consistent non-hits of BAYES_99 with BAYES_999 all of the time.  This is only happening a small percentage of the time.

Checking my mail I see an incidence rate of this of about 0.5%, which matches the rate you posted earlier.

-- 
Hey, how come Andrew gets to get up? If he gets up, we'll all get up!
It'll be anarchy!


Re: Question about BAYES_999

Posted by David Jones <dj...@ena.com>.
On 01/02/2018 07:57 AM, RW wrote:
> On Mon, 1 Jan 2018 18:52:45 -0600
> David Jones wrote:
> 
>> On 01/01/2018 06:47 PM, Reindl Harald wrote:
>>>
>>>
>>> Am 02.01.2018 um 01:18 schrieb David Jones:
>>>> I just had a spam message hit BAYES_999 but not BAYES_99.  Based
>>>> on BAYES_999 default score of 0.2, I thought that it was always
>>>> supposed to complement the BAYES_99 rule and both would trigger
>>>> when BAYES_999 hit.
>>>>
> 
>>
>> I checked my logs and I am seeing both together when BAYES_999 hits
>> except for a few times.  Is this a bug?
> 
> It's certainly not the intended behaviour.
> 
> Firstly check that you haven't redefined either of the
> BAYES_999/BAYES_99 rules and forgotten about it.
>

I haven't redefined these rules from what I can tell by searching my 
local rules.  I would think that if I had done this, then there would be 
consistent non-hits of BAYES_99 with BAYES_999 all of the time.  This is 
only happening a small percentage of the time.

> Secondly are you using shortcircuiting at all? I'm wondering if this
> might be a buggy or misconfigured shortcircuit.
> 

I am using Shortcircuiting but these messages aren't hitting the 
SHORTCIRCUIT rule.

This amavis-new box that has this problem is my spam/ham collector for 
the SA nightly maschecking.  It shares the Bayes DB via redis with my 8 
main SA MailScanner filters.  Let me do some more digging in my primary 
filter logs to see if this is happening there and I will reply back. 
Those logs are very large so it will take some time to grep.

-- 
David Jones

Re: Question about BAYES_999

Posted by RW <rw...@googlemail.com>.
On Mon, 1 Jan 2018 18:52:45 -0600
David Jones wrote:

> On 01/01/2018 06:47 PM, Reindl Harald wrote:
> > 
> > 
> > Am 02.01.2018 um 01:18 schrieb David Jones:  
> >> I just had a spam message hit BAYES_999 but not BAYES_99.  Based
> >> on BAYES_999 default score of 0.2, I thought that it was always
> >> supposed to complement the BAYES_99 rule and both would trigger
> >> when BAYES_999 hit.
> >>

> 
> I checked my logs and I am seeing both together when BAYES_999 hits 
> except for a few times.  Is this a bug?  

It's certainly not the intended behaviour.

Firstly check that you haven't redefined either of the
BAYES_999/BAYES_99 rules and forgotten about it.

Secondly are you using shortcircuiting at all? I'm wondering if this
might be a buggy or misconfigured shortcircuit.


Re: Question about BAYES_999

Posted by David Jones <dj...@ena.com>.
On 01/01/2018 07:08 PM, Reindl Harald wrote:
> 
> 
> Am 02.01.2018 um 01:59 schrieb David Jones:
>> On 01/01/2018 06:52 PM, David Jones wrote:
>>> On 01/01/2018 06:47 PM, Reindl Harald wrote:
>>>>
>>>> Am 02.01.2018 um 01:18 schrieb David Jones:
>>>>> I just had a spam message hit BAYES_999 but not BAYES_99.  Based on 
>>>>> BAYES_999 default score of 0.2, I thought that it was always 
>>>>> supposed to complement the BAYES_99 rule and both would trigger 
>>>>> when BAYES_999 hit.
>>>>>
>>>>> https://pastebin.com/QsVgXwdC
>>>>>
>>>>> If they are independent, then it would seem logical to bump up the 
>>>>> default score higher than BAYES_99
>>>>
>>>> never ever seen that and since bayes is based on a number between 0 
>>>> and 1 this should be technically impossible at all
>>>>
>>>> with BAYES_00 that message has [score: 0.0003]
>>>>
>>>
>>> I checked my logs and I am seeing both together when BAYES_999 hits 
>>> except for a few times.  Is this a bug?  Should I open a bug issue?  
>>> I am not sure how to reproduce the problem unless others also see the 
>>> same thing with that message.
>>>
>>
>> Sorry.  Not thinking clearly.  Others would have to have the same 
>> Bayes DB to get that message to do the same thing.  I was able to 
>> reproduce the same results on another SA platform running MailScanner 
>> using the same Bayes DB in redis.
>>
>> If others could check their mail logs to see if they are hitting 
>> BAYES_999 without BAYES_99 on the same message, please let me know
> 
> [sa-milt@mail-gw:/var/log]$ xzcat maillog-2017*.xz | grep "BAYES_999," | 
> wc -l
> 9125
> 
> [sa-milt@mail-gw:/var/log]$ xzcat maillog-2017*.xz | grep "BAYES_999," | 
> grep "BAYES_99," | wc -l
> 9125
> 
> [sa-milt@mail-gw:/var/log]$ xzcat maillog-2017*.xz | grep "BAYES_999," | 
> grep -v "BAYES_99," | wc -l
> 0
> 

Since yesterday morning:

# grep "BAYES_999=" /var/log/maillog | grep "BAYES_99=" | wc -l
8006
# grep "BAYES_999=" /var/log/maillog | wc -l
8092
# grep "BAYES_999=" /var/log/maillog | grep -v "BAYES_99=" | wc -l
86

Last week:

# grep "BAYES_999=" /var/log/maillog-20171231 | grep "BAYES_99=" | wc -l
43753
# grep "BAYES_999=" /var/log/maillog-20171231 | wc -l
44108
# grep "BAYES_999=" /var/log/maillog-20171231 | grep -v "BAYES_99=" | wc -l
355

-- 
David Jones

Re: Question about BAYES_999

Posted by David Jones <dj...@ena.com>.
On 01/01/2018 06:52 PM, David Jones wrote:
> On 01/01/2018 06:47 PM, Reindl Harald wrote:
>>
>>
>> Am 02.01.2018 um 01:18 schrieb David Jones:
>>> I just had a spam message hit BAYES_999 but not BAYES_99.  Based on 
>>> BAYES_999 default score of 0.2, I thought that it was always supposed 
>>> to complement the BAYES_99 rule and both would trigger when BAYES_999 
>>> hit.
>>>
>>> https://pastebin.com/QsVgXwdC
>>>
>>> If they are independent, then it would seem logical to bump up the 
>>> default score higher than BAYES_99
>>
>> never ever seen that and since bayes is based on a number between 0 
>> and 1 this should be technically impossible at all
>>
>> with BAYES_00 that message has [score: 0.0003]
>>
> 
> I checked my logs and I am seeing both together when BAYES_999 hits 
> except for a few times.  Is this a bug?  Should I open a bug issue?  I 
> am not sure how to reproduce the problem unless others also see the same 
> thing with that message.
> 

Sorry.  Not thinking clearly.  Others would have to have the same Bayes 
DB to get that message to do the same thing.  I was able to reproduce 
the same results on another SA platform running MailScanner using the 
same Bayes DB in redis.

If others could check their mail logs to see if they are hitting 
BAYES_999 without BAYES_99 on the same message, please let me know.

-- 
David Jones

Re: Question about BAYES_999

Posted by David Jones <dj...@ena.com>.
On 01/01/2018 06:47 PM, Reindl Harald wrote:
> 
> 
> Am 02.01.2018 um 01:18 schrieb David Jones:
>> I just had a spam message hit BAYES_999 but not BAYES_99.  Based on 
>> BAYES_999 default score of 0.2, I thought that it was always supposed 
>> to complement the BAYES_99 rule and both would trigger when BAYES_999 
>> hit.
>>
>> https://pastebin.com/QsVgXwdC
>>
>> If they are independent, then it would seem logical to bump up the 
>> default score higher than BAYES_99
> 
> never ever seen that and since bayes is based on a number between 0 and 
> 1 this should be technically impossible at all
> 
> with BAYES_00 that message has [score: 0.0003]
> 

I checked my logs and I am seeing both together when BAYES_999 hits 
except for a few times.  Is this a bug?  Should I open a bug issue?  I 
am not sure how to reproduce the problem unless others also see the same 
thing with that message.

-- 
David Jones