Posted to users@spamassassin.apache.org by Reindl Harald <h....@thelounge.net> on 2016/06/10 14:57:45 UTC

why does that mail not get any bayes-classification

see attachment, no bayes tag at all - looks like a major bug somewhere

Content analysis details:   (30.5 points, 5.5 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 4.5 CUST_DNSBL_10_SORBS_WEB RBL: dnsbl.sorbs.net (web.dnsbl.sorbs.net)
                            [213.252.170.66 listed in dnsbl.sorbs.net]
 5.5 CUST_DNSBL_6_ZEN_XBL   RBL: zen.spamhaus.org (xbl.spamhaus.org)
                            [213.252.170.66 listed in zen.spamhaus.org]
 2.5 CUST_DNSBL_13_SEM      RBL: bl.spameatingmonkey.net
                            [213.252.170.66 listed in bl.spameatingmonkey.net]
 5.0 CUST_DNSBL_7_CUDA      RBL: b.barracudacentral.org
                            [213.252.170.66 listed in b.barracudacentral.org]
 2.5 CUST_DNSBL_16_PSBL     RBL: dnsbl-surriel.thelounge.net
                            (psbl.surriel.com)
                            [213.252.170.66 listed in dnsbl-surriel.thelounge.net]
 6.0 CLAMAV                 ClamAV detected malware or phishing
                            [Sanesecurity.Foxhole.Zip_fs226.UNOFFICIAL(b06c82bcd10ed85fd9a7103b5fe18e0d:1343)]
 1.0 CUST_DNSBL_30_SENDERSC_MED RBL: score.senderscore.com
                            (senderscore.com Medium)
                            [213.252.170.66 listed in score.senderscore.com]
 2.5 RDNS_NONE              Delivered to internal network by a host with no rDNS
 0.0 RCVD_IN_MSPIKE_BL      Mailspike blacklisted
 0.5 RCVD_IN_MSPIKE_ZBI     No description available.
 0.5 HELO_MISC_IP           Looking for more Dynamic IP Relays

Re: why does that mail not get any bayes-classification

Posted by Reindl Harald <h....@thelounge.net>.

On 11.06.2016 at 17:03, Sidney Markowitz wrote:
> Reindl Harald wrote on 12/06/16 2:13 AM:
>> sadly that works only for mail-headers - but don't appear in the logs
>> nor in a report generated with "/usr/bin/spamc -R" by a web interface
>> which processes uploads of eml files :-(
>
> Yes I see how that could be useful. It might work if you define a meta rule
> that checks for there not being any of the BAYES_NN rules

but that wouldn't say anything useful because you can't distinguish between 
"bayes not working at all", "too few training messages" and "not enough 
useful tokens"

> That doesn't give
> you access to _SENDERDOMAIN_ and _AUTHORDOMAIN_ tags, though

yep


Re: why does that mail not get any bayes-classification

Posted by Sidney Markowitz <si...@sidney.com>.
Reindl Harald wrote on 12/06/16 2:13 AM:
> sadly that works only for mail-headers - but don't appear in the logs 
> nor in a report generated with "/usr/bin/spamc -R" by a web interface 
> which processes uploads of eml files :-(

Yes I see how that could be useful. It might work if you define a meta rule
that checks for there not being any of the BAYES_NN rules. That doesn't give
you access to _SENDERDOMAIN_ and _AUTHORDOMAIN_ tags, though.

 Sidney


Re: why does that mail not get any bayes-classification

Posted by Reindl Harald <h....@thelounge.net>.

On 12.06.2016 at 00:27, Sidney Markowitz wrote:
> Reindl Harald wrote on 12/06/16 9:31 AM:
>>
>> headers don't help when you have a "spamd: result" log-line with a ton
>
> Ah, finally I understand what you are trying to do! You analyze the spamd
> result log lines, and they currently have two deficiencies: 1) They do not
> distinguish between Bayes failing and Bayes simply not finding significant
> tokens in the message; 2) They don't provide a way of matching the result log
> line to the actual message when the message does not contain an mid.
>
> Your proposed solution of a BAYES_NOTOK rule would solve the first one. The
> second is a bit trickier. Really there ought to be a way to configure custom
> output in the spamd result log line, or to have a rule that can include some
> information in addition to its name and score in its report

both would require small changes in SA itself

the second is not really trickier: when the code is able to put the MID 
in the log line, then it also has the information about sender/rcpt and 
just doesn't put it into the logs


Re: why does that mail not get any bayes-classification

Posted by Sidney Markowitz <si...@sidney.com>.
Reindl Harald wrote on 12/06/16 9:31 AM:
> 
> headers don't help when you have a "spamd: result" log-line with a ton 

Ah, finally I understand what you are trying to do! You analyze the spamd
result log lines, and they currently have two deficiencies: 1) They do not
distinguish between Bayes failing and Bayes simply not finding significant
tokens in the message; 2) They don't provide a way of matching the result log
line to the actual message when the message does not contain an mid.

Your proposed solution of a BAYES_NOTOK rule would solve the first one. The
second is a bit trickier. Really there ought to be a way to configure custom
output in the spamd result log line, or to have a rule that can include some
information in addition to its name and score in its report.

 Sidney
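
The two deficiencies described above can be made concrete with a small log parser. What follows is a minimal Python sketch, not part of SpamAssassin: the layout of the "spamd: result:" line, the sample field names, and the values treated as "no Message-ID" are assumptions for illustration and should be checked against real logs. It reports which BAYES_NN rules fired (none means "Bayes failed" and "Bayes found no significant tokens" look identical) and whether the line carries a usable mid= anchor.

```python
import re

# Rule names of the form BAYES_00 ... BAYES_999 (assumed naming of the stock buckets).
BAYES_RULE = re.compile(r"^BAYES_\d+$")

# Values treated as "no usable Message-ID" (assumption about what ends up in mid=).
EMPTY_MIDS = {"", "<>", "(unknown)"}


def parse_result_line(line):
    """Parse one 'spamd: result:' log line into a dict, or None if it is not one.

    Assumed layout (illustrative only):
      ... spamd: result: <Y|.> <score> - RULE_A,RULE_B key=value,key=value,...
    """
    marker = "spamd: result:"
    pos = line.find(marker)
    if pos < 0:
        return None
    head, sep, tail = line[pos + len(marker):].partition(" - ")
    if not sep:
        return None
    head_parts = head.split()
    tokens = tail.split()
    rules = []
    # The rule list is the first token, unless no rule fired at all.
    if tokens and "=" not in tokens[0]:
        rules = tokens[0].split(",")
        tokens = tokens[1:]
    # Naive field lookup: a Message-ID containing a comma would be cut short.
    mid_match = re.search(r"(?:^|,)mid=([^,]*)", " ".join(tokens))
    mid = mid_match.group(1) if mid_match else ""
    return {
        "is_spam": bool(head_parts) and head_parts[0] == "Y",
        "rules": rules,
        "bayes_rules": [r for r in rules if BAYES_RULE.match(r)],
        "has_mid": mid not in EMPTY_MIDS,
        "mid": mid,
    }
```

A result with an empty "bayes_rules" list and "has_mid" set to False is exactly the case discussed in this thread: nothing in the log line says why Bayes is absent, and there is no anchor for finding the matching MTA lines.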



Re: why does that mail not get any bayes-classification

Posted by Reindl Harald <h....@thelounge.net>.

On 11.06.2016 at 23:26, Sidney Markowitz wrote:
> Reindl Harald wrote on 12/06/16 9:08 AM:
>>
>> and it's not worth discussing since the *real* solution would be a
>> "BAYES_NOTOKS" which would appear *everywhere* and clearly explain why
>> no other BAYES_XX is present
>
> I can't argue with that. Without the ability to make it a rule or a meta-rule
> that would only show if there are no BAYES_NN rules triggered, adding the tags
> to the report unconditionally would not be the same as a BAYES_NOTOK rule. As
> far as I can tell implementing BAYES_NOTOK would require a (small) change in
> the Bayes plugin. It could not be done in the configuration file or by writing
> a new rule. So you are right about that.

that's exactly the point

> However, what you said about _SENDERDOMAIN_ and _AUTHORDOMAIN_ could be
> handled by adding a report line that contains those tags just before or just
> after the report _SUMMARY_ line in the configuration

headers don't help when you have a "spamd: result" log line with a ton 
of rules, or when a new rule you are trying out fires on a message that 
has no message-id, since your only anchor is the mid=<> part of the log 
line, from which you can grep the other relevant MTA lines and find out 
who the sender was, who the rcpt was and where that message came from 
at all

keep in mind: you get all those headers only in your own mails; they 
don't help you much as a sysadmin for a lot of users when you try to 
find out whether rules need to be rescored in whatever direction


Re: why does that mail not get any bayes-classification

Posted by Sidney Markowitz <si...@sidney.com>.
Reindl Harald wrote on 12/06/16 9:08 AM:
> 
> and it's not worth discussing since the *real* solution would be a 
> "BAYES_NOTOKS" which would appear *everywhere* and clearly explain why 
> no other BAYES_XX is present

I can't argue with that. Without the ability to make it a rule or a meta-rule
that would only show if there are no BAYES_NN rules triggered, adding the tags
to the report unconditionally would not be the same as a BAYES_NOTOK rule. As
far as I can tell implementing BAYES_NOTOK would require a (small) change in
the Bayes plugin. It could not be done in the configuration file or by writing
a new rule. So you are right about that.

However, what you said about _SENDERDOMAIN_ and _AUTHORDOMAIN_ could be
handled by adding a report line that contains those tags just before or just
after the report _SUMMARY_ line in the configuration.

Sidney



Re: why does that mail not get any bayes-classification

Posted by Reindl Harald <h....@thelounge.net>.

On 11.06.2016 at 23:00, Sidney Markowitz wrote:
> Reindl Harald wrote on 12/06/16 8:37 AM:
>>
>> it is not part of the report itself while tags, scores and descriptions
>> are - a report is something like this:
>
> what you showed is defined in the configuration file using "report". Those
> just happen to be the last lines of it in the default configuration. That
> default uses the template tags _SCORE_, _REQD_, and _SUMMARY_.
>
> What I'm saying is that you can include "report" lines in the configuration
> that use the _SENDERDOMAIN_, _AUTHORDOMAIN_, and the various Bayes related
> tags and they will show up in the report. If you can use the report that you
> showed, then you can make use of those tags. They won't be in the table of
> points, rules, and description. That table is what _SUMMARY_ expands into. But
> you can insert them into the report that you see using spamc -R. They can even
> come after _SUMMARY_ if you want to

you can do a lot

the whole purpose here is to

a) upload an eml file to a web server
b) spamc -R -l -s 20000000 --socket /socket-path < upload.eml
c) display the part starting with "Content analysis details"
d) combine it with clamd results
e) display the raw eml at the bottom of the web page

all the header tricks are *not* part of it
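
Steps b) and c) above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual web interface from this thread: the function names are hypothetical, the spamc flags are copied from step b), and the socket path is whatever the local spamd listens on.

```python
import subprocess

MARKER = "Content analysis details"


def extract_analysis(report):
    """Return the report from the 'Content analysis details' line onward.

    Returns an empty string when the marker line is not present.
    """
    lines = report.splitlines()
    for index, line in enumerate(lines):
        if line.startswith(MARKER):
            return "\n".join(lines[index:])
    return ""


def run_spamc(eml_path, socket_path):
    """Feed an uploaded eml file to spamc with the flags from step b)."""
    with open(eml_path, "rb") as handle:
        proc = subprocess.run(
            ["spamc", "-R", "-l", "-s", "20000000", "--socket", socket_path],
            stdin=handle,
            stdout=subprocess.PIPE,
            check=False,
        )
    # Decode leniently: the report may contain bytes from the original mail.
    return proc.stdout.decode("utf-8", errors="replace")
```

Because only the text after the marker is displayed, anything a "report" line places before "Content analysis details" is cut off, while template tags placed after _SUMMARY_ would survive this extraction.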
____________________

and it's not worth discussing since the *real* solution would be a 
"BAYES_NOTOKS" which would appear *everywhere* and clearly explain why 
no other BAYES_XX is present


Re: why does that mail not get any bayes-classification

Posted by Sidney Markowitz <si...@sidney.com>.
Reindl Harald wrote on 12/06/16 8:37 AM:
> 
> it is not part of the report itself while tags, scores and descriptions 
> are - a report is something like this:

what you showed is defined in the configuration file using "report". Those
just happen to be the last lines of it in the default configuration. That
default uses the template tags _SCORE_, _REQD_, and _SUMMARY_.

What I'm saying is that you can include "report" lines in the configuration
that use the _SENDERDOMAIN_, _AUTHORDOMAIN_, and the various Bayes related
tags and they will show up in the report. If you can use the report that you
showed, then you can make use of those tags. They won't be in the table of
points, rules, and description. That table is what _SUMMARY_ expands into. But
you can insert them into the report that you see using spamc -R. They can even
come after _SUMMARY_ if you want to.

 Sidney


Re: why does that mail not get any bayes-classification

Posted by Reindl Harald <h....@thelounge.net>.

On 11.06.2016 at 19:06, Sidney Markowitz wrote:
> Reindl Harald wrote on 12/06/16 4:44 AM:
>> look above "sadly that works only for mail-headers - but don't appear in
>> the logs"
>
> Oh, you mentioned spamc -R before and it does appear in that output. I got
> that confused with the spamd logs - You're right, I don't see them there.

it is not part of the report itself while tags, scores and descriptions 
are - a report is something like this:


Content analysis details:   (28.1 points, 5.5 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.1 CUST_DNSWL_2_SENDERSC_LOW RBL: score.senderscore.com (Low Trust)
                            [145.253.224.163 listed in score.senderscore.com]
 1.0 NIXSPAM_IXHASH         DIGEST: ix.dnsbl.manitu.net
 7.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 0.9995]
 1.5 SPF_SOFTFAIL           SPF: sender does not match SPF record (softfail)
 4.0 DKIM_ADSP_DISCARD      No valid author signature, domain signs all mail
                            and suggests discarding the rest
 2.0 DATE_IN_FUTURE_06_12   Date: is 6 to 12 hours after Received: date
 2.5 CUST_BODY_CONTAINS_M   BODY: Contains Medium
 0.5 CUST_BODY_CONTAINS_VL  BODY: Contains Very Low
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.4 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                            [score: 0.9995]
 2.0 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
                            above 50%
                            [cf: 100]
 0.5 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
 0.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
                            [cf: 100]
 1.0 FSL_BULK_SIG           Bulk signature with no Unsubscribe
 0.7 LOTS_OF_MONEY          Huge... sums of money
 0.1 MSGID_FROM_MTA_HEADER  Message-Id was added by a relay
 1.5 IXHASH_CHECK           Message hits one ore more IXHASH digest-sources
 2.5 DIGEST_MULTIPLE_LOCAL  Message hits more than one network digest check
                            (razor, pyzor, ixhash)
 0.0 T_REMOTE_IMAGE         Message contains an external image
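
The report above lists both BAYES_99 and BAYES_999 for the same score of 0.9995, because the two ranges overlap. A rough Python sketch of that bucket mapping follows. Only the BAYES_99 and BAYES_999 ranges are taken from this report; the other bounds are assumptions read off the usual stock rule descriptions, and the boundary handling (exclusive lower bound, inclusive upper bound) is likewise an assumption, so treat the table as illustrative rather than authoritative.

```python
# (rule name, lower bound, upper bound) as probabilities.
# Assumed semantics: a rule fires when lower < p <= upper, and a lower
# bound of 0 also admits p == 0. Only the last two rows come from the report.
BUCKETS = [
    ("BAYES_00", 0.0, 0.01),
    ("BAYES_05", 0.01, 0.05),
    ("BAYES_20", 0.05, 0.20),
    ("BAYES_40", 0.20, 0.40),
    ("BAYES_50", 0.40, 0.60),
    ("BAYES_60", 0.60, 0.80),
    ("BAYES_80", 0.80, 0.95),
    ("BAYES_95", 0.95, 0.99),
    ("BAYES_99", 0.99, 1.00),
    ("BAYES_999", 0.999, 1.00),
]


def bayes_rules(probability):
    """Return the BAYES_NN rule names matching a probability.

    None stands for "Bayes produced no classification" and yields no rule,
    which is the situation this thread is about.
    """
    if probability is None:
        return []
    return [
        name
        for name, low, high in BUCKETS
        if (low == 0.0 or probability > low) and probability <= high
    ]
```

With these assumed bounds, every real probability maps to at least one rule, so a report without any BAYES_NN line means Bayes returned no probability at all rather than a value that fell between buckets.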


Re: why does that mail not get any bayes-classification

Posted by Sidney Markowitz <si...@sidney.com>.
Reindl Harald wrote on 12/06/16 4:44 AM:
> look above "sadly that works only for mail-headers - but don't appear in 
> the logs"
> 

Oh, you mentioned spamc -R before and it does appear in that output. I got
that confused with the spamd logs - You're right, I don't see them there.

 Sidney


Re: why does that mail not get any bayes-classification

Posted by Reindl Harald <h....@thelounge.net>.

On 11.06.2016 at 18:43, Sidney Markowitz wrote:
> Reindl Harald wrote on 12/06/16 2:13 AM:
>> sadly that works only for mail-headers - but don't appear in the logs
>
> I just tried adding this to my local configuration, not pretty, just to see
> what it would do
>
> report tag values sender _SENDERDOMAIN_  author _AUTHORDOMAIN_  bayesh _BAYESTCHAMMY_ bayses _BAYESTCSPAMMY_
>
> and I got this line in the report
>
> tag values sender vantoll.nl  author vantoll.nl  bayesh 3 bayses 6
>
> So even though the documentation is not clear about it, you can use those
> template tags in the report option too

look above "sadly that works only for mail-headers - but don't appear in 
the logs"


Re: why does that mail not get any bayes-classification

Posted by Sidney Markowitz <si...@sidney.com>.
Reindl Harald wrote on 12/06/16 2:13 AM:
> sadly that works only for mail-headers - but don't appear in the logs 

I just tried adding this to my local configuration, not pretty, just to see
what it would do

report tag values sender _SENDERDOMAIN_  author _AUTHORDOMAIN_  bayesh _BAYESTCHAMMY_ bayses _BAYESTCSPAMMY_

and I got this line in the report

tag values sender vantoll.nl  author vantoll.nl  bayesh 3 bayses 6

So even though the documentation is not clear about it, you can use those
template tags in the report option too.

 Sidney


Re: why does that mail not get any bayes-classification

Posted by Reindl Harald <h....@thelounge.net>.

On 11.06.2016 at 15:55, Sidney Markowitz wrote:
> Reindl Harald wrote on 12/06/16 1:04 AM:
>> output of "spamassassin -D  < ignored_by_bayes_stripped.eml" attached
>
> See this line in that output:
>
>   Jun 11 14:47:00.510 [5188] dbg: bayes: cannot use bayes on this message; not enough usable tokens found
>
>> i would expect a bayes result in any case, even if it's just an
>> informational BAYES_NOTOKS
>
> Not by default, but see the tags in the next lines in your debug output,
> starting with
>
> Jun 11 14:47:00.510 [5188] dbg: check: tagrun - tag BAYESTCHAMMY is now ready, value: 0
> Jun 11 14:47:00.510 [5188] dbg: check: tagrun - tag BAYESTCSPAMMY is now ready, value: 0
>
> And see how you can add custom headers to your output that include these tags
> as documented here:
>
> https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html#template_tags

sadly that works only for mail-headers - but don't appear in the logs 
nor in a report generated with "/usr/bin/spamc -R" by a web interface 
which processes uploads of eml files :-(

a test tag BAYES_NOTOKS, or whatever it ends up being called, would 
appear in all places, while for the spamd log _SENDERDOMAIN_ and 
_AUTHORDOMAIN_ would have the benefit that you could also find out 
something useful about a result when there is no message-id


Re: why does that mail not get any bayes-classification

Posted by Sidney Markowitz <si...@sidney.com>.
Reindl Harald wrote on 12/06/16 1:04 AM:
> output of "spamassassin -D  < ignored_by_bayes_stripped.eml" attached

See this line in that output:

  Jun 11 14:47:00.510 [5188] dbg: bayes: cannot use bayes on this message; not enough usable tokens found

> i would expect a bayes result in any case, even if it's just an 
> informational BAYES_NOTOKS

Not by default, but see the tags in the next lines in your debug output,
starting with

Jun 11 14:47:00.510 [5188] dbg: check: tagrun - tag BAYESTCHAMMY is now ready, value: 0
Jun 11 14:47:00.510 [5188] dbg: check: tagrun - tag BAYESTCSPAMMY is now ready, value: 0

And see how you can add custom headers to your output that include these tags
as documented here:

https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html#template_tags

 Sidney
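
The debug lines quoted above are regular enough to pick out mechanically. Here is a minimal Python sketch that scans captured "spamassassin -D" output for the "not enough usable tokens" message and for the Bayes token-count tags. The function name is hypothetical, the patterns are modelled only on the three lines shown in this message, and the sketch assumes each debug entry sits on one unwrapped line.

```python
import re

# Message text as it appears in the debug line quoted in this thread.
NO_TOKENS = "cannot use bayes on this message; not enough usable tokens found"

# Matches e.g. "tagrun - tag BAYESTCHAMMY is now ready, value: 0".
TAG_READY = re.compile(r"tagrun - tag (BAYES\w+) is now ready, value: (.*)$")


def summarize_bayes_debug(debug_text):
    """Collect the Bayes-related facts from spamassassin -D output."""
    summary = {"not_enough_tokens": False, "tags": {}}
    for line in debug_text.splitlines():
        if NO_TOKENS in line:
            summary["not_enough_tokens"] = True
        match = TAG_READY.search(line)
        if match:
            summary["tags"][match.group(1)] = match.group(2).strip()
    return summary
```

For the output quoted here this yields "not_enough_tokens" set to True with both BAYESTCHAMMY and BAYESTCSPAMMY at "0", which is the information a BAYES_NOTOKS style rule would have surfaced in the report and the log.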


Re: why does that mail not get any bayes-classification

Posted by Reindl Harald <h....@thelounge.net>.

On 11.06.2016 at 08:17, RW wrote:
> On Sat, 11 Jun 2016 04:52:48 +0200
> Reindl Harald wrote:
>
>> On 10.06.2016 at 23:52, RW wrote:
>>> On Fri, 10 Jun 2016 16:57:45 +0200
>>> Reindl Harald wrote:
>>>
>>>> see attachment, no bayes tag at all looks like a major bug
>>>> somewhere
>>>
>>> In the absence of any debug it's hard to say.
>>
>> hence i attached the sample
>
> An email is not debug. I can't run it on *your* system.

output of "spamassassin -D  < ignored_by_bayes_stripped.eml" attached

>>> It is possible for no tokens to make it through the selection, in
>>> which case there is no result. That's more likely than normal in
>>> your case since you don't train on headers.

i would expect a bayes result in any case, even if it's just an 
informational BAYES_NOTOKS

>> if you would have looked at the message you would have seen that
>> there is content and not only headers and it looks like the message
>> has just incorrect mime-definitions (missing end headers)
>
> Of course I looked at it. And I ran it through spamassassin.
>
> Aside from header tokens, what made it past the token selection on my
> database was only:
>
>    'marcus','Marcus','enclosed','invoice','business' and 'thank'
>
> It's quite possible that all the body tokens in that email were
> in the neutral range on your system, which would cause Bayes to exit
> without producing a classification.

as said: i would expect a rule-hit which states this


Re: why does that mail not get any bayes-classification

Posted by RW <rw...@googlemail.com>.
On Sat, 11 Jun 2016 04:52:48 +0200
Reindl Harald wrote:

> On 10.06.2016 at 23:52, RW wrote:
> > On Fri, 10 Jun 2016 16:57:45 +0200
> > Reindl Harald wrote:
> >  
> >> see attachment, no bayes tag at all looks like a major bug
> >> somewhere  
> >
> > In the absence of any debug it's hard to say.  
> 
> hence i attached the sample

An email is not debug. I can't run it on *your* system.

> > It is possible for no tokens to make it through the selection, in
> > which case there is no result. That's more likely than normal in
> > your case since you don't train on headers.  
> 
> if you would have looked at the message you would have seen that
> there is content and not only headers and it looks like the message
> has just incorrect mime-definitions (missing end headers)

Of course I looked at it. And I ran it through spamassassin.

Aside from header tokens, what made it past the token selection on my
database was only:

   'marcus','Marcus','enclosed','invoice','business' and 'thank'

It's quite possible that all the body tokens in that email were
in the neutral range on your system, which would cause Bayes to exit
without producing a classification. 

In the absence of any debug against your database, there is nothing
particularly suspicious here.
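
The effect described above, where every body token falls in the neutral range and Bayes exits without a classification, can be illustrated with a toy model. The Python sketch below is a deliberate simplification and not SpamAssassin's code: the width of the neutral band is an assumed, illustrative value, the token probabilities in the usage note are made up, and the averaging step merely stands in for the real combining method.

```python
# Assumed half-width of the neutral band around 0.5 (illustrative value only).
NEUTRAL_HALF_WIDTH = 0.346


def significant_tokens(token_probs, half_width=NEUTRAL_HALF_WIDTH):
    """Keep only tokens whose spam probability is far enough from 0.5."""
    return {
        token: prob
        for token, prob in token_probs.items()
        if abs(prob - 0.5) >= half_width
    }


def classify(token_probs):
    """Return a combined probability, or None when no token survives selection.

    None corresponds to "not enough usable tokens found": no BAYES_NN rule
    is reported at all. The plain average is a placeholder combiner.
    """
    kept = significant_tokens(token_probs)
    if not kept:
        return None
    return sum(kept.values()) / len(kept)
```

For example, if tokens such as 'enclosed', 'invoice' and 'thank' all had hypothetical probabilities between 0.4 and 0.65 in a given database, classify() would return None for that message, while the same message could receive a result on another system whose database rates those tokens more strongly.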

Re: why does that mail not get any bayes-classification

Posted by David B Funk <db...@engineering.uiowa.edu>.
On Sat, 11 Jun 2016, Reindl Harald wrote:

>
>
> On 10.06.2016 at 23:52, RW wrote:
>> On Fri, 10 Jun 2016 16:57:45 +0200
>> Reindl Harald wrote:
>> 
>>> see attachment, no bayes tag at all looks like a major bug somewhere
>> 
>> In the absence of any debug it's hard to say.
>
> hence i attached the sample
>
>> It is possible for no tokens to make it through the selection, in which
>> case there is no result. That's more likely than normal in your case
>> since you don't train on headers.
>
> if you would have looked at the message you would have seen that there is 
> content and not only headers and it looks like the message has just incorrect 
> mime-definitions (missing end headers)
>
> since thunderbird shows the attachment as well as the mail content, that would 
> be a way for spammers to completely trick SA

There may be a bug but I don't think it is in the SA distro.

I took your sample and fed it to my SA kit. First time thru it hit BAYES_50, I
then did a "sa-learn --spam < /tmp/ignored_by_bayes_stripped.eml" and retested 
it. It then hit BAYES_999.

So I'd say standard SA + Bayes works on that message. Somebody at your site may
have made some modifications to your SA that are causing you problems.


-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: why does that mail not get any bayes-classification

Posted by Reindl Harald <h....@thelounge.net>.

On 10.06.2016 at 23:52, RW wrote:
> On Fri, 10 Jun 2016 16:57:45 +0200
> Reindl Harald wrote:
>
>> see attachment, no bayes tag at all looks like a major bug somewhere
>
> In the absence of any debug it's hard to say.

hence i attached the sample

> It is possible for no tokens to make it through the selection, in which
> case there is no result. That's more likely than normal in your case
> since you don't train on headers.

if you would have looked at the message you would have seen that there 
is content and not only headers and it looks like the message has just 
incorrect mime-definitions (missing end headers)

since thunderbird shows the attachment as well as the mail content, that 
would be a way for spammers to completely trick SA


Re: why does that mail not get any bayes-classification

Posted by RW <rw...@googlemail.com>.
On Fri, 10 Jun 2016 16:57:45 +0200
Reindl Harald wrote:

> see attachment, no bayes tag at all looks like a major bug somewhere

In the absence of any debug it's hard to say.

It is possible for no tokens to make it through the selection, in which
case there is no result. That's more likely than normal in your case
since you don't train on headers.