Posted to users@spamassassin.apache.org by Roman Gelfand <rg...@gmail.com> on 2015/06/23 14:34:52 UTC

bayes filtering

I periodically run the following command on my spam box:
sa-learn --no-sync --spam /mbx/adomain.com/auser/Maildir/.Junk/{cur,new}

It seems to work, but I continue to get messages like this one.  Why?
Here are the SpamAssassin headers:

X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mail.adomain.com
X-Spam-Level: ***
X-Spam-Status: No, score=3.6 required=5.0 tests=BAYES_99,BAYES_999,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HTML_MESSAGE,SPF_PASS,URIBL_BLOCKED autolearn=no
	version=3.3.2


Thanks in advance

Re: bayes filtering

Posted by Reindl Harald <h....@thelounge.net>.

On 25.10.2015 at 19:06, Roman Gelfand wrote:
> In your post some time ago, you mentioned that my configuration may
> not be sufficient to block emails using Bayes filtering.  My
> configuration is below.  I have since fixed the DNS issue, but I am not
> sure how to deal with the score staying at 3.5.  I am getting 4
> emails/day with a score of 3.5.
>
> X-Spam-Status: No, score=3.6 required=5.0 tests=AWL,BAYES_99,BAYES_999,
> 	DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HTML_MESSAGE,SPF_PASS autolearn=no
> 	version=3.3.2

* introduce body filters
* raise your Bayes scores if you trust your training

Caution: the scores below are for a long-trained Bayes database with
around 70000 sample messages, a ton of low-scored whitelists and a
milter reject threshold of 8.0.

# adjust bayes scoring
score BAYES_00 -3.5
score BAYES_05 -2.0
score BAYES_20 -1.0
score BAYES_40 -0.5
score BAYES_50 1.5
score BAYES_60 3.5
score BAYES_80 5.5
score BAYES_95 6.5
score BAYES_99 7.5
score BAYES_999 0.4

body      CUST_BODY_17    /.*(1st page ranking of google|a company which can understand you & your business).*/i
score     CUST_BODY_17    1.5
describe  CUST_BODY_17    Contains Low
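
A body rule like CUST_BODY_17 can be tried out against sample text before it goes into local.cf. The sketch below is only an approximation in Python: it assumes the alternation behaves the same in Python's `re` as in the Perl regex SpamAssassin uses, and the helper name and the sample strings are hypothetical. The leading and trailing `.*` are left out because an unanchored search already matches anywhere in the text.

```python
import re

# Same alternation as the CUST_BODY_17 rule above, case-insensitive like /i.
# The surrounding ".*" is omitted: search() is unanchored, so it is redundant.
CUST_BODY_17 = re.compile(
    r"1st page ranking of google"
    r"|a company which can understand you & your business",
    re.IGNORECASE,
)


def hits_cust_body_17(text: str) -> bool:
    """Return True if the (hypothetical) Python port of the rule matches."""
    return CUST_BODY_17.search(text) is not None


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from real mail.
    samples = [
        "We can get you 1st Page Ranking of Google within weeks",
        "Minutes of yesterday's meeting are attached",
    ]
    for line in samples:
        print(hits_cust_body_17(line), line)
```

Running candidate phrases through such a check first helps avoid rules that also hit legitimate mail.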


Re: bayes filtering

Posted by Roman Gelfand <rg...@gmail.com>.
In your post some time ago, you mentioned that my configuration may not
be sufficient to block emails using Bayes filtering.  My configuration is
below.  I have since fixed the DNS issue, but I am not sure how to deal
with the score staying at 3.5.  I am getting 4 emails/day with a score of
3.5.

X-Spam-Status: No, score=3.6 required=5.0 tests=AWL,BAYES_99,BAYES_999,
	DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HTML_MESSAGE,SPF_PASS autolearn=no
	version=3.3.2


required_score                  5.0
# rewrite_header Subject        [SPAM]
rewrite_header Subject
# trusted_networks              192.168.7.0/24 192.168.3.0/24
report_safe                     0
use_bayes                       1
bayes_auto_learn                1
skip_rbl_checks                 0
use_razor2                      1
use_pyzor                       1
ok_languages                    en

user_scores_dsn                 DBI:mysql:spamassassin:localhost
user_scores_sql_username        spamd
user_scores_sql_password        XXXXXXXXXXXX=9
auto_whitelist_factory          Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn                    DBI:mysql:spamassassin:localhost
user_awl_sql_table              awl
user_awl_sql_username           spamd
user_awl_sql_password           onepluseight=9
bayes_store_module              Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn                   DBI:mysql:spamassassin:localhost
bayes_sql_username              spamd
bayes_sql_password              XXXXXXXXXXXXx=9


Re: bayes filtering

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 23 Jun 2015, at 8:34, Roman Gelfand wrote:

> Periodically, I am running the following command on my spam box...
> sa-learn --no-sync --spam /mbx/adomain.com/auser/Maildir/.Junk/{cur,new}
>
> It seems to work.  However, I continue to get this message type.  Why?
> Here is SA message.
>
> X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mail.adomain.com
> X-Spam-Level: ***
> X-Spam-Status: No, score=3.6 required=5.0 tests=BAYES_99,BAYES_999,DKIM_SIGNED,
> 	DKIM_VALID,DKIM_VALID_AU,HTML_MESSAGE,SPF_PASS,URIBL_BLOCKED autolearn=no
> 	version=3.3.2

Your configuration appears to use the default scores for the rules that 
are being hit there and for the "required" threshold. A 100% certain 
Bayes judgment (technically anything >99.9%) only adds up to a score of 
3.7 with the default scores, and the default threshold is 5.0, so you 
need *something more* than a Bayes certainty to get SA to call anything 
spam, using the default configuration. Without seeing the actual mail, 
what "more" might be is a generic theoretical discussion.
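
The arithmetic behind that paragraph can be sketched in a few lines of Python. This is an illustration, not SpamAssassin's own scoring code. The message above states only the 3.7 total for the two Bayes rules, so the 3.5 / 0.2 split used here is an assumption. The raised values are the ones Reindl Harald posts in this thread, and the function names are hypothetical.

```python
REQUIRED_SCORE = 5.0  # the "required" threshold shown in the headers

# Assumption: only the 3.7 total is stated in the thread; the split is a guess.
DEFAULT_SCORES = {"BAYES_99": 3.5, "BAYES_999": 0.2}

# Raised scores as posted in Reindl Harald's local.cf in this thread.
RAISED_SCORES = {"BAYES_99": 7.5, "BAYES_999": 0.4}


def total_score(hits, scores):
    """Sum the scores of the rules that fired; unknown rules count as 0."""
    return round(sum(scores.get(rule, 0.0) for rule in hits), 3)


def is_spam(hits, scores, required=REQUIRED_SCORE):
    """A message is tagged as spam once its total reaches the threshold."""
    return total_score(hits, scores) >= required


if __name__ == "__main__":
    hits = ["BAYES_99", "BAYES_999"]
    for label, scores in (("default", DEFAULT_SCORES), ("raised", RAISED_SCORES)):
        print(label, total_score(hits, scores), is_spam(hits, scores))
```

With the assumed defaults the two Bayes hits stay below 5.0 on their own, which is why another rule has to fire before the message is tagged.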

However, in this case there's an obvious first thing to fix: stop using 
a shared DNS resolver.

The URIBL_BLOCKED "rule" is a message from the operators of the 
uribl.com service that the DNS resolver used for a query is explicitly 
refused service. The most common reason for this is excess query volume 
from a resolver. The only likely reasons for you to hit this are:

1. You are scanning so much mail with SA that you must be a large 
commercial operation capable of helping to support the uribl.com service 
as "free for most," so they require you to do so. This seems unlikely 
for someone newly setting up SA...

2. You are using a DNS resolver that is shared by a large number of 
other people and in aggregate you are all pounding the uribl.com 
nameservers as if you are a commercial service provider or large 
business.

The solution for (2) is a step that should be part of running *ANY* MTA 
that accepts mail from the world at large: bring up a caching recursive 
(NOT forwarding) resolver DNS daemon on the same host (or in multi-host 
environments: same physical LAN) as the MTA and use it as the resolver 
for the MTA. In addition to being able to use services like uribl.com 
and Spamhaus that block large resolvers who don't support them, having 
your own resolver makes DNS resolution substantially faster on average 
for your MTA. With a modern MTA doing basic spam control, DNS resolution 
time is a substantial contributor to session lifetime, which is a major 
determinant of overall capacity. Another positive advantage is that many 
shared resolvers (especially those run by ISPs) do non-standard things 
in response to some queries designed to either assist and protect 
web-surfing users or line their own pockets, depending on the particular 
resolver and one's PoV. None of those tricks are helpful for an MTA, and 
some can be positively harmful, so you shouldn't do resolution for an 
MTA through such a server. A caching-only recursive nameserver isn't a 
substantial load and isn't difficult to configure, and many OS 
distributions include such a configuration in the base OS (e.g. FreeBSD) 
or as the default config in packages of ISC BIND and/or other DNS 
daemons.
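
A quick way to see which resolver a host is configured to use is to read the `nameserver` lines of /etc/resolv.conf. The sketch below is a hypothetical helper, not part of SpamAssassin. It only checks whether every configured nameserver is a loopback address, so a resolver elsewhere on the same LAN (also acceptable per the advice above) would be reported as non-local. It also cannot tell whether a local daemon is recursive or merely forwarding.

```python
import ipaddress


def nameservers(resolv_conf_text):
    """Return the addresses from 'nameserver' lines of resolv.conf text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def uses_only_local_resolver(resolv_conf_text):
    """True if at least one nameserver is set and all are loopback addresses."""
    servers = nameservers(resolv_conf_text)
    if not servers:
        return False
    for server in servers:
        try:
            if not ipaddress.ip_address(server).is_loopback:
                return False
        except ValueError:
            # Unparseable entry: treat it as non-local.
            return False
    return True


if __name__ == "__main__":
    # Assumes the conventional path; adjust for the system at hand.
    with open("/etc/resolv.conf") as handle:
        text = handle.read()
    print(nameservers(text), uses_only_local_resolver(text))
```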


Re: bayes filtering

Posted by Reindl Harald <h....@thelounge.net>.
On 23.06.2015 at 14:34, Roman Gelfand wrote:
> Periodically, I am running the following command on my spam box...
> sa-learn --no-sync --spam /mbx/adomain.com/auser/Maildir/.Junk/{cur,new}
>
> It seems to work.  However, I continue to get this message type.  Why?
> Here is SA message.
>
> X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mail.adomain.com
> X-Spam-Level: ***
> X-Spam-Status: No, score=3.6 required=5.0 tests=BAYES_99,BAYES_999,DKIM_SIGNED,
> 	DKIM_VALID,DKIM_VALID_AU,HTML_MESSAGE,SPF_PASS,URIBL_BLOCKED autolearn=no
> 	version=3.3.2

Because the score for BAYES_99 + BAYES_999 alone is not high enough. But
your *real problem* is URIBL_BLOCKED, and that's just because you are
using a DNS forwarder instead of a local cache. That topic has been
discussed thousands of times, so solve that first and then *carefully*
consider adjusting scores, because that heavily depends on how well your
Bayes is trained in both directions (ham and spam).

/etc/mail/spamassassin/local.cf
score BAYES_00 -3.5
score BAYES_05 -2.0
score BAYES_20 -1.0
score BAYES_40 -0.5
score BAYES_50 1.8
score BAYES_60 3.5
score BAYES_80 5.0
score BAYES_95 6.5
score BAYES_99 7.5
score BAYES_999 0.4