Posted to users@spamassassin.apache.org by Timothy Murphy <ga...@alice.it> on 2014/08/15 13:05:26 UTC

Second step with SA

Having got SA working at last on my CentOS-7 home server,
I'm thinking of improving its use for me (no-one else).
It's finding about 65% of my spam, and I'd like to increase that to 80%.

1) What is the simplest way to reject mail in Chinese, Russian
and Turkish?

2) I get some email wrongly marked spam - always from the same site.
I've tried marking this as ham (and running "sa-learn --ham")
but this has surprisingly little effect.

3) Bizarrely, I seem to be getting a lot of spam from Brazil -
at least I assume that is where *.br lives?
I guess 1% of email from Brazil might be legit,
but losing it is a small sacrifice.
I guess I could look at the sites - there may be only a couple.
What is the easiest way to define email from a given site as spam?

-- 
Timothy Murphy  
e-mail: gayleard /at/ eircom.net
School of Mathematics, Trinity College, Dublin 2, Ireland


Re: Second step with SA

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 15.08.14 13:05, Timothy Murphy wrote:
>Having got SA working at last on my CentOS-7 home server,
>I'm thinking of improving its use for me (no-one else).
>It's finding about 65% of my spam, and I'd like to increase that to 80%.
>
>1) What is the simplest way to reject mail in Chinese, Russian
>and turkish?

adding "ok_locales en" (which treats only Latin-alphabet character sets as
OK) and using the TextCat plugin with "ok_languages en" should help increase
the score of such mail.
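A rough illustration of why both settings are suggested, and not SpamAssassin's actual implementation: ok_locales works on character sets, so it can catch Chinese or Russian text, but Turkish is written in the Latin alphabet and slips past it, which is where TextCat's language guessing comes in. The hypothetical helper non_latin_fraction below measures how many of a text's letters come from a few non-Latin scripts, using only Python's standard unicodedata module; the script list is an assumption for the example.

```python
import unicodedata

# Hypothetical, illustrative list of Unicode character-name prefixes for non-Latin scripts.
NON_LATIN_PREFIXES = ("CJK", "HIRAGANA", "KATAKANA", "HANGUL", "CYRILLIC")


def non_latin_fraction(text: str) -> float:
    """Return the share of alphabetic characters that belong to a non-Latin script."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    flagged = sum(
        1 for ch in letters if unicodedata.name(ch, "").startswith(NON_LATIN_PREFIXES)
    )
    return flagged / len(letters)


if __name__ == "__main__":
    print(non_latin_fraction("Привет мир"))     # 1.0 (Cyrillic)
    print(non_latin_fraction("Merhaba dünya"))  # 0.0 (Turkish uses Latin letters)
```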
   
>2) I get some email wrongly marked spam - always from the same site.
>I've tried marking this as ham (and running "sa-learn --ham")
>but this has surprisingly little effect.

are you training as the correct user?

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
     One OS to rule them all, One OS to find them, 
One OS to bring them all and into darkness bind them 

Re: Second step with SA

Posted by Steve Bergman <sb...@gmail.com>.
On 08/15/2014 06:05 AM, Timothy Murphy wrote:

> 1) What is the simplest way to reject mail in Chinese, Russian
> and turkish?

Is the spam actually written in Chinese, Russian, and Turkish languages? 
Or does it come from Chinese, Russian, and Turkish domains?

The spam my users' accounts receive comes largely from TLDs which my 
users *never* receive legitimate mail from, and is written in UTF-8 
English.

The primary violators are .ru, .ch, .eu, and .us.

I would so much like to just block those domains. But somehow it just 
doesn't seem right.

I've applied:

ok_locales en
ok_languages en

But we still get a lot of spam from those domains, in UTF-8, in (often 
pidgin) English, which flies under the radar.
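One middle ground between blocking those TLDs outright and ignoring them (and one way to approach the Brazil question at the top of the thread) is to add a modest score instead of rejecting. The Python sketch below shows the idea; SUSPECT_TLDS, TLD_PENALTY and the helper names are hypothetical, and since the From header is trivially forged, a real rule might look at relay hostnames in the Received headers instead. Inside SpamAssassin this would normally be written as a local rule in local.cf rather than as external code.

```python
from email.utils import parseaddr

# Hypothetical values chosen for illustration, not SpamAssassin defaults.
SUSPECT_TLDS = {"br", "ru"}
TLD_PENALTY = 1.5


def sender_tld(from_header: str) -> str:
    """Return the lower-cased top-level domain of the address in a From header."""
    _, address = parseaddr(from_header)
    domain = address.rsplit("@", 1)[-1]
    return domain.rstrip(".").rsplit(".", 1)[-1].lower()


def tld_score(from_header: str) -> float:
    """Add a small penalty for a suspect TLD instead of rejecting the message."""
    return TLD_PENALTY if sender_tld(from_header) in SUSPECT_TLDS else 0.0
```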

-Steve Bergman

Re: Second step with SA

Posted by "David F. Skoll" <df...@roaringpenguin.com>.
On Fri, 15 Aug 2014 11:21:47 -0400
Bowie Bailey <Bo...@BUC.com> wrote:

> Considering only the spam:

> 67% Spamhaus rejections
> 33% Marked by SA

> YMMV, but it works quite well for me.

Indeed, MM does V. :)

spam=> select count(*) from incidents where status = 'spam';
 count
-------
  2391

spam=> select count(*) from incidents where status = 'spam' and incident_report ilike '%zen%';
 count
-------
   527

Admittedly, those numbers are from our very low-volume corporate mail
server.  I cannot get statistics from our much busier anti-spam cluster
because our volume is far too high to be able to use Spamhaus without
paying crazy fees.
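Working the two counts above through as a ratio (assuming, as the second query does, that an incident report mentioning "zen" means Zen listed the message):

```python
# Counts returned by the two queries above.
total_spam = 2391
zen_hits = 527

hit_rate = zen_hits / total_spam
print(f"Zen hit rate: {hit_rate:.1%}")  # Zen hit rate: 22.0%
```

That is a little under the "about 25%" David mentions, and well short of the roughly 70% Bowie reports elsewhere in the thread.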

> Zen is the only DNSBL I trust enough to use as an actual blacklist.

I agree that it's quite good.  FPs are very low.

Regards,

David.


Re: Second step with SA

Posted by Axb <ax...@gmail.com>.
On 08/15/2014 05:21 PM, Bowie Bailey wrote:
> On 8/15/2014 11:07 AM, David F. Skoll wrote:
>> On Fri, 15 Aug 2014 10:02:14 -0500
>> Steve Bergman <sb...@gmail.com> wrote:
>>
>>> So basically, elevate it to the level of an absolute blacklist.
>>> I'm not sure I trust Zen that much. I'm more a Bayes proponent than a
>>> DNSBL proponent.
>> Me too.  I'm also surprised that the OP claimed it caught 70% of his
>> spam.  I see a much lower hit-rate on Zen; it hits on about 25% of our
>> spam.
>
> I'm not the OP of the thread, but I am the one who brought up Zen.
>
> Here are the stats from my server for the last month:
>
> 53% Spamhaus rejections
> 26% Marked as spam by SA
> 21% Delivered as ham
>
> Considering only the spam:
>
> 67% Spamhaus rejections
> 33% Marked by SA
>
> YMMV, but it works quite well for me.
>
> Zen is the only DNSBL I trust enough to use as an actual blacklist.

agreed, and with Postfix I like:

reject_rhsbl_reverse_client dbl.spamhaus.org=127.0.1.2,
reject_rhsbl_client dbl.spamhaus.org=127.0.1.2,
reject_rhsbl_helo dbl.spamhaus.org=127.0.1.2
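For readers unfamiliar with RHSBLs, a minimal Python sketch of the lookup those Postfix restrictions perform: the domain name itself is prepended to dbl.spamhaus.org, and the message is rejected only when the answer is exactly 127.0.1.2, so other DBL return codes are not acted on. The function names are hypothetical and the resolver is injectable so the logic can be tried offline; Postfix does all of this internally.

```python
import socket
from typing import Callable, Optional

DBL_ZONE = "dbl.spamhaus.org"
SPAM_DOMAIN_CODE = "127.0.1.2"  # the return code the Postfix lines above match on


def dbl_query_name(domain: str, zone: str = DBL_ZONE) -> str:
    """RHSBLs are queried by prepending the domain itself to the list's zone."""
    return f"{domain.rstrip('.').lower()}.{zone}"


def default_resolver(name: str) -> Optional[str]:
    """Return one A record for name, or None if it does not resolve (not listed)."""
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None


def dbl_rejects(domain: str,
                resolve: Callable[[str], Optional[str]] = default_resolver) -> bool:
    """True only when the DBL answers with the specific spam-domain code."""
    return resolve(dbl_query_name(domain)) == SPAM_DOMAIN_CODE
```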


Re: Second step with SA

Posted by Bowie Bailey <Bo...@BUC.com>.
On 8/15/2014 11:07 AM, David F. Skoll wrote:
> On Fri, 15 Aug 2014 10:02:14 -0500
> Steve Bergman <sb...@gmail.com> wrote:
>
>> So basically, elevate it to the level of an absolute blacklist.
>> I'm not sure I trust Zen that much. I'm more a Bayes proponent than a
>> DNSBL proponent.
> Me too.  I'm also surprised that the OP claimed it caught 70% of his
> spam.  I see a much lower hit-rate on Zen; it hits on about 25% of our
> spam.

I'm not the OP of the thread, but I am the one who brought up Zen.

Here are the stats from my server for the last month:

53% Spamhaus rejections
26% Marked as spam by SA
21% Delivered as ham

Considering only the spam:

67% Spamhaus rejections
33% Marked by SA
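The spam-only figures follow from the first set by dropping the ham and renormalising, assuming the 21% delivered as ham contains no missed spam:

```python
# Shares of all incoming mail over the month, from the figures above.
rejected_by_zen = 53  # percent
marked_by_sa = 26     # percent; the 21% delivered as ham is left out

spam_total = rejected_by_zen + marked_by_sa
zen_share = rejected_by_zen / spam_total
sa_share = marked_by_sa / spam_total
print(f"Spamhaus: {zen_share:.0%}, SA: {sa_share:.0%}")  # Spamhaus: 67%, SA: 33%
```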

YMMV, but it works quite well for me.

Zen is the only DNSBL I trust enough to use as an actual blacklist.

-- 
Bowie

Re: Second step with SA

Posted by "David F. Skoll" <df...@roaringpenguin.com>.
On Fri, 15 Aug 2014 10:02:14 -0500
Steve Bergman <sb...@gmail.com> wrote:

> So basically, elevate it to the level of an absolute blacklist.
> I'm not sure I trust Zen that much. I'm more a Bayes proponent than a 
> DNSBL proponent.

Me too.  I'm also surprised that the OP claimed it caught 70% of his
spam.  I see a much lower hit-rate on Zen; it hits on about 25% of our
spam.

Regards,

David.

Re: Second step with SA

Posted by Steve Bergman <sb...@gmail.com>.
On 08/15/2014 09:37 AM, Bowie Bailey wrote:

> Yes, it is part of the default rule set.  But what I am saying is to add
> it to your MTA as a blacklist.  That way anything matched by Zen will be
> rejected by the MTA without ever having to run SA.

So basically, elevate it to the level of an absolute blacklist.

I'm not sure I trust Zen that much. I'm more a Bayes proponent than a 
DNSBL proponent.



Re: Second step with SA

Posted by Bowie Bailey <Bo...@BUC.com>.
On 8/15/2014 10:33 AM, Steve Bergman wrote:
> On 08/15/2014 09:14 AM, Bowie Bailey wrote:
>
>> The best way to quickly cut spam is to add the zen.spamhaus.org
>> blacklist to your MTA.
>>
>> http://www.spamhaus.org/zen/
> Is that not included in the default rule set? If not, I'm not sure where
> mine came from.

Yes, it is part of the default rule set.  But what I am saying is to add 
it to your MTA as a blacklist.  That way anything matched by Zen will be 
rejected by the MTA without ever having to run SA.
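The difference between the two setups can be sketched as a decision function. This is a simplification with hypothetical names: is_listed stands in for the MTA's DNSBL check, sa_score for a SpamAssassin run, and 5.0 is SpamAssassin's default required_score. The point is that sa_score is never called for a message the blacklist has already rejected.

```python
from typing import Callable

REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score


def handle_message(client_ip: str,
                   is_listed: Callable[[str], bool],
                   sa_score: Callable[[], float]) -> str:
    """Classify one message as 'reject', 'spam' or 'ham' (hypothetical pipeline)."""
    if is_listed(client_ip):
        # Blacklisted at the MTA: refused during the SMTP dialogue, SA never runs.
        return "reject"
    # Only mail that passed the MTA check pays the cost of a full SA scan.
    score = sa_score()
    return "spam" if score >= REQUIRED_SCORE else "ham"
```

When Zen is left to SpamAssassin alone, a Zen hit is just one rule's score added to the total, which is the trade-off debated in this thread.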

-- 
Bowie

Re: Second step with SA

Posted by Steve Bergman <sb...@gmail.com>.

On 08/15/2014 09:14 AM, Bowie Bailey wrote:

> The best way to quickly cut spam is to add the zen.spamhaus.org
> blacklist to your MTA.
>
> http://www.spamhaus.org/zen/

Is that not included in the default rule set? If not, I'm not sure where 
mine came from.

-Steve Bergman

Re: Second step with SA

Posted by Joe Quinn <jq...@pccc.com>.
On 8/15/2014 10:14 AM, Bowie Bailey wrote:
> On 8/15/2014 7:05 AM, Timothy Murphy wrote:
>> Having got SA working at last on my CentOS-7 home server,
>> I'm thinking of improving its use for me (no-one else).
>> It's finding about 65% of my spam, and I'd like to increase that to 80%.
>
> The best way to quickly cut spam is to add the zen.spamhaus.org 
> blacklist to your MTA.
>
> http://www.spamhaus.org/zen/
>
> SA will also use some of the Spamhaus blacklists for scoring, but Zen 
> is good enough to put in your MTA and simply reject anything that 
> matches.  Zen rejects about 70% of the spam that comes to my server 
> without ever getting SA involved.
>
In addition to RBLs, there are several public rulesets available outside 
the sa-update channels that you can use, which can be more aggressive or 
broader than SA's defaults.

I am one of the maintainers of KAM.cf, the latest version of which you 
can always get here: http://www.pccc.com/downloads/SpamAssassin/contrib/KAM.cf

There's at least one other I remember that someone on list runs, but I 
am completely blanking on the URL at the moment.

Re: Second step with SA

Posted by Bowie Bailey <Bo...@BUC.com>.
On 8/15/2014 7:05 AM, Timothy Murphy wrote:
> Having got SA working at last on my CentOS-7 home server,
> I'm thinking of improving its use for me (no-one else).
> It's finding about 65% of my spam, and I'd like to increase that to 80%.

The best way to quickly cut spam is to add the zen.spamhaus.org 
blacklist to your MTA.

http://www.spamhaus.org/zen/

SA will also use some of the Spamhaus blacklists for scoring, but Zen is 
good enough to put in your MTA and simply reject anything that matches.  
Zen rejects about 70% of the spam that comes to my server without ever 
getting SA involved.
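In Postfix this is typically a "reject_rbl_client zen.spamhaus.org" entry in smtpd_recipient_restrictions. What the MTA does for each connecting client can be sketched in Python (hypothetical function names, with an injectable resolver so it can be tested offline): reverse the IPv4 octets, append the zone, and treat any 127.0.0.x answer as listed and a failed lookup as not listed.

```python
import ipaddress
import socket
from typing import Callable, Optional

ZEN_ZONE = "zen.spamhaus.org"


def dnsbl_query_name(ip: str, zone: str = ZEN_ZONE) -> str:
    """Build the DNSBL name: 192.0.2.99 -> 99.2.0.192.zen.spamhaus.org."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def resolve_a(name: str) -> Optional[str]:
    """Return one A record for name, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None


def zen_listed(ip: str, resolve: Callable[[str], Optional[str]] = resolve_a) -> bool:
    """True if Zen answers with a 127.0.0.x listing code for this client IP."""
    answer = resolve(dnsbl_query_name(ip))
    return answer is not None and answer.startswith("127.0.0.")
```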

-- 
Bowie

Re: Second step with SA

Posted by John Hardin <jh...@impsec.org>.
On Fri, 15 Aug 2014, Timothy Murphy wrote:

> 2) I get some email wrongly marked spam - always from the same site.
>    I've tried marking this as ham (and running "sa-learn --ham") but
>    this has surprisingly little effect.

A few fairly standard things to consider, in case you aren't already aware 
of them:

Bayes doesn't start scoring until a sufficient amount of both spam and ham 
has been learned. Are you seeing any BAYES_* rule hits at all? If not, 
then you have not trained enough yet.
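One way to check is "sa-learn --dump magic", run as the user whose Bayes database is used for scanning, which reports how many spam and ham messages have been learned. The Python sketch below parses that output, assuming the usual layout in which the count is the third column of the "non-token data: nspam" and "nham" lines, and compares the counts with the default bayes_min_spam_num and bayes_min_ham_num thresholds (200 each). The helper names are hypothetical and SAMPLE is made-up output for illustration.

```python
from typing import Tuple

BAYES_MIN_SPAM_NUM = 200  # SpamAssassin default
BAYES_MIN_HAM_NUM = 200   # SpamAssassin default


def bayes_counts(dump: str) -> Tuple[int, int]:
    """Extract (nspam, nham) from 'sa-learn --dump magic' output."""
    nspam = nham = 0
    for line in dump.splitlines():
        fields = line.split()
        if fields[-2:] == ["data:", "nspam"]:
            nspam = int(fields[2])
        elif fields[-2:] == ["data:", "nham"]:
            nham = int(fields[2])
    return nspam, nham


def bayes_active(nspam: int, nham: int) -> bool:
    """Bayes only contributes once both minimums have been reached."""
    return nspam >= BAYES_MIN_SPAM_NUM and nham >= BAYES_MIN_HAM_NUM


SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        150          0  non-token data: nspam
0.000          0        420          0  non-token data: nham
"""

if __name__ == "__main__":
    nspam, nham = bayes_counts(SAMPLE)
    print(nspam, nham, bayes_active(nspam, nham))  # 150 420 False
```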

It's possible those messages were learned as spam at some time in the 
past. Do you have autolearn enabled? If not, do you keep your training 
corpora? That's recommended, so that you can do things like review it for 
misclassifications, and wipe-and-retrain-from-scratch if things really go 
off the rails.
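If the corpora are kept, a wipe-and-retrain is a short sequence of sa-learn runs: --clear to empty the database, then --ham and --spam over the saved messages. The sketch below only builds those command lines; the ham/ and spam/ directory layout and the function name are assumptions, and the optional --dbpath lets training target the same database the scanning user reads, which is also the point of the "correct user" question earlier in the thread.

```python
from pathlib import Path
from typing import List, Optional


def retrain_commands(corpus_root: str, dbpath: Optional[str] = None) -> List[List[str]]:
    """Build (but do not run) sa-learn commands to wipe and retrain Bayes.

    Assumes a hypothetical layout with corpus_root/ham and corpus_root/spam.
    """
    root = Path(corpus_root)
    base = ["sa-learn"]
    if dbpath:
        base.append(f"--dbpath={dbpath}")
    return [
        base + ["--clear"],                     # empty the Bayes database
        base + ["--ham", str(root / "ham")],    # relearn the saved ham
        base + ["--spam", str(root / "spam")],  # relearn the saved spam
    ]
```

Each command could then be run in order with subprocess.run(cmd, check=True) as the appropriate user.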

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Microsoft is not a standards body.
-----------------------------------------------------------------------
  Today: the 69th anniversary of the end of World War II