Posted to dev@spamassassin.apache.org by "Kevin A. McGrail" <KM...@PCCC.com> on 2015/03/17 22:16:48 UTC

Recommendations for ASF SA Implementation

Hello All,

I am working on recommendations for the ASF to modernize the 
installation of SA for the foundation.

We have some givens:

Using Ubuntu
Using Postfix
Need to stick with maintainable packages
Likely needs to stay away from lots of tweaks and heavy customization 
such as using MIMEDefang (unfortunate).

So I'd like any input you might have, on or off list.  Here's some 
questions I believe will help guide things:

Q1 - What is the best glue for SA for Postfix that does the following:

- uses spamc calls so that spamd's can be distributed and load balanced?
- can implement clamav before SA call
- should silently discard emails if a virus is detected
- must use clamdscan but ideally can utilize some sort of socket 
solution for clamd to run distributed and load balanced
- should bounce email over a certain threshold (let's say 5) and silently 
discard email over a certain threshold for SA (let's say 10)
- Might use a few RBLs to decline connections to start
- Implements a good implementation of greylisting
- Temporary failure for scanning (virus or spam) failures


Q2 - Do we happen to know who maintains SA for Ubuntu so we can try and 
work to make sure the upcoming release of 3.4.1 is packaged?


Here's the high level draft if anyone has some thoughts:

- Implement a cluster of spamd servers with no Bayes but likely using 
SQL prefs for some whitelist/blacklisting - Bayes not being used because 
training and maintaining will likely be too difficult
- Implement txrep with SQL backend
- Implement a cluster of clamav boxes
- Implement an SPF record
- Implement postfix with xyz glue to test email on a scalable # of mx's
- Implement a few RBLs to block SMTP connections - I hate to recommend 
this but ASF members are very sensitive to spam so I'm treading lightly


Regards,
KAM

Re: Recommendations for ASF SA Implementation

Posted by Reindl Harald <h....@thelounge.net>.
please stay on list

Am 18.03.2015 um 10:46 schrieb Anthony Cartmell:
>> no, we have 300 SA rejects per day and had 20 clamav hits before
>> changing the order; now the SA reject count is not much different and
>> there are only 5 clamav hits per day
>
> I was just reporting that MailScanner had changed its order of scanning
> following the introduction of third-party ClamAV signatures.
>
> A potential benefit of running SA second is to allow scoring of the
> ClamAV signature matches so that you can fine-tune how much effect each
> group of signatures has

correct - but what you mostly want to achieve on a server with 
noticeable load is to reject as soon as possible and skip as many 
restrictions and scanners as possible

wrapping them that way would double the load, and the potential benefit 
needs to be considered really carefully, given that in case of malware 
you want to reject in any case and that SA runs *all* tests at high cost

most of our SA rejects come with a score above 15, while we reject 
starting at 8.0. what i also have in mind is how to weight such 
decisions when a message has BAYES_00 but contains malware - who 
is right: the clamav signature or the BAYES_00? i would say the 
signature (yes, with a FP risk you have anyway)

the initial post was, as far as i understood it, about the complete 
infrastructure of an inbound MX, hence

* postscreen RBL scoring
* postscreen protocol checks
* envelope restrictions
* SPF backed with DNSWL safety nets
* PTR restrictions with more DNSWL safety nets
* HELO restrictions with more DNSWL safety nets
* sender verify for senders not on any DNSWL and no SPF
* expensive content scanner with most reject hits
* expensive content scanner with fewer reject hits
* most expensive content scanner with only a few reject hits

the point is that you can handle much more load without clustering, and 
even if your load is still high enough to need clustering, it makes a 
difference in how many cluster nodes you need in the end
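
The ordering above (cheap checks first, expensive content scanners last, 
stop at the first reject) can be sketched roughly as follows. This is an 
illustration in Python, not Postfix configuration: the `Message` fields, 
the check functions and the relative costs are all made up for the 
example; only the 8.0 reject score comes from the text above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Message:
    # Hypothetical fields, just enough to drive the example checks.
    client_listed_on_rbl: bool = False
    helo_valid: bool = True
    contains_malware: bool = False
    spam_score: float = 0.0


# A check returns a reject reason, or None to let the message continue.
Check = Callable[[Message], Optional[str]]


def rbl_check(msg: Message) -> Optional[str]:
    return "client listed on RBL" if msg.client_listed_on_rbl else None


def helo_check(msg: Message) -> Optional[str]:
    return None if msg.helo_valid else "invalid HELO"


def spam_scan(msg: Message) -> Optional[str]:
    # 8.0 is the reject score mentioned above.
    return f"score {msg.spam_score} >= 8.0" if msg.spam_score >= 8.0 else None


def malware_scan(msg: Message) -> Optional[str]:
    return "malware found" if msg.contains_malware else None


# (relative cost, name, check). The costs are invented numbers; the order
# puts cheap checks first and the scanner with the most rejects before
# the scanner with fewer rejects.
CHECKS: List[Tuple[int, str, Check]] = [
    (1, "rbl", rbl_check),
    (1, "helo", helo_check),
    (50, "spam", spam_scan),
    (20, "malware", malware_scan),
]


def run_pipeline(
    msg: Message, checks: List[Tuple[int, str, Check]]
) -> Tuple[Optional[str], int]:
    """Run checks in order and stop at the first reject.

    Returns (reject reason or None, total cost spent on this message).
    """
    spent = 0
    for cost, name, check in checks:
        spent += cost
        reason = check(msg)
        if reason is not None:
            return f"{name}: {reason}", spent
    return None, spent
```

A client listed on an RBL costs 1 unit here instead of the 72 units a 
message passing every check would cost, which is the whole argument for 
rejecting as early as possible.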


Re: Recommendations for ASF SA Implementation

Posted by Reindl Harald <h....@thelounge.net>.
Am 18.03.2015 um 10:30 schrieb Anthony Cartmell:
>> reverse the order in "smtpd_milters" but keep in mind that a well
>> trained SA rejects much more mail than clamav and while clamav needs
>> fewer resources you bypass the whole virus scanner that way
>
> MailScanner used to scan in that order too, SA then AV.
>
> However with the introduction of third-party ClamAV signature databases
> that match with things other than malware, the order was changed. Now
> the initial scanning is done with clamd (with third-party signatures
> such as those collected by SaneSecurity[1]) first, and then SA second.
> This allows SA to score messages based on report headers added by the
> ClamAV virus(/spam/scam/phishing) scanner, making a very flexible tool.
>
> [1] http://sanesecurity.com/usage/signatures/

no, we have 300 SA rejects per day and had 20 clamav hits before 
changing the order; now the SA reject count is not much different and 
there are only 5 clamav hits per day

for me that means SA takes 15 out of the 20 malware mails, and 275 
messages that previously went through both milters are now rejected by 
the first

/bin/ls -1 /var/lib/clamav/
blurl.ndb
bofhland_cracked_URL.ndb
bofhland_malware_attach.hdb
bofhland_malware_URL.ndb
bofhland_phishing_URL.ndb
bytecode.cvd
crdfam.clamav.hdb
daily.cld
foxhole_all.cdb
foxhole_filename.cdb
foxhole_generic.cdb
junk.ndb
jurlbla.ndb
jurlbl.ndb
lott.ndb
main.cvd
malwarehash.hsb
mirrors.dat
phish.ndb
phishtank.ndb
rogue.hdb
sanesecurity.ftm
scamnailer.ndb
scam.ndb
sigwhitelist.ign2
spamattach.hdb
spamimg.hdb
spam.ldb
spearl.ndb
spear.ndb
winnow.attachments.hdb
winnow_bad_cw.hdb
winnow_extended_malware.hdb
winnow_malware.hdb
winnow_malware_links.ndb
winnow_phish_complete_url.ndb
winnow_spam_complete.ndb


Re: Recommendations for ASF SA Implementation

Posted by Reindl Harald <h....@thelounge.net>.

Am 17.03.2015 um 22:16 schrieb Kevin A. McGrail:
> So I'd like any input you might have, on or off list.  Here's some
> questions I believe will help guide things:
>
> Q1 - What is the best glue for SA for Postfix that does the following:
> - can implement clamav before SA call

postfix does that out-of-the-box

reverse the order in "smtpd_milters" but keep in mind that a well 
trained SA rejects much more mail than clamav and while clamav needs 
fewer resources you bypass the whole virus scanner that way

smtpd_milters = unix:/run/spamass-milter/spamass-milter.sock,
    unix:/run/clamav-milter/clamav-milter.socket

> - should silently discard emails if a virus is detected

an MTA/MX must never silently discard mail
where i live you go to jail for that as a sysadmin

reject at SMTP level or deliver it

> - Might use a few RBLs to decline connections to start

any recent postfix has postscreen on board with sensible BL/WL scoring 
long before the smtpd process - content filters don't face 90-95% of all 
mail that way

> - Implements a good implementation of greylisting

should also happen on the MTA level if at all

a backup MX always answering with a 4xx code also kills 50% of all botnet 
IPs never seen on the primary MX, but without the negative impacts (delayed 
mail, loops in the case of large senders always coming from a different IP 
and so never making it through greylisting)

> - Temporary failure for scanning (virus or spam) failures

that is standard postfix behavior if a milter doesn't respond, and also the 
standard behavior of most milters if they can't reach the final daemon


Re: Recommendations for ASF SA Implementation

Posted by Axb <ax...@gmail.com>.
On 03/17/2015 10:16 PM, Kevin A. McGrail wrote:
> Hello All,
>
> I am working on recommendations for the ASF to modernize the
> installation of SA for the foundation.
>
> We have some givens:
>
> Using Ubuntu
> Using Postfix
> Need to stick with maintainable packages
> Likely needs to stay away from lots of tweaks and heavy customization
> such as using MIMEDefang (unfortunate).

Although I'd suggest Fuglu, the obvious choice should probably be 
amavisd-new considering Mark is also highly involved in SA dev work.

It's also distributed by Ubuntu so it would be one package less to 
maintain outside the distro. We'd get the best of both worlds.

Axb


Re: Recommendations for ASF SA Implementation

Posted by Mark Martinec <Ma...@ijs.si>.
2015-03-17 22:16, Kevin A. McGrail wrote:

> I am working on recommendations for the ASF to modernize the
> installation of SA for the foundation.
> 
> We have some givens:
> 
> Using Ubuntu
> Using Postfix
> Need to stick with maintainable packages
> Likely needs to stay away from lots of tweaks and heavy customization
> such as using MIMEDefang (unfortunate).
> 
> So I'd like any input you might have, on or off list.


Axb wrote:
| Although I'd suggest Fuglu, the obvious choice should probably be
| amavisd-new considering Mark is also highly involved in SA dev work.
| It's also distributed by Ubuntu so it would be one package less to
| maintain outside the distro. We'd get the best of both worlds.
| Axb

Thanks, Amavis would be my choice too :)))


back to Kevin:
> Here's some questions I believe will help guide things:
> 
> Q1 - What is the best glue for SA for Postfix that does the following:
> 
> - uses spamc calls so that spamd's can be distributed and load 
> balanced?

Amavis uses the standard SMTP protocol for communication with an MTA
instead of the proprietary spamc/spamd protocol. Other than that,
interfacing with SpamAssassin is pretty much the same as in spamd,
i.e. it uses a pre-forked set of processes which use the SpamAssassin
library. For this reason the performance is pretty much the same - the
bottleneck is rule processing in SpamAssassin.

> can be distributed and load balanced?

Yes, can be distributed and load balanced. Two approaches are most
apparent:
- the classical approach is to run multiple postfix+amavis
combos on several hosts, and let MX DNS records distribute the load
across them. If a single IP address is desired, an SMTP proxy (such
as nginx) can do the task of load sharing in front of Postfix.
- if a single MTA is preferred with multiple content filters on
multiple hosts, then traffic from Postfix to amavisd instances
can be spread using HAProxy (or some other load balancer).

Note that it is beneficial to feed outgoing mail through amavis too
for the following reasons:
- the PenPals feature keeps track of ongoing conversations and
contributes negative score points to such, preventing some false
positives on marginal mail content (a requirement is a common
database for all amavis instances, preferably redis, possibly SQL);
- when SpamAssassin autolearning is enabled, outgoing mail
contributes its valuable share of ham samples;
- when an internal machine or a user mail account gets compromised
and starts spewing malware or spam, it will get detected and blocked;
- not to forget: to DKIM-sign outbound mail it needs to pass
through a signer. Amavisd can do DKIM signing (and verification).


> - can implement clamav before SA call

Yes.

Also, considering that some of the third-party ClamAV rulesets
are prone to false positives, or intentionally target spam (not
viruses and other malware), amavis can be configured to reclassify
certain malware (by name) as spam, contributing to SpamAssassin score
and not blocking as malware right away.


> - should silently discard emails if a virus is detected

Configurable, but you don't want to do that, and (as Reindl Harald
noted) it may even violate the law. Unwanted mail must be rejected
at an SMTP level (or delivered to a dedicated folder or quarantine),
it must not be lost. Amavis is nowadays typically deployed as a
before-queue Postfix content filter so that it can reject mail
while the original session is still open.

Keep in mind that antivirus software does occasionally produce
false positives, ClamAV with third-party rules even more so.
A legitimate sender must be notified if this happens.


> - must use clamdscan but ideally can utilize some sort of socket
> solution for clamd to run distributed and load balanced

Can do.  Amavisd can interface with clamd either through
clamdscan, or (preferably) by directly talking to it over
the clamd protocol (thus eliminating clamdscan from the setup).
As this is a normal TCP connection, it can be load balanced
using HAProxy, although it probably makes more sense to keep
amavis+clamd pairs on each host.
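
To illustrate what "directly talking to it over the clamd protocol" 
looks like, here is a minimal Python sketch of clamd's INSTREAM command 
over TCP. It is not how amavisd implements it. The host, the port 
(3310, assumed to be the configured clamd TCP port), the chunk size and 
the function names are assumptions made for the example.

```python
import socket
import struct
from typing import Optional, Tuple


def build_instream_request(data: bytes, chunk_size: int = 8192) -> bytes:
    """Frame data for INSTREAM: command, length-prefixed chunks, zero chunk."""
    parts = [b"zINSTREAM\0"]  # 'z' prefix: NUL-terminated command and reply
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Each chunk is preceded by its length as a 4-byte big-endian integer.
        parts.append(struct.pack("!I", len(chunk)) + chunk)
    parts.append(struct.pack("!I", 0))  # zero-length chunk ends the stream
    return b"".join(parts)


def parse_reply(reply: bytes) -> Tuple[bool, Optional[str]]:
    """Turn 'stream: OK' or 'stream: <name> FOUND' into (infected, name)."""
    text = reply.rstrip(b"\0\n").decode("ascii", "replace")
    _, _, verdict = text.partition(": ")
    if verdict == "OK":
        return False, None
    if verdict.endswith(" FOUND"):
        return True, verdict[: -len(" FOUND")]
    # Anything else (e.g. an ERROR reply) is a scan failure, not a verdict.
    raise ValueError(f"unexpected clamd reply: {text!r}")


def scan_bytes(
    data: bytes, host: str = "127.0.0.1", port: int = 3310, timeout: float = 30.0
) -> Tuple[bool, Optional[str]]:
    """Send data to a clamd TCP socket and return (infected, signature name)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_instream_request(data))
        reply = b""
        while not reply.endswith(b"\0"):
            part = sock.recv(4096)
            if not part:
                break
            reply += part
    return parse_reply(reply)
```

Because this is an ordinary TCP connection, `host` and `port` could just 
as well point at a load balancer in front of several clamd instances. A 
`ValueError` or a socket error here should lead to a temporary failure 
rather than to accepting or dropping the mail.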


> - should bounce email over a certain threshold (let's say 5) and
> silently discard email over a certain threshold for SA (let's say 10)

Possible. There are a couple of configurable spam score levels,
each with its configurable action:

   tag level  - adds X-Spam-* headers (ham or spam)
   tag2 level - adds X-Spam-* headers, claims it is spam
   tag3 level - adds X-Spam-* headers, claims it is blatant spam
   kill level - (typically) rejects mail (or can discard or deliver)

Quarantining at each spam level is configurable independently.
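
As a rough illustration of how these cumulative levels behave, here is 
a small Python sketch. It is not amavis code or amavis configuration 
syntax. The function name and the default numbers are assumptions: 5 
and 10 are Kevin's example thresholds mapped to tag2 and kill, while 
the tag and tag3 values are invented.

```python
from typing import List


def actions_for_score(
    score: float,
    tag: float = 0.0,    # invented value
    tag2: float = 5.0,   # Kevin's "let's say 5"
    tag3: float = 8.0,   # invented value
    kill: float = 10.0,  # Kevin's "let's say 10"
) -> List[str]:
    """Return the actions triggered by a spam score; levels are cumulative."""
    actions = []
    if score >= tag:
        actions.append("add X-Spam-* headers")
    if score >= tag2:
        actions.append("mark as spam")
    if score >= tag3:
        actions.append("mark as blatant spam")
    if score >= kill:
        # Typically a reject at SMTP time; discard or deliver are the
        # other configurable choices mentioned above.
        actions.append("reject")
    return actions
```

With these numbers a score of 6.0 gets headers and a spam mark, and 
anything at 10 or above is also rejected.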


> - Might use a few RBLs to decline connections to start

Yes. That belongs to Postfix.


> - Implements a good implementation of greylisting

That belongs to Postfix.
I tend to shy away from greylisting, it is much less effective
than it used to be initially. In my opinion it does more harm than good.


> - Temporary failure for scanning (virus or spam) failures

Yes. Any fatal/unrecoverable failure causes an SMTP temporary failure
(4xx response either from amavis or from an MTA). No mail can get lost.
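
The principle (a scanner failure becomes a 4xx reply, never a silent 
accept or drop) can be sketched in a few lines of Python. This only 
illustrates the decision, it is not amavis or Postfix code; the 
function name, the `scan` callable and the reply texts are hypothetical.

```python
from typing import Callable, Tuple


def smtp_reply_for(scan: Callable[[bytes], bool], message: bytes) -> Tuple[int, str]:
    """Map a scan outcome to an SMTP reply code and text.

    `scan` returns True if the message should be rejected and raises an
    exception if the scanner itself failed (timeout, daemon down, ...).
    """
    try:
        unwanted = scan(message)
    except Exception:
        # Deliberately broad: any scanner failure means "try again later",
        # so the sending MTA keeps the mail queued and nothing is lost.
        return 451, "4.3.0 Temporary failure, please try again later"
    if unwanted:
        return 554, "5.7.1 Message rejected"
    return 250, "2.0.0 Ok"
```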


> Q2 - Do we happen to know who maintains SA for Ubuntu so we can try
> and work to make sure the upcoming release of 3.4.1 is packaged?

No idea. I thought the ASF infrastructure runs on FreeBSD mostly.


> Here's the high level draft if anyone has some thoughts:
> 
> - Implement a cluster of spamd servers with no Bayes but likely using
> SQL prefs for some whitelist/blacklisting - Bayes not being used
> because training and maintaining will likely be too difficult

I find bayes with autolearning very valuable (using redis backend,
mostly maintenance-free). Probably not so good at some general public
mail provider, but certainly good for a scope of users sharing mostly
technically oriented / common interests mail.

> - Implement txrep with SQL backend

Haven't tried txrep yet.

> - Implement a cluster of clamav boxes

ClamAV is usually faster than SpamAssassin. I'd keep several instances
of amavis+SpamAssassin+clamd (with or without a Postfix instance)
on multiple hosts if the load is really that high.

> - Implement an SPF record

Yes, an unfortunate fact of life.

Not to forget, DKIM signing is essential and must be done *after*
mailing list fanout.

> - Implement postfix with xyz glue to test email on a scalable # of mx's

Sure.

> - Implement a few RBLs to block SMTP connections - I hate to recommend
> this but ASF members are very sensitive to spam so I'm treading
> lightly

Some high-quality RBLs at an MTA level are desired.
Postfix even implements weighting with a threshold
over multiple RBLs if desired.
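
The idea of weighting several lists against a threshold can be shown 
with a short Python sketch. It is only an illustration of the scoring 
logic, not Postfix code; the list names, weights and threshold below 
are made up, and a whitelist is modelled as a negative weight.

```python
from typing import Dict, Iterable


def dnsbl_score(listed_on: Iterable[str], weights: Dict[str, int]) -> int:
    """Sum the weights of every list the client IP appears on."""
    return sum(weights.get(name, 0) for name in listed_on)


def should_block(listed_on: Iterable[str], weights: Dict[str, int], threshold: int) -> bool:
    """Block when the combined score reaches the threshold."""
    return dnsbl_score(listed_on, weights) >= threshold


# Hypothetical lists: one trusted blocklist, one weaker blocklist, one whitelist.
EXAMPLE_WEIGHTS = {
    "bl-strong.example": 3,
    "bl-weak.example": 1,
    "wl.example": -2,
}
```

With a threshold of 3, a hit on the strong list alone blocks the 
client, a hit on the weak list alone does not, and a whitelist entry 
can outweigh a single blocklist hit.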


For a high-level view on Amavis see the Wikipedia article:

   http://en.wikipedia.org/wiki/Amavis


Perhaps I should point out some more features that I find valuable:
- amavis can block mail based on declared MIME content type or MIME name,
   or based on a MIME part's content as classified by a file(1) utility.
   This helps with first waves of malware before virus scanners get their
   signatures updated, e.g. block MS executables;
- produces detailed logging in JSON (in addition to syslog). JSON logging
   can be valuable for effectively feeding into
   Elasticsearch/Logstash/Kibana or into Splunk or other log analyzers;
- large mail (over the SpamAssassin's limit) is not just blindly passed,
   but a truncated section of mail is passed to SpamAssassin for
   evaluation, with DKIM signature checks already done on the full
   pristine mail content, so that truncation does not invalidate
   signatures, yet in many cases SpamAssassin can still do its job
   reasonably well.


Mark