Posted to dev@spamassassin.apache.org by "Kevin A. McGrail" <KM...@PCCC.com> on 2015/03/17 22:16:48 UTC
Recommendations for ASF SA Implementation
Hello All,
I am working on recommendations for the ASF to modernize the
installation of SA for the foundation.
We have some givens:
Using Ubuntu
Using Postfix
Need to stick with maintainable packages
Likely needs to stay away from lots of tweaks and heavy customization
such as using MIMEDefang (unfortunate).
So I'd like any input you might have, on or off list. Here's some
questions I believe will help guide things:
Q1 - What is the best glue for SA for Postfix that does the following:
- uses spamc calls so that spamd's can be distributed and load balanced?
- can implement clamav before SA call
- should silently discard emails if a virus is detected
- must use clamdscan but ideally can utilize some sort of socket
solution for clamd to run distributed and load balanced
- should bounce email over a certain threshold (let's say 5) and silently
discard email over a certain threshold for SA (let's say 10)
- Might use a few RBLs to decline connections to start
- Implements a good implementation of greylisting
- Temporary failure for scanning (virus or spam) failures
Q2 - Do we happen to know who maintains SA for Ubuntu so we can try and
work to make sure the upcoming release of 3.4.1 is packaged?
Here's the high level draft if anyone has some thoughts:
- Implement a cluster of spamd servers with no Bayes but likely using
SQL prefs for some whitelist/blacklisting - Bayes not being used because
training and maintaining will likely be too difficult
- Implement txrep with SQL backend
- Implement a cluster of clamav boxes
- Implement an SPF record
- Implement postfix with xyz glue to test email on a scalable # of mx's
- Implement a few RBLs to block SMTP connections - I hate to recommend
this but ASF members are very sensitive to spam so I'm treading lightly
Regards,
KAM
Re: Recommendations for ASF SA Implementation
Posted by Reindl Harald <h....@thelounge.net>.
please stay on list
On 18.03.2015 at 10:46, Anthony Cartmell wrote:
>> no, we have 300 SA rejects per day and had 20 clamav hits before changing
>> the order; now the SA reject count is not much different and there are only 5
>> clamav hits per day
>
> I was just reporting that MailScanner had changed its order of scanning
> following the introduction of third-party ClamAV signatures.
>
> A potential benefit of running SA second is to allow scoring of the
> ClamAV signature matches so that you can fine-tune how much effect each
> group of signatures have
correct - but what you mostly want to achieve on a server with
noticeable load is to reject as early as possible and to skip as many
restrictions and scanners as possible
wrapping them that way would double the load, and the potential benefit needs
to be considered really carefully, given that in case of malware you want
to reject in any case and that SA runs *all* tests at high cost
most of our SA rejects come with a score above 15, while we start
rejecting at 8.0. what i also have in mind is how to weigh such
decisions when a message hits BAYES_00 but contains malware - who
is right: the clamav signature or BAYES_00? i would say the
signature (yes, with an FP risk you have anyway)
the initial post was, as far as i understood it, about the complete
infrastructure of an inbound MX, hence
* postscreen RBL scoring
* postscreen protocol checks
* envelope restrictions
* SPF backed with DNSWL safety nets
* PTR restrictions with more DNSWL safety nets
* HELO restrictions with more DNSWL safety nets
* sender verify for senders not on any DNSWL and without SPF
* expensive content scanner with most reject hits
* expensive content scanner with fewer reject hits
* most expensive content scanner with only a few reject hits
the point is that you can handle much more load without clustering, and
even if your load is still high enough to need clustering, it makes a
difference in how many cluster nodes you need in the end
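As a rough illustration of the ordering argument above, here is a minimal Python sketch, not anything Postfix itself runs. The stage names, the relative cost numbers, and the message fields are made up for illustration; only the 8.0 reject score is taken from the message. Each check is a hypothetical stand-in for a real restriction or scanner.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Stage(NamedTuple):
    name: str                        # hypothetical label for a check
    cost: int                        # assumed relative cost, not a measurement
    rejects: Callable[[dict], bool]  # returns True if the message is rejected


def run_pipeline(stages: List[Stage], message: dict) -> Tuple[Optional[str], int]:
    """Run the stages in the given order and stop at the first reject.

    Returns the name of the rejecting stage (or None if the message
    passes everything) and the total cost spent on the message.
    """
    spent = 0
    for stage in stages:
        spent += stage.cost
        if stage.rejects(message):
            return stage.name, spent
    return None, spent


# Cheap checks first, then the content scanner with the most reject hits.
STAGES = [
    Stage("dnsbl", 1, lambda m: m.get("listed", False)),
    Stage("spamassassin", 100, lambda m: m.get("spam_score", 0.0) >= 8.0),
    Stage("clamav", 20, lambda m: m.get("malware", False)),
]

if __name__ == "__main__":
    print(run_pipeline(STAGES, {"listed": True}))
    print(run_pipeline(STAGES, {"spam_score": 16.0}))
    print(run_pipeline(STAGES, {}))
```

A client rejected by the first cheap check never reaches the content scanners, which is where the load saving comes from in this model.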
Re: Recommendations for ASF SA Implementation
Posted by Reindl Harald <h....@thelounge.net>.
On 18.03.2015 at 10:30, Anthony Cartmell wrote:
>> reverse the order in "smtpd_milters" but keep in mind that a well
>> trained SA rejects much more mail than clamav, and while clamav needs
>> fewer resources you bypass the whole virus scanner that way
>
> MailScanner used to scan in that order too, SA then AV.
>
> However with the introduction of third-party ClamAV signature databases
> that match with things other than malware, the order was changed. The
> initial scanning is now done with clamd (with third party signatures
> such as those collected by SaneSecurity[1]) first, and then SA second.
> This allows SA to score messages based on report headers added by the
> ClamAV virus(/spam/scam/phishing) scanner, making a very flexible tool.
>
> [1] http://sanesecurity.com/usage/signatures/
no, we have 300 SA rejects per day and had 20 clamav hits before changing
the order; now the SA reject count is not much different and there are only 5
clamav hits per day
for me that means SA catches 15 of the 20 malware mails, and 275
messages that previously went through both milters are now rejected by the first
/bin/ls -1 /var/lib/clamav/
blurl.ndb
bofhland_cracked_URL.ndb
bofhland_malware_attach.hdb
bofhland_malware_URL.ndb
bofhland_phishing_URL.ndb
bytecode.cvd
crdfam.clamav.hdb
daily.cld
foxhole_all.cdb
foxhole_filename.cdb
foxhole_generic.cdb
junk.ndb
jurlbla.ndb
jurlbl.ndb
lott.ndb
main.cvd
malwarehash.hsb
mirrors.dat
phish.ndb
phishtank.ndb
rogue.hdb
sanesecurity.ftm
scamnailer.ndb
scam.ndb
sigwhitelist.ign2
spamattach.hdb
spamimg.hdb
spam.ldb
spearl.ndb
spear.ndb
winnow.attachments.hdb
winnow_bad_cw.hdb
winnow_extended_malware.hdb
winnow_malware.hdb
winnow_malware_links.ndb
winnow_phish_complete_url.ndb
winnow_spam_complete.ndb
Re: Recommendations for ASF SA Implementation
Posted by Reindl Harald <h....@thelounge.net>.
On 17.03.2015 at 22:16, Kevin A. McGrail wrote:
> So I'd like any input you might have, on or off list. Here's some
> questions I believe will help guide things:
>
> Q1 - What is the best glue for SA for Postfix that does the following:
> - can implement clamav before SA call
postfix does that out-of-the-box
reverse the order in "smtpd_milters" but keep in mind that a well
trained SA rejects much more mail than clamav, and while clamav needs
fewer resources you bypass the whole virus scanner that way
smtpd_milters = unix:/run/spamass-milter/spamass-milter.sock,
    unix:/run/clamav-milter/clamav-milter.socket
> - should silently discard emails if a virus is detected
an MTA/MX must never silently discard mail
where i live you go to jail for that as a sysadmin
reject at SMTP level or deliver it
> - Might use a few RBLs to decline connections to start
any recent postfix has postscreen on board with sensible BL/WL scoring
long before the smtpd process - content filters never see 90-95% of all
mail that way
> - Implements a good implementation of greylisting
should also happen at the MTA level, if at all
a backup MX that always answers with a 4xx code also kills 50% of all botnet
IPs never seen on the primary MX, but without the negative impacts (delayed
mail, loops in case of large senders always coming from a different IP
and so never making it through greylisting)
> - Temporary failure for scanning (virus or spam) failures
that is standard postfix behavior if a milter doesn't respond, and also the
standard behavior of most milters if they can't reach their backend daemon
Re: Recommendations for ASF SA Implementation
Posted by Axb <ax...@gmail.com>.
On 03/17/2015 10:16 PM, Kevin A. McGrail wrote:
> Hello All,
>
> I am working on recommendations for the ASF to modernize the
> installation of SA for the foundation.
>
> We have some givens:
>
> Using Ubuntu
> Using Postfix
> Need to stick with maintainable packages
> Likely needs to stay away from lots of tweaks and heavy customization
> such as using MIMEDefang (unfortunate).
Although I'd suggest Fuglu, the obvious choice should probably be
amavisd-new considering Mark is also highly involved in SA dev work.
It's also distributed by Ubuntu so it would be one package less to
maintain outside the distro. We'd get the best of both worlds.
Axb
Re: Recommendations for ASF SA Implementation
Posted by Mark Martinec <Ma...@ijs.si>.
2015-03-17 22:16, Kevin A. McGrail wrote:
> I am working on recommendations for the ASF to modernize the
> installation of SA for the foundation.
>
> We have some givens:
>
> Using Ubuntu
> Using Postfix
> Need to stick with maintainable packages
> Likely needs to stay away from lots of tweaks and heavy customization
> such as using MIMEDefang (unfortunate).
>
> So I'd like any input you might have, on or off list.
Axb wrote:
| Although I'd suggest Fuglu, the obvious choice should probably be amavisd-new
| considering Mark is also highly involved in SA dev work.
| It's also distributed by Ubuntu so it would be one package less to maintain
| outside the distro. We'd get the best of both worlds.
| Axb
Thanks, Amavis would be my choice too :)))
back to Kevin:
> Here's some questions I believe will help guide things:
>
> Q1 - What is the best glue for SA for Postfix that does the following:
>
> - uses spamc calls so that spamd's can be distributed and load
> balanced?
Amavis uses a standard protocol, SMTP, for communication with an MTA
instead of the proprietary spamc/spamd protocol. Other than that,
interfacing to SpamAssassin is pretty much the same as in spamd,
i.e. it uses a pre-forked set of processes which use the SpamAssassin
library.
For this reason the performance is pretty much the same - the bottleneck
is processing rules in SpamAssassin.
> can be distributed and load balanced?
Yes, can be distributed and load balanced. Two approaches are most
apparent:
- the classical approach is to run multiple postfix+amavis
combos on several hosts, and let MX DNS records distribute the load
across them. If a single IP address is desired, an SMTP proxy (such
as nginx) can do the task of load sharing in front of Postfix.
- if a single MTA is preferred with multiple content filters on
multiple hosts, then traffic from Postfix to amavisd instances
can be spread using HAProxy (or some other load balancer).
Note that it is beneficial to feed outgoing mail through amavis too
for the following reasons:
- the PenPals feature keeps track of ongoing conversations and
contributes negative score points to such, preventing some false
positives on marginal mail content (a requirement is a common
database for all amavis instances, preferably redis, possibly SQL);
- when SpamAssassin autolearning is enabled, outgoing mail
contributes its valuable share of ham samples;
- when an internal machine or a user mail account gets compromised
and starts spewing malware or spam, it will get detected and blocked.
- not to forget: to DKIM-sign outbound mail it needs to pass
through a signer. Amavisd can do DKIM signing (and verification).
> - can implement clamav before SA call
Yes.
Also, considering that some of the third-party ClamAV rulesets
are prone to false positives, or intentionally target spam (not
viruses and other malware), amavis can be configured to reclassify
certain malware (by name) as spam, contributing to SpamAssassin score
and not blocking as malware right away.
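The reclassification idea can be pictured with a small Python model. This is only a sketch of the logic, not amavis configuration syntax: the signature-name patterns, the score values, and the function name are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns: names of signatures that target spam or phishing
# rather than real malware, each with an assumed spam score contribution.
NAME_TO_SPAM_SCORE = [
    (re.compile(r"^Example\.(Spam|Junk)\."), 3.0),
    (re.compile(r"^Example\.Phish\."), 5.0),
]


def classify_hit(signature_name: str) -> Tuple[str, Optional[float]]:
    """Decide whether a scanner hit is treated as malware or as spam.

    Returns ("spam", points) when the name matches a reclassification
    pattern, so the points are added to the spam score, or
    ("malware", None) when the hit should block the mail as malware.
    """
    for pattern, points in NAME_TO_SPAM_SCORE:
        if pattern.search(signature_name):
            return "spam", points
    return "malware", None


if __name__ == "__main__":
    print(classify_hit("Example.Junk.1234"))
    print(classify_hit("Example.Trojan.Agent"))
```

In this model a false positive from a spam-oriented signature only raises the score instead of blocking the message outright.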
> - should silently discard emails if a virus is detected
Configurable, but you don't want to do that, and (as Reindl Harald
noted) it may even violate the law. Unwanted mail must be rejected
at an SMTP level (or delivered to a dedicated folder or quarantine),
it must not be lost. Amavis is nowadays typically deployed as a
before-queue Postfix content filter so that it can reject mail
while the original session is still open.
Keep in mind that antivirus software does occasionally produce
false positives, ClamAV with third party rules even more so.
A legitimate sender must be notified if this happens.
> - must use clamdscan but ideally can utilize some sort of socket
> solution for clamd to run distributed and load balanced
Can do. Amavisd can interface with clamd either through
clamdscan, or (preferably) by directly talking to it over
the clamd protocol (thus eliminating clamdscan from the setup).
As this is a normal TCP connection, it can be load balanced
using HAProxy, although it probably makes more sense to keep
amavis+clamd pairs on each host.
> - should bounce email over a certain threshold (let's say 5) and
> silently discard email over a certain threshold for SA (let's say 10)
Possible. There are a couple of configurable spam score levels,
each with its configurable action:
tag level - adds X-Spam-* headers (ham or spam)
tag2 level - adds X-Spam-* headers, claims it is spam
tag3 level - adds X-Spam-* headers, claims it is blatant spam
kill level - (typically) rejects mail (or can discard or deliver)
Quarantining at each spam level is configurable independently.
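To make the level scheme concrete, here is a minimal Python sketch of how a score could map to cumulative actions. It is a model of the idea only, not amavis code: the function name and action strings are hypothetical, the 5 and 10 thresholds come from Kevin's example, and the tag and tag3 values are assumptions.

```python
from typing import List


def spam_actions(score: float,
                 tag: float = -999.0,   # assumed: always add headers
                 tag2: float = 5.0,     # from the example in the question
                 tag3: float = 8.0,     # assumed value for blatant spam
                 kill: float = 10.0     # from the example in the question
                 ) -> List[str]:
    """Return the actions triggered by a spam score, lowest level first."""
    actions = []
    if score >= tag:
        actions.append("add X-Spam-* headers")
    if score >= tag2:
        actions.append("mark as spam")
    if score >= tag3:
        actions.append("mark as blatant spam")
    if score >= kill:
        actions.append("reject")
    return actions


if __name__ == "__main__":
    for s in (1.2, 6.0, 12.5):
        print(s, spam_actions(s))
```

The levels are cumulative in this sketch: a message at the kill level also carries the headers of the lower levels.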
> - Might use a few RBLs to decline connections to start
Yes. That belongs to Postfix.
> - Implements a good implementation of greylisting
That belongs to Postfix.
I tend to shy away from greylisting, it is much less effective
than it used to be initially. In my opinion it does more harm than good.
> - Temporary failure for scanning (virus or spam) failures
Yes. Any fatal/unrecoverable failure causes an SMTP temporary failure
(4xx response either from amavis or from an MTA). No mail can get lost.
> Q2 - Do we happen to know who maintains SA for Ubuntu so we can try
> and work to make sure the upcoming release of 3.4.1 is packaged?
No idea. I thought the ASF infrastructure runs on FreeBSD mostly.
> Here's the high level draft if anyone has some thoughts:
>
> - Implement a cluster of spamd servers with no Bayes but likely using
> SQL prefs for some whitelist/blacklisting - Bayes not being used
> because training and maintaining will likely be too difficult
I find bayes with autolearning very valuable (using redis backend,
mostly maintenance-free). Probably not so good at some general public
mail provider, but certainly good for a scope of users sharing mostly
technically oriented / common interests mail.
> - Implement txrep with SQL backend
Haven't tried txrep yet.
> - Implement a cluster of clamav boxes
ClamAV is usually faster than SpamAssassin. I'd keep several instances
of amavis+SpamAssassin+clamd (with or without a Postfix instance)
on multiple hosts if the load is really that high.
> - Implement an SPF record
Yes, an unfortunate fact of life.
Not to forget, DKIM signing is essential and must be done *after*
mailing list fanout.
> - Implement postfix with xyz glue to test email on a scalable # of mx's
Sure.
> - Implement a few RBLs to block SMTP connections - I hate to recommend
> this but ASF members are very sensitive to spam so I'm treading
> lightly
Some high-quality RBLs at an MTA level are desired.
Postfix even implements weighting with a threshold
over multiple RBLs if desired.
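The weighting idea can be modelled in a few lines of Python. This is a sketch of the arithmetic only, not Postfix code; in Postfix itself the lists and threshold are set through postscreen's postscreen_dnsbl_sites and postscreen_dnsbl_threshold parameters. The list names and weights below are hypothetical.

```python
from typing import Dict, Iterable

# Hypothetical lists and weights; a whitelist gets a negative weight.
WEIGHTS: Dict[str, int] = {
    "bl-a.example": 2,
    "bl-b.example": 2,
    "bl-c.example": 1,
    "wl.example": -3,
}


def dnsbl_score(hits: Iterable[str], weights: Dict[str, int]) -> int:
    """Sum the weights of every list on which the client address is found."""
    return sum(weights.get(name, 0) for name in hits)


def should_block(hits: Iterable[str], weights: Dict[str, int], threshold: int) -> bool:
    """Block when the combined score reaches the threshold."""
    return dnsbl_score(hits, weights) >= threshold


if __name__ == "__main__":
    print(should_block(["bl-a.example"], WEIGHTS, 3))
    print(should_block(["bl-a.example", "bl-b.example"], WEIGHTS, 3))
    print(should_block(["bl-a.example", "bl-b.example", "wl.example"], WEIGHTS, 3))
```

With a threshold of 3, a single listing is not enough to block in this example, and a whitelist entry can offset two blacklist hits.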
For a high-level view on Amavis see the Wikipedia article:
http://en.wikipedia.org/wiki/Amavis
Perhaps I should point out some more features that I find valuable:
- amavis can block mail based on declared MIME content type or MIME name,
or based on a MIME part's content as classified by a file(1) utility.
This helps with first waves of malware before virus scanners get their
signatures updated, e.g. block MS executables;
- produces detailed logging in JSON (in addition to syslog). JSON logging
can be valuable for effectively feeding into Elasticsearch/Logstash/Kibana
or into Splunk or other log analyzers;
- large mail (over the SpamAssassin's limit) is not just blindly passed,
but a truncated section of mail is passed to SpamAssassin for evaluation,
with DKIM signature checks already done on the full pristine mail content,
so that truncation does not invalidate signatures, yet in many cases
SpamAssassin can still do its job reasonably well.
Mark