Posted to users@spamassassin.apache.org by "David F. Skoll" <df...@roaringpenguin.com> on 2010/12/27 20:26:59 UTC

Anti-Perl rant (was Re: Issuing rollback DBI Mysql)

On Mon, 27 Dec 2010 11:16:23 -0800
Ted Mittelstaedt <te...@ipinc.net> wrote:

> Larry Wall never envisioned the octopus monstrosity that Perl has
> become.

Um.

Just because you can write overly-complex slow Perl code doesn't mean that
all Perl code is necessarily overly-complex or slow.

> Not that I am unhappy with the existence of SA but anyone who uses it
> must understand that an enormous amount of CPU power is wasted on SA
> merely due to the inefficiency of it being written in Perl.

While Perl is part of the problem, a lot of the problem is SA itself
and some of it is simply the nature of content-based anti-spam
techniques... slinging around regexes, normalizing HTML, extracting
URLs sanely, extracting Bayes tokens, etc. is going to be slow no
matter how you do it.
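
To give a flavour of the work involved: even the simplest Bayes-style
tokenizer has to touch every byte of the message body.  The sketch below
is illustrative only and far simpler than SpamAssassin's real tokenizer
(which is header-aware and handles HTML, URLs, and so on):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy Bayes tokenizer: lowercase the body, split on non-word characters,
# and drop tokens that are too short or too long to be useful.
sub tokenize {
    my ($body) = @_;
    my %tokens;
    for my $t (split /\W+/, lc $body) {
        next if length($t) < 3 || length($t) > 15;
        $tokens{$t}++;   # count occurrences of each token
    }
    return \%tokens;
}
```

Multiply this (plus HTML normalization, URL extraction, and a few
hundred regex rules) across every message and the CPU cost adds up
regardless of implementation language.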

Regards,

David.

Re: Greylisting (was Re: Anti-Perl rant (was Re: Issuing rollback DBI Mysql))

Posted by Daniel McDonald <da...@austinenergy.com>.
On 12/27/10 4:07 PM, "David F. Skoll" <df...@roaringpenguin.com> wrote:

> On Mon, 27 Dec 2010 13:36:39 -0800
> Ted Mittelstaedt <te...@ipinc.net> wrote:
> 
>> The real question is, do you get viruses that would make it past SA?
> 
> I can't answer that because we scan for viruses before SA.  I would
> guess yes.  It would be more efficient to scan for viruses after
> scanning for spam, even though we still do it the other way around.

I scan for viruses first (actually second, after grey-listing), because
ClamAV with the unofficial signatures identifies a fair amount of spam, and
the non-virus findings are added to the SpamAssassin score...
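
One common way to wire this up is a glue layer (a milter or a ClamAV
plugin for SpamAssassin) that records the matched signature in a header,
plus a local rule that scores on it rather than rejecting outright.  A
hypothetical local.cf fragment -- the header name and rule name here are
assumptions for illustration, not part of any standard setup:

```
# Score Sanesecurity/unofficial-signature hits that the virus scanner
# recorded in a header (header name assumed), instead of rejecting
header   LOCAL_CLAMAV_SPAMSIG   X-Virus-Report =~ /Sanesecurity/i
describe LOCAL_CLAMAV_SPAMSIG   ClamAV unofficial signature matched
score    LOCAL_CLAMAV_SPAMSIG   4.0
```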

-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281


Re: Greylisting (was Re: Anti-Perl rant (was Re: Issuing rollback DBI Mysql))

Posted by "David F. Skoll" <df...@roaringpenguin.com>.
On Mon, 27 Dec 2010 13:36:39 -0800
Ted Mittelstaedt <te...@ipinc.net> wrote:

[...]

> > We do not find virus-scanning before spam-scanning to be
> > effective.  A tiny percentage of our mail is flagged as containing
> > a virus,

> That's subject to interpretation, I think.  I would guess that your 
> LEGITIMATE mail is ALSO a tiny percentage of your total received
> mail. ;-)

No, not really.  Here are the statistics for 30 days' worth of mail for
messages that made it past greylisting:

- About 600 000 non-spam messages
- About 530 000 spam or suspected-spam messages
- About 65 000 messages blocked for various reasons other than
  content-filtering (on DNSBL, sender blacklisted by end-user, etc.)
- 774 viruses as detected by ClamAV

As you see, viruses make up a tiny percentage of mail volume.
Non-spam makes up about 50% of the post-greylisting volume
or about 20% of total volume including greylisting.

During that same period, about 2.4 million messages were greylisted, of
which just under 50 000 were retried correctly and made it past the
greylisting hurdle.  Greylisting remains tremendously effective.
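The percentages follow directly from the raw counts; a quick sanity
check on the figures above (numbers rounded as quoted):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Raw 30-day counts quoted above (approximate)
my $nonspam    = 600_000;    # non-spam messages
my $spam       = 530_000;    # spam or suspected-spam
my $blocked    = 65_000;     # blocked for non-content reasons
my $greylisted = 2_400_000;  # tempfailed at least once
my $retried    = 50_000;     # retried correctly, passed greylisting

# Post-greylisting volume is everything that reached content filtering.
my $post_grey = $nonspam + $spam + $blocked;

# Total volume: the ~50 000 successful retries are already counted in
# the post-greylisting numbers, so subtract them to avoid double-counting.
my $total = $post_grey + $greylisted - $retried;

printf "non-spam share post-greylisting: %.0f%%\n", 100 * $nonspam / $post_grey;
printf "non-spam share of total volume:  %.0f%%\n", 100 * $nonspam / $total;
```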

> The real question is, do you get viruses that would make it past SA?

I can't answer that because we scan for viruses before SA.  I would
guess yes.  It would be more efficient to scan for viruses after
scanning for spam, even though we still do it the other way around.

Regards,

David.

Re: Greylisting (was Re: Anti-Perl rant (was Re: Issuing rollback DBI Mysql))

Posted by Ted Mittelstaedt <te...@ipinc.net>.
On 12/27/2010 12:42 PM, David F. Skoll wrote:
> On Mon, 27 Dec 2010 12:37:00 -0800
> Ted Mittelstaedt<te...@ipinc.net>  wrote:
>
>> Greylisting, though, is by far the best.  But I have noticed an
>> increasing number of sites out there - and these are large sites -
>> that apparently are honked off that people greylist, and will bounce
>> mail that receives a 4xx error, in violation of the standard.  Off
>> the top of my head I seem to remember seeing this from several
>> airline company mailers that send out advertisements to their
>> frequent-flyer members and electronic ticketing receipts.  Jerks!
>
> What you may be seeing is marginal SMTP client software that doesn't
> know how to handle a 4xx response to RCPT.  There was even some
> commercial software that couldn't deal with this properly (Novell
> Groupwise, I believe, though it has long since been fixed in that
> product.)
>
> BTW, this is another reason we do our greylisting post-DATA.  Although
> it's slower and uses more bandwidth, it does avoid problems with
> marginal SMTP clients and it does let us use the Subject: as part of
> the greylisting tuple, which greatly increases greylisting
> effectiveness.
>
> We do not find virus-scanning before spam-scanning to be effective.  A
> tiny percentage of our mail is flagged as containing a virus,

That's subject to interpretation, I think.  I would guess that your 
LEGITIMATE mail is ALSO a tiny percentage of your total received mail. ;-)

The real question is, do you get viruses that would make it past SA?  We
do, for the simple reason that we have some users who regularly get mail
that is normally flagged as spam - and they WANT that mail - so we
list them in the exemption (all spam to) list.  The virus filtering 
makes sure that they don't get hosed down.

Of course, you can do virus scanning post-SA to capture these.

>   so it
> doesn't really reduce the amount of mail that would need to be
> spam-scanned.
>
> Regards,
>
> David.

Ted


Greylisting (was Re: Anti-Perl rant (was Re: Issuing rollback DBI Mysql))

Posted by "David F. Skoll" <df...@roaringpenguin.com>.
On Mon, 27 Dec 2010 12:37:00 -0800
Ted Mittelstaedt <te...@ipinc.net> wrote:

> Greylisting, though, is by far the best.  But I have noticed an
> increasing number of sites out there - and these are large sites -
> that apparently are honked off that people greylist, and will bounce
> mail that receives a 4xx error, in violation of the standard.  Off
> the top of my head I seem to remember seeing this from several
> airline company mailers that send out advertisements to their
> frequent-flyer members and electronic ticketing receipts.  Jerks!

What you may be seeing is marginal SMTP client software that doesn't
know how to handle a 4xx response to RCPT.  There was even some
commercial software that couldn't deal with this properly (Novell
Groupwise, I believe, though it has long since been fixed in that
product.)
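
The distinction a well-behaved client has to make is simple: per the
SMTP standard, 4xx means "temporary failure, keep the message queued
and retry later", while 5xx means "permanent failure, bounce".  The
marginal clients described above bounce on both.  A minimal sketch of
the correct behaviour (the function name is illustrative, not taken
from any particular MTA):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decide what to do with an SMTP reply code, per RFC 5321:
#   2xx/3xx -> proceed, 4xx -> retry later, 5xx -> bounce.
# Clients that bounce on 4xx are the ones that break under greylisting.
sub smtp_disposition {
    my ($code) = @_;
    return 'deliver' if $code >= 200 && $code < 400;
    return 'retry'   if $code >= 400 && $code < 500;  # temporary failure
    return 'bounce'  if $code >= 500 && $code < 600;  # permanent failure
    return 'retry';  # unknown reply: be conservative, keep it queued
}
```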

BTW, this is another reason we do our greylisting post-DATA.  Although
it's slower and uses more bandwidth, it does avoid problems with
marginal SMTP clients and it does let us use the Subject: as part of
the greylisting tuple, which greatly increases greylisting
effectiveness.

We do not find virus-scanning before spam-scanning to be effective.  A
tiny percentage of our mail is flagged as containing a virus, so it
doesn't really reduce the amount of mail that would need to be
spam-scanned.

Regards,

David.

Re: Anti-Perl rant (was Re: Issuing rollback DBI Mysql)

Posted by Ted Mittelstaedt <te...@ipinc.net>.
On 12/27/2010 11:46 AM, Jack L. Stone wrote:
> At 02:26 PM 12.27.2010 -0500, David F. Skoll wrote:
>> On Mon, 27 Dec 2010 11:16:23 -0800
>> Ted Mittelstaedt<te...@ipinc.net>  wrote:
>>
>>> Larry Wall never envisioned the octopus monstrosity that Perl has
>>> become.
>>
>> Um.
>>
>> Just because you can write overly-complex slow Perl code doesn't mean that
>> all Perl code is necessarily overly-complex or slow.
>>
>>> Not that I am unhappy with the existence of SA but anyone who uses it
>>> must understand that an enormous amount of CPU power is wasted on SA
>>> merely due to the inefficiency of it being written in Perl.
>>
>> While Perl is part of the problem, a lot of the problem is SA itself
>> and some of it is simply the nature of content-based anti-spam
>> techniques... slinging around regexes, normalizing HTML, extracting
>> URLs sanely, extracting Bayes tokens, etc. is going to be slow no
>> matter how you do it.
>>
>> Regards,
>>
>> David.
>>
>
> In my case a very small percentage of mail actually reaches SA because of
> several filters in front of it. Sendmail, Regex-milter, Greylist-milter,
> and other milters catch most of the truly bad stuff, and only then hand
> off to SA. Thus, my server load is not so bad now. It used to be heavy
> indeed before adding the front filters.
>

We also do ClamAV.  Yes, I know most virus emitters are going to be 
blacklisted and SA would catch them anyway, but this gives us some
visibility into how much of the incoming spam is actually viruses.

Greylisting, though, is by far the best.  But I have noticed an 
increasing number of sites out there - and these are large sites -
that apparently are honked off that people greylist, and will bounce
mail that receives a 4xx error, in violation of the standard.  Off
the top of my head I seem to remember seeing this from several
airline company mailers that send out advertisements to their
frequent-flyer members and electronic ticketing receipts.  Jerks!

Ted

Re: Anti-Perl rant (was Re: Issuing rollback DBI Mysql)

Posted by "David F. Skoll" <df...@roaringpenguin.com>.
On Mon, 27 Dec 2010 13:46:34 -0600
"Jack L. Stone" <ja...@sage-american.com> wrote:

> In my case a very small percentage of mail actually reaches SA
> because of several filters in front of it. Sendmail, Regex-milter,
> Greylist-milter, and other milters catch most of the truly bad stuff,
> and only then hand off to SA. Thus, my server load is not so bad
> now. It used to be heavy indeed before adding the front filters.

We also use greylisting and other techniques, but we do everything
in Perl from MIMEDefang (including the greylisting).

Our average processing time is about 500ms.  For messages that are
greylisted, the time is around 35ms.  That's doing post-DATA
greylisting (because we greylist on the 4-tuple {sender, recipient,
IP, Subject}) implemented in Perl with the greylist data in
PostgreSQL.  So Perl does not *have* to be slow.
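
The tuple check described above can be sketched in a few lines.  This is
an illustrative in-memory version only: the production setup keeps the
table in PostgreSQL and runs inside MIMEDefang, neither of which is
shown here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Minimal post-DATA greylisting on the {sender, recipient, IP, Subject}
# 4-tuple.  First sighting is tempfailed; a retry after the embargo
# window is accepted.  %seen stands in for the real database table.
my %seen;
my $embargo = 300;  # seconds a tuple must wait before being accepted

sub greylist_check {
    my ($sender, $recip, $ip, $subject, $now) = @_;
    my $key = sha256_hex(join "\0", $sender, $recip, $ip, $subject);
    if (!exists $seen{$key}) {
        $seen{$key} = $now;   # first sighting: record it and tempfail
        return 'tempfail';    # caller should answer with a 4xx here
    }
    return $now - $seen{$key} >= $embargo ? 'accept' : 'tempfail';
}
```

A real implementation also expires stale tuples and whitelists tuples
that have passed once, so a legitimate sender is only delayed on its
first message.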

Regards,

David.

Re: Anti-Perl rant (was Re: Issuing rollback DBI Mysql)

Posted by "Jack L. Stone" <ja...@sage-american.com>.
At 02:26 PM 12.27.2010 -0500, David F. Skoll wrote:
>On Mon, 27 Dec 2010 11:16:23 -0800
>Ted Mittelstaedt <te...@ipinc.net> wrote:
>
>> Larry Wall never envisioned the octopus monstrosity that Perl has
>> become.
>
>Um.
>
>Just because you can write overly-complex slow Perl code doesn't mean that
>all Perl code is necessarily overly-complex or slow.
>
>> Not that I am unhappy with the existence of SA but anyone who uses it
>> must understand that an enormous amount of CPU power is wasted on SA
>> merely due to the inefficiency of it being written in Perl.
>
>While Perl is part of the problem, a lot of the problem is SA itself
>and some of it is simply the nature of content-based anti-spam
>techniques... slinging around regexes, normalizing HTML, extracting
>URLs sanely, extracting Bayes tokens, etc. is going to be slow no
>matter how you do it.
>
>Regards,
>
>David.
>

In my case a very small percentage of mail actually reaches SA because of
several filters in front of it. Sendmail, Regex-milter, Greylist-milter,
and other milters catch most of the truly bad stuff, and only then hand
off to SA. Thus, my server load is not so bad now. It used to be heavy
indeed before adding the front filters.

Jack


(^_^)
Happy trails,
Jack L. Stone

System Admin
Sage-american