Posted to users@spamassassin.apache.org by Joseph Acquisto <jo...@j4computers.com> on 2012/12/03 12:25:50 UTC

Message not scanned- Size?

A message slipped through untouched.  Obvious spam from "Minister of Finance" with many attachments.

/var/log/mail shows a message skipped "spamc[7262]: skipped message, greater than max message
size (512000 bytes)" at the time this came thru.

I'd guess that was it.  Unusual, but any way to prevent that in future?

joe a.



Re: Message not scanned- Size?

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 12/3/2012 3:03 PM, Henrik K wrote:

>> Two, I'm trying a system that also truncates messages mid-message at
>> the threshold to scan them anyway.
>>
>> The second idea has been pretty controversial but I think the first
>> one is a neat idea.
> Why is it controversial? Amavisd-new 2.6.3 had this feature since 2009, so
> it's used probably very widely - even without users knowing it.  I've never
> seen any ill effects.
There have been emails on the list over the past few weeks. I believe AXB is 
against the concept.  I have no opinion yet, which is why I am testing.

>> This might be something neat to add to spamd/spamc natively where
>> there is a load average multiplier concept. Thoughts?
> Load average is quite a silly measurement. The maximum depends on the number
> of cpu cores/threads.  Also this kind of limiting makes sense only on a
> pre-queue filter in which case such "throttles" should be applied way before
> anything gets to SA.

I'm simply trying to maximize resources.  Right now, the reason SA 
doesn't scan messages of unlimited size is that large messages take 
more time to process.  Load average, which I agree is a bit amorphous, 
is a quick gauge of the resources that are available and lets me 
automatically raise and lower the size limit for spamc/spamd in a way 
that is less controversial than the truncation method.

Regards,
KAM


Re: Message not scanned- Size?

Posted by "David F. Skoll" <df...@roaringpenguin.com>.
On Tue, 04 Dec 2012 12:41:55 +0000
Martin Gregorie <ma...@gregorie.org> wrote:

> Is there an ANSI C MIME encode/decode library that could be bolted
> onto spamc?

ripmime: http://www.pldaniels.com/ripmime/ and it's liberally licensed
(BSD revised).  I think it's just a decoder, not an encoder, but
generating MIME is much easier than parsing it.

However, it occurs to me it'd be better to build a proper MIME
parsing-and-munging Perl library into SpamAssassin so everyone could
use it, not just people who use it via spamc.

Regards,

David.

Re: Message not scanned- Size?

Posted by Martin Gregorie <ma...@gregorie.org>.
On Tue, 2012-12-04 at 07:02 -0500, David F. Skoll wrote:
> On Tue, 04 Dec 2012 11:12:54 +0100
> "Andrzej A. Filip" <an...@gmail.com> wrote:
> 
> > Have you tried/considered scoring based on "headers only"?
> 
Does anybody have statistics on the type and number of components in
messages that exceed the scan size limit? What about information on how
the various components contribute to the score?

It occurs to me that, if we knew these stats, it could be fairly simple
for spamc to selectively remove parts that don't contribute to the
score, retain a fragment of some that do, e.g. all you need from an
image are the MIME headers (because we have useful rules that compare
the content type with the file name) and, possibly, the image's header
bytes (for rules that compare the file name with its content). Spamc
would then send the shortened message to spamd for scanning, receive the
SA headers back, and insert them in the original message (which it must
retain) before passing it on.
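
The last step of that idea (copying the scanner's verdict back onto the
untouched original) could look roughly like the sketch below. It assumes
the scan result carries SpamAssassin's usual X-Spam-* headers; the function
names are hypothetical and the part-stripping step itself is not shown.

```python
def spam_headers(scanned):
    """Collect raw X-Spam-* header lines (with folded continuations)."""
    kept, keeping = [], False
    for line in scanned.splitlines(keepends=True):
        if line in (b"\n", b"\r\n"):
            break  # blank line: end of the header section
        if line[:1] in (b" ", b"\t"):
            if keeping:
                kept.append(line)  # continuation of a header we are keeping
            continue
        keeping = line.lower().startswith(b"x-spam-")
        if keeping:
            kept.append(line)
    return b"".join(kept)


def annotate_original(original, scanned):
    """Prepend the scanner's X-Spam-* headers to the untouched original."""
    return spam_headers(scanned) + original
```

Working on raw bytes rather than re-serialising a parsed message means the
original is passed on byte for byte, with only the new headers added in
front.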

Is there an ANSI C MIME encode/decode library that could be bolted onto
spamc? I can't find one, though there are a number of OSS C++ libraries,
so spamc might need rewriting to use them.


Martin




Re: Message not scanned- Size?

Posted by "David F. Skoll" <df...@roaringpenguin.com>.
On Tue, 04 Dec 2012 11:12:54 +0100
"Andrzej A. Filip" <an...@gmail.com> wrote:

> Have you tried/considered scoring based on "headers only"?

No, and I'm not sure that would be very effective.  Our Bayes DB
is extremely effective against message bodies.

Regards,

David.

Re: Message not scanned- Size?

Posted by "Andrzej A. Filip" <an...@gmail.com>.
On 12/03/2012 09:43 PM, David F. Skoll wrote:
> On Mon, 3 Dec 2012 22:03:25 +0200
> Henrik K <he...@hege.li> wrote:
>
>> On Mon, Dec 03, 2012 at 01:54:44PM -0500, Kevin A. McGrail wrote:
> [Test loadavg in filtering decisions]
>
>> Seems kind of pointless. Have you actually measured how larger
>> messages affect cpu usage?  Especially since there are usually far
>> fewer messages the larger they get.
> I agree.  LoadAVG is a pretty useless measurement.  And relaxing your
> filtering based on load gives spammers a clear signal how to defeat
> your filter.
>
>>> Two, I'm trying a system that also truncates messages mid-message at
>>> the threshold to scan them anyway.
>> Why is it controversial? Amavisd-new 2.6.3 had this feature since
>> 2009, so it's used probably very widely - even without users knowing
>> it.  I've never seen any ill effects.
> We truncate overly-long messages too, but we try to be intelligent
> about it.  We shrink non-text MIME parts first and then if the message
> is still too large, we give up.  Just blindly cutting a message in the
> middle might wreck the MIME structure and give unexpected and unwanted
> results.

Have you tried/considered scoring based on "headers only"?
I think many MTAs implement a limit on the total size of the headers.
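
As a rough illustration of what "headers only" would hand to the scanner,
here is a minimal sketch that cuts a raw message at the first blank line.
The helper name is hypothetical, and nothing here says how well such a
scan would score.

```python
def headers_only(raw_message):
    """Return the header section of a raw message, including the blank
    separator line; the body is dropped."""
    ends = []
    for sep in (b"\r\n\r\n", b"\n\n"):
        idx = raw_message.find(sep)
        if idx != -1:
            ends.append(idx + len(sep))
    # Use the earliest separator found; a message with no body is returned whole.
    return raw_message[:min(ends)] if ends else raw_message
```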

Also, reporting over-sized spam messages with "extra speed" seemed
to deter "big size" spammers.
[based on experience with my personal mailboxes only and reports to
spamcop.net of spams >40KB]

Re: Message not scanned- Size?

Posted by "David F. Skoll" <df...@roaringpenguin.com>.
On Mon, 03 Dec 2012 15:51:45 -0500
"Kevin A. McGrail" <KM...@PCCC.com> wrote:

> My goal is to implement something in spamc/spamd that's useful for 
> people using SA more out of the box.  I guess, my thought is that
> adding some logic to dynamically increase the size limit was better
> than the status quo.

OK.

> I agree your techniques are good and perhaps it's time to make 
> MIME::Tools more integral to SA.

Well. :)  I'm not sure about that... MIME::Tools can be pretty
memory-hungry and I assume SA already has a way to parse MIME
and build an in-memory representation of the MIME message.  As long
as you can manipulate that, you can truncate messages intelligently.

Regards,

David.

Re: Message not scanned- Size?

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 12/3/2012 3:43 PM, David F. Skoll wrote:
> I agree.  LoadAVG is a pretty useless measurement.  And relaxing your
> filtering based on load gives spammers a clear signal how to defeat
> your filter.
...
> We truncate overly-long messages too, but we try to be intelligent 
> about it. We shrink non-text MIME parts first and then if the message 
> is still too large, we give up. Just blindly cutting a message in the 
> middle might wreck the MIME structure and give unexpected and unwanted 
> results. 
My goal is to implement something in spamc/spamd that's useful for 
people using SA more out of the box.  I guess, my thought is that adding 
some logic to dynamically increase the size limit was better than the 
status quo.

I agree your techniques are good and perhaps it's time to make 
MIME::Tools more integral to SA.

Perhaps I'm focusing too much on "better than now" and you're both 
right, we need to focus on "the best".

Regards,
KAM

Re: Message not scanned- Size?

Posted by "David F. Skoll" <df...@roaringpenguin.com>.
On Mon, 03 Dec 2012 16:20:30 -0500
Bowie Bailey <Bo...@BUC.com> wrote:

> Without this setup, you are always at the "lower security" level.

Ah, so you believe the glass is half-full whereas I maintain it's
half-empty. :)

> Of course, everyone would like to have a box that can handle fully
> scanning every email that comes in, but for some people that is just
> not feasible.

In that case, such people have no business scanning their email and should
outsource it or move to Google Apps.  Seriously.  We're talking about a
critical piece of security infrastructure and skimping on it will bite you.

> If you want to get the maximum out of the hardware
> that you have, then I don't see any reason not to use something like
> this.

If you want to maximize your scanning capacity, you'll put larger messages
aside and scan them later when the load goes down.
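
A minimal sketch of that "set aside, scan later" policy, assuming an
in-memory queue (a real MTA would spool to disk) and a caller that supplies
the current load figure. The class name, the thresholds and the scan
callable are all hypothetical.

```python
from collections import deque


class DeferringScanner:
    """Scan small messages at once; hold large ones back while the host is busy."""

    def __init__(self, scan, size_limit, busy_load):
        self.scan = scan              # callable taking the raw message bytes
        self.size_limit = size_limit  # bytes; larger messages may be deferred
        self.busy_load = busy_load    # load at or above this counts as busy
        self.deferred = deque()

    def submit(self, message, load):
        """Return the scan result, or None if the message was set aside."""
        if len(message) > self.size_limit and load >= self.busy_load:
            self.deferred.append(message)
            return None
        return self.scan(message)

    def drain(self, load):
        """Scan everything set aside, but only once the load has dropped."""
        results = []
        while self.deferred and load < self.busy_load:
            results.append(self.scan(self.deferred.popleft()))
        return results
```

Note that nothing is skipped under this policy: a large message is delayed,
never passed through unscanned.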

> In reality, few spams exceed the current default size limits.  So you
> are not losing much by staying with those limits anyway.

In which case the whole thing is probably moot and basing the size limit
on load is an unnecessary complication.

Regards,

David.

Re: Message not scanned- Size?

Posted by Bowie Bailey <Bo...@BUC.com>.
On 12/3/2012 4:12 PM, David F. Skoll wrote:
> On Mon, 03 Dec 2012 16:04:42 -0500
> Bowie Bailey <Bo...@BUC.com> wrote:
>
>> You are not relaxing the filtering, you are tightening it.
> It still IMO is a bad idea.  Effectively, you are lowering your
> security when your box is busier no matter how you look at it.
>
> If your box can't handle load spikes, then when it gets busy you
> should just spool mail and scan it later when the load goes back
> down.  If your box can't handle the *average* load, then it's time
> for a bigger box (or more boxes.)

Without this setup, you are always at the "lower security" level. Of 
course, everyone would like to have a box that can handle fully scanning 
every email that comes in, but for some people that is just not 
feasible.  If you want to get the maximum out of the hardware that you 
have, then I don't see any reason not to use something like this.

In reality, few spams exceed the current default size limits.  So you 
are not losing much by staying with those limits anyway.

-- 
Bowie

Re: Message not scanned- Size?

Posted by "David F. Skoll" <df...@roaringpenguin.com>.
On Mon, 03 Dec 2012 16:04:42 -0500
Bowie Bailey <Bo...@BUC.com> wrote:

> You are not relaxing the filtering, you are tightening it.

It still IMO is a bad idea.  Effectively, you are lowering your
security when your box is busier no matter how you look at it.

If your box can't handle load spikes, then when it gets busy you
should just spool mail and scan it later when the load goes back
down.  If your box can't handle the *average* load, then it's time
for a bigger box (or more boxes.)

Regards,

David.


Re: Message not scanned- Size?

Posted by Bowie Bailey <Bo...@BUC.com>.
On 12/3/2012 3:43 PM, David F. Skoll wrote:
> On Mon, 3 Dec 2012 22:03:25 +0200
> Henrik K <he...@hege.li> wrote:
>
>> On Mon, Dec 03, 2012 at 01:54:44PM -0500, Kevin A. McGrail wrote:
> [Test loadavg in filtering decisions]
>
>> Seems kind of pointless. Have you actually measured how larger
>> messages affect cpu usage?  Especially since there are usually far
>> fewer messages the larger they get.
> I agree.  LoadAVG is a pretty useless measurement.  And relaxing your
> filtering based on load gives spammers a clear signal how to defeat
> your filter.

I think you're looking at it the wrong way around.  You start with the 
max size that is currently in use.  You then increase the max size when 
the box isn't busy.

You are not relaxing the filtering, you are tightening it.

-- 
Bowie

Re: Message not scanned- Size?

Posted by "David F. Skoll" <df...@roaringpenguin.com>.
On Mon, 3 Dec 2012 22:03:25 +0200
Henrik K <he...@hege.li> wrote:

> On Mon, Dec 03, 2012 at 01:54:44PM -0500, Kevin A. McGrail wrote:

[Test loadavg in filtering decisions]

> Seems kind of pointless. Have you actually measured how larger
> messages affect cpu usage?  Especially since there are usually far
> fewer messages the larger they get.

I agree.  LoadAVG is a pretty useless measurement.  And relaxing your
filtering based on load gives spammers a clear signal how to defeat
your filter.

> > Two, I'm trying a system that also truncates messages mid-message at
> > the threshold to scan them anyway.

> Why is it controversial? Amavisd-new 2.6.3 had this feature since
> 2009, so it's used probably very widely - even without users knowing
> it.  I've never seen any ill effects.

We truncate overly-long messages too, but we try to be intelligent
about it.  We shrink non-text MIME parts first and then if the message
is still too large, we give up.  Just blindly cutting a message in the
middle might wreck the MIME structure and give unexpected and unwanted
results.
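
A minimal sketch of that kind of structure-aware shrinking, using Python's
standard email package. It assumes "shrinking" a non-text part means
dropping its body while keeping its MIME headers; the actual filter
described here may do something different, and the function name is
hypothetical.

```python
import email


def shrink_for_scanning(raw_message, limit):
    """Return raw_message or a shrunken copy that fits in limit bytes, else None.

    Non-text leaf parts lose their bodies but keep their MIME headers,
    so the multipart structure stays valid. Text parts are untouched.
    """
    if len(raw_message) <= limit:
        return raw_message
    msg = email.message_from_bytes(raw_message)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no body of their own
        if part.get_content_maintype() != "text":
            part.set_payload("")  # keep headers, drop the attachment body
    shrunk = msg.as_bytes()
    # Give up rather than cut blindly if the text alone is still too large.
    return shrunk if len(shrunk) <= limit else None
```

Because the message is parsed and re-serialised, boundaries and part
headers survive, which a blind cut at a byte offset cannot guarantee.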

Regards,

David.

Re: Message not scanned- Size?

Posted by Henrik K <he...@hege.li>.
On Mon, Dec 03, 2012 at 01:54:44PM -0500, Kevin A. McGrail wrote:
> 
> I've added two tricks to this filter.
> 
> One, I pass the load_avg to the filter and use it to modify the size
> limit for spamc based on load.  The lower the load, the higher the
> multiplier.

Seems kind of pointless. Have you actually measured how larger messages
affect cpu usage?  Especially since there are usually far fewer messages the
larger they get.

If 10% more cpu or whatever makes your server hang, then you have other
problems.  :-)

> Two, I'm trying a system that also truncates messages mid-message at
> the threshold to scan them anyway.
> 
> The second idea has been pretty controversial but I think the first
> one is a neat idea.

Why is it controversial? Amavisd-new has had this feature since 2.6.3 (2009),
so it's probably used very widely - even without users knowing it.  I've never
seen any ill effects.

> This might be something neat to add to spamd/spamc natively where
> there is a load average multiplier concept. Thoughts?

Load average is quite a silly measurement. The maximum depends on the number
of cpu cores/threads.  Also this kind of limiting makes sense only on a
pre-queue filter in which case such "throttles" should be applied way before
anything gets to SA.
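
To make the point about core counts concrete, here is a small sketch that
divides the load average by the number of logical CPUs, so that the same
figure means the same thing on a 2-core and a 32-core host. The function
name is hypothetical.

```python
import os


def load_per_core(load_avg, cpu_count=None):
    """Divide a load average by the number of logical CPUs.

    A value near 1.0 then means roughly one runnable task per core,
    regardless of how many cores the host has.
    """
    cores = cpu_count or os.cpu_count() or 1
    return load_avg / cores
```

For example, a load average of 8.0 is saturation on a 4-core box
(2.0 per core) but light work on a 32-core one (0.25 per core).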


Re: Message not scanned- Size?

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 12/3/2012 8:13 AM, Alexandre Boyer wrote:
> Hi,
>
> I guess you could change your threshold for the cutoff? The -s flag, when
> calling spamc, seems to be it.
>
> I use amavisd-new to feed SA, it does the same thing, I had to change my
> threshold too to analyze bigger emails.

I'm currently testing some tricks.

I use MIMEDefang to call spamc through a custom filter that interacts with 
spamd.

I've added two tricks to this filter.

One, I pass the load_avg to the filter and use it to modify the size 
limit for spamc based on load.  The lower the load, the higher the 
multiplier.
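
A minimal sketch of such a load-based multiplier. The base of 512000 bytes
is the cutoff from the log line in the original post; the maximum
multiplier, the "busy" load threshold and the linear interpolation are
assumptions for illustration, not the values or formula used in the
actual filter.

```python
import os

BASE_LIMIT = 512000  # bytes; the spamc cutoff reported in the original post


def size_limit_for_load(load_avg, base=BASE_LIMIT, max_multiplier=4.0, busy_load=4.0):
    """Return a size limit that grows as the load average falls.

    At load >= busy_load the limit is the base; at load 0 it is
    base * max_multiplier, with linear interpolation in between.
    """
    load = min(max(load_avg, 0.0), busy_load)
    idle_fraction = 1.0 - load / busy_load
    multiplier = 1.0 + (max_multiplier - 1.0) * idle_fraction
    return int(base * multiplier)


def current_size_limit():
    """Use the 1-minute load average of this host (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return size_limit_for_load(one_minute)
```

The limit never drops below the base, so a busy host behaves exactly as it
does with a fixed cutoff and an idle one scans larger messages.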

Two, I'm trying a system that also truncates messages mid-message at the 
threshold to scan them anyway.


The second idea has been pretty controversial but I think the first one 
is a neat idea.

This might be something neat to add to spamd/spamc natively where there 
is a load average multiplier concept. Thoughts?

Regards,
KAM

Re: Message not scanned- Size?

Posted by Alexandre Boyer <bi...@gmail.com>.
Hi,

I guess you could change your threshold for the cutoff? The -s flag, when
calling spamc, seems to be it.

I use amavisd-new to feed SA and it does the same thing; I had to change my
threshold too to analyze bigger emails.
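
For anyone calling spamc from a script, a minimal sketch of passing a larger
cutoff. The -s option takes the maximum message size in bytes; the 2 MB
value and the helper names are illustrative assumptions, not a
recommendation.

```python
import subprocess


def spamc_command(max_size_bytes):
    """Build a spamc command line with an explicit size cutoff (-s, in bytes)."""
    if max_size_bytes <= 0:
        raise ValueError("max_size_bytes must be positive")
    return ["spamc", "-s", str(max_size_bytes)]


def scan(raw_message, max_size_bytes=2 * 1024 * 1024):
    """Pipe a raw message (bytes) through spamc and return its output.

    Requires spamc on PATH and a running spamd; messages larger than the
    cutoff still come back unprocessed.
    """
    result = subprocess.run(
        spamc_command(max_size_bytes),
        input=raw_message,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```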

Best,

Alex, from prypiat.
Yes, I recycle.


On 12-12-03 06:25 AM, Joseph Acquisto wrote:
> A message slipped through untouched.  Obvious spam from "Minister of Finance" with many attachments.
>
> /var/log/mail shows a message skipped "spamc[7262]: skipped message, greater than max message
> size (512000 bytes)" at the time this came thru.
>
> I'd guess that was it.  Unusual, but any way to prevent that in future?
>
> joe a.
>
>