Posted to users@spamassassin.apache.org by "David F. Skoll" <df...@roaringpenguin.com> on 2015/03/14 17:55:26 UTC

Handling very large messages (was Re: Which milter do you prefer?)

On Sat, 14 Mar 2015 17:08:50 +0100
Reindl Harald <h....@thelounge.net> wrote:

> On 14.03.2015 at 17:00, Kevin A. McGrail wrote:
> > On 3/14/2015 1:14 AM, David B Funk wrote:
> >> truncating a large message and
> >> only passing the first N-KB to SA. As that involves munging MIME
> >> headers it has to be done inside the milter.
> >>
> > I just truncate the message hard and it generally works better than
> > not scanning.  What do you do to truncate?

> how do you truncate messages for the scan?

I can't answer for Kevin, but what we do is this: For oversize
messages, we remove non text/* attachments.  If they're still
oversize, we truncate the text/plain parts.  If they're still
oversize, we truncate the text/html parts.  We do this very carefully
with MIME::tools to ensure that SpamAssassin always sees a valid MIME
message and not (for example) one with a missing boundary.

We use MIMEDefang for SpamAssassin integration, so we can play whatever
tricks we like with the data that gets passed to SpamAssassin without
actually messing with the original message.
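
A minimal sketch of the staged reduction described above, written in Python with the standard-library email package rather than the Perl/MIME::tools stack this post refers to. The function names, the max_size and text_cap parameters, and the choice to empty non-text parts instead of deleting them are assumptions made for illustration, not the production code. It works on a parsed copy, so the original message bytes are never modified.

```python
from email import message_from_bytes, policy


def _truncate_text(part, cap):
    """Cut one text/* part down to at most `cap` characters."""
    text = part.get_content()
    if len(text) > cap:
        # set_content() rebuilds the Content-* headers for this part.
        part.set_content(text[:cap], subtype=part.get_content_subtype())


def reduce_for_scan(raw, max_size, text_cap=4096):
    """Return a smaller, still valid MIME copy of `raw` for the scanner."""
    if len(raw) <= max_size:
        return raw
    msg = message_from_bytes(raw, policy=policy.default)

    # Stage 1: empty every non-text leaf part, keeping its headers so the
    # MIME structure and all boundaries stay intact.
    for part in msg.walk():
        if not part.is_multipart() and part.get_content_maintype() != "text":
            part.set_payload("")

    # Stage 2, then stage 3: truncate text/plain, then text/html,
    # stopping as soon as the copy fits.
    for subtype in ("plain", "html"):
        if len(msg.as_bytes()) <= max_size:
            break
        for part in msg.walk():
            if part.get_content_type() == "text/" + subtype:
                _truncate_text(part, text_cap)

    return msg.as_bytes()
```

The result can still exceed max_size if the headers alone are huge or text_cap is generous, so a caller would need its own final fallback.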

Regards,

David.

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by "David F. Skoll" <df...@roaringpenguin.com>.

On Mon, 16 Mar 2015 10:51:59 -0400
"Bill Cole" <sa...@billmail.scconsult.com> wrote:

> Is the code for doing this shared anywhere or is it sharable? Please?

It's part of our commercial CanIt software.  But I can post a
chunk of Perl that's roughly what we do.

We parse the message into a MIME::Entity.  Then if we need to truncate
it, we call a function similar to the code shown below.  It generates a
version of the message with all non-text parts emptied out.  It's a really
evil hack.

Code is for your education.  Most likely needs considerable tweaking
for production; our real production code obviously is much more
sophisticated.

Regards,

David.

# Pass the function below a MIME entity.  It returns a file handle
# opened for reading on a truncated message body.

use IO::File;      # also exports the O_* open flags (via Fcntl)
use MIME::Entity;

sub open_truncated_body
{
	my($mime_entity) = @_;

	# We are truncating non-text parts.  So
	# we override print_bodyhandle... ugh.
	no warnings 'redefine'; ## no critic (NoWarnings)
	my $original_print_bodyhandle = *MIME::Entity::print_bodyhandle{'CODE'};
	local *MIME::Entity::print_bodyhandle = sub {
		my($self, $out) = @_;
		$out ||= select;
		if ($self->head->mime_type =~ m|^text/|) {
			# Evil...
			# TODO FIXME: per ticket 15530, need to cap
			# the size of text/* attachments.
			# Unfortunately, the only way to do this may
			# require even more evil.
			return $original_print_bodyhandle->($self, $out);
		}

		# Non-text part: print nothing, so its body stays empty.
		return 1;
	};
	# O_TRUNC so that a longer file left over from an earlier call
	# cannot leave stale data after the new content.
	my $fh = IO::File->new('TRUNCATED-MSG', O_WRONLY|O_CREAT|O_TRUNC);
	if ($fh && $fh->opened) {
		$mime_entity->print($fh);
		$fh->close();
	} else {
		return;
	}
	return IO::File->new('TRUNCATED-MSG', O_RDONLY);
}

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Bill Cole <sa...@billmail.scconsult.com>.

On 14 Mar 2015, at 12:55, David F. Skoll wrote:
[...]
> I can't answer for Kevin, but what we do is this: For oversize
> messages, we remove non text/* attachments.  If they're still
> oversize, we truncate the text/plain parts.  If they're still
> oversize, we truncate the text/html parts.  We do this very carefully
> with MIME::tools to ensure that SpamAssassin always sees a valid MIME
> message and not (for example) one with a missing boundary.
>
> We use MIMEDefang for SpamAssassin integration, so we can play 
> whatever
> tricks we like with the data that gets passed to SpamAssassin without
> actually messing with the original message.

Is the code for doing this shared anywhere or is it sharable? Please?

(1. I'm lazy. 2. I'm not egotistical enough to think my own MD code 
wouldn't be objectively worse than yours.)

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

On 3/14/2015 1:01 PM, Robert Schetterer wrote:
> define oversize... cutting mail content may not be allowed in many
> countries; the most common legal policy is to reject (at the incoming
> SMTP level) or pass. tagging passed mail is allowed if the recipient
> accepts this

We are talking about modifying a copy of the email which is 
used only for email classification testing.

regards,
KAM

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Robert Schetterer <rs...@sys4.de>.

On 16.03.2015 at 19:30, Reindl Harald wrote:
> 
> 
> Am 16.03.2015 um 19:24 schrieb Robert Schetterer:
>> Am 16.03.2015 um 18:33 schrieb Reindl Harald:
>>> Am 16.03.2015 um 18:19 schrieb Matus UHLAR - fantomas:
>>>> On 16.03.15 00:59, Jude DaShiell wrote:
>>>>> I have been getting large spam messages for several years on one of my
>>>>> accounts.  Since spamassassin cannot handle them, my only recourse are
>>>>> procmail recipes.
>>>>
>>>> spamassassin CAN handle them. I have ocnfigued spamass-milter to
>>>> process
>>>> all
>>>> mail (by setting size to the same as maximum alllowed mail size) and it
>>>> does...
>>>>
>>>> it't just the spamc default that is 512K. spamd maximum is 512M
>>>> afaik, I
>>>> don't think  you receive such big mail...
>>>
>>> depends on the amount and content of mails since it skips binary
>>> attachment contents
>>>
>>> try really large plaintext content and it takes more than 10 seconds per
>>> message with 100% CPU load - you will notice that quickly ba attach a
>>> large plaintext logfile in case of spamass-milter on a submission server
>>> because it ends in a client timeout
>>>
>>
>> dont use spamass-milter with submission, its to slow
> 
> only for large plaintext content which is the topic of that thread

i tested it and judged it unacceptably slow in real-world setups,
but this may be different elsewhere

> 
>> clamav-milter with sanesecurity fits better ( faster )
> 
> but it don't find anything countable
> 
> here are a lot of sanesecurity signatures active (inbound MX) and
> because the low hit-rate i ordered it finally after SA which catchs much
> more and so one content-scanner can be skipped in many cases
> 
>> after all outbound spam scanning is difficult ever
> 
> but sadly needed in case of hacked accounts, in the past more than once
> even masked a successful dictionary attack because the bot did not
> realize the successful SASL login and continued try other passwords
> after the milter-reject
> 

mail admins are not promised an easy life  *g

a better approach would be some "abnormality" detection system for catching
hacked accounts, i.e. profiling normal user behaviour and comparing against it.

A simple reject match might be, for example, many logins from widely different
GeoIP locations within a short time period.
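
A rough sketch of that kind of reject match, in Python. The class name, the window and threshold defaults, and the assumption that some GeoIP lookup has already mapped each login to a country code are all hypothetical; timestamps are assumed to arrive in non-decreasing order per user.

```python
from collections import defaultdict, deque


class GeoLoginMonitor:
    """Flag accounts that log in from too many countries in a short window."""

    def __init__(self, window_seconds=3600, max_countries=2):
        self.window_seconds = window_seconds
        self.max_countries = max_countries
        # user -> deque of (timestamp, country), oldest first
        self._events = defaultdict(deque)

    def record(self, user, timestamp, country):
        """Record one login; return True if the account now looks suspicious."""
        events = self._events[user]
        events.append((timestamp, country))
        # Drop logins that have fallen out of the time window.
        while events and events[0][0] < timestamp - self.window_seconds:
            events.popleft()
        distinct = {c for _, c in events}
        return len(distinct) > self.max_countries
```

A real deployment would also have to handle VPN users and travellers, which is where profiling normal behaviour per user comes in.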

this might help too in some setups

https://www.roessner-network-solutions.com/postfix-milter-vrfydmn/
https://github.com/croessner/vrfydmn

...
2nd scenario

You provide mail services for customers who deliver their mail over
submission. If you have infected PCs where bots are sending mail
through users' accounts, they can fake the sender addresses.
...



Best Regards
MfG Robert Schetterer

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64
Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Reindl Harald <h....@thelounge.net>.


On 16.03.2015 at 19:24, Robert Schetterer wrote:
> Am 16.03.2015 um 18:33 schrieb Reindl Harald:
>> Am 16.03.2015 um 18:19 schrieb Matus UHLAR - fantomas:
>>> On 16.03.15 00:59, Jude DaShiell wrote:
>>>> I have been getting large spam messages for several years on one of my
>>>> accounts.  Since spamassassin cannot handle them, my only recourse are
>>>> procmail recipes.
>>>
>>> spamassassin CAN handle them. I have ocnfigued spamass-milter to process
>>> all
>>> mail (by setting size to the same as maximum alllowed mail size) and it
>>> does...
>>>
>>> it't just the spamc default that is 512K. spamd maximum is 512M afaik, I
>>> don't think  you receive such big mail...
>>
>> depends on the amount and content of mails since it skips binary
>> attachment contents
>>
>> try really large plaintext content and it takes more than 10 seconds per
>> message with 100% CPU load - you will notice that quickly ba attach a
>> large plaintext logfile in case of spamass-milter on a submission server
>> because it ends in a client timeout
>>
>
> dont use spamass-milter with submission, its to slow

only for large plaintext content which is the topic of that thread

> clamav-milter with sanesecurity fits better ( faster )

but it doesn't find anything worth counting

there are a lot of sanesecurity signatures active here (inbound MX) and, 
because of the low hit rate, i finally ordered it after SA, which catches much 
more, so one content scanner can be skipped in many cases

> after all outbound spam scanning is difficult ever

but sadly needed in case of hacked accounts; in the past it has more than once 
even masked a successful dictionary attack, because the bot did not 
realize the SASL login had succeeded and continued trying other passwords 
after the milter reject

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Robert Schetterer <rs...@sys4.de>.

On 16.03.2015 at 18:33, Reindl Harald wrote:
> 
> 
> Am 16.03.2015 um 18:19 schrieb Matus UHLAR - fantomas:
>> On 16.03.15 00:59, Jude DaShiell wrote:
>>> I have been getting large spam messages for several years on one of my
>>> accounts.  Since spamassassin cannot handle them, my only recourse are
>>> procmail recipes.
>>
>> spamassassin CAN handle them. I have ocnfigued spamass-milter to process
>> all
>> mail (by setting size to the same as maximum alllowed mail size) and it
>> does...
>>
>> it't just the spamc default that is 512K. spamd maximum is 512M afaik, I
>> don't think  you receive such big mail...
> 
> depends on the amount and content of mails since it skips binary
> attachment contents
> 
> try really large plaintext content and it takes more than 10 seconds per
> message with 100% CPU load - you will notice that quickly ba attach a
> large plaintext logfile in case of spamass-milter on a submission server
> because it ends in a client timeout
> 

don't use spamass-milter with submission, it's too slow; clamav-milter with
sanesecurity fits better (faster). after all, outbound spam scanning is
always difficult


Best Regards
MfG Robert Schetterer

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64
Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Reindl Harald <h....@thelounge.net>.


On 16.03.2015 at 18:19, Matus UHLAR - fantomas wrote:
> On 16.03.15 00:59, Jude DaShiell wrote:
>> I have been getting large spam messages for several years on one of my
>> accounts.  Since spamassassin cannot handle them, my only recourse are
>> procmail recipes.
>
> spamassassin CAN handle them. I have ocnfigued spamass-milter to process
> all
> mail (by setting size to the same as maximum alllowed mail size) and it
> does...
>
> it't just the spamc default that is 512K. spamd maximum is 512M afaik, I
> don't think  you receive such big mail...

that depends on the amount and content of the mail, since it skips binary 
attachment contents

try really large plaintext content and it takes more than 10 seconds per 
message at 100% CPU load - you will notice that quickly when you attach a 
large plaintext logfile with spamass-milter on a submission server, 
because it ends in a client timeout

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

On 16.03.15 00:59, Jude DaShiell wrote:
>I have been getting large spam messages for several years on one of 
>my accounts.  Since spamassassin cannot handle them, my only recourse 
>are procmail recipes.

spamassassin CAN handle them. I have configured spamass-milter to process all
mail (by setting the size limit to the maximum allowed mail size) and it
does...

it's just the spamc default that is 512K. the spamd maximum is 512M afaik; I
don't think you receive such big mail...

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Your mouse has moved. Windows NT will now restart for changes to take
to take effect. [OK]

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

>Am 15.03.2015 um 19:03 schrieb Axb:
>>IMO, deciding what chunk of a msg should be scanned should be managed by
>>the glue and not by SA.

On 15.03.15 19:09, Reindl Harald wrote:
>true but if the glue (spamass-milter) would truncate the message it 
>passes to spamc it would get back that truncated message with the 
>added headers (which are used to decide reject or pass) and so 
>finally *deliver* the truncated version

You can turn this off with the "-m" option of spamass-milter, so that only
"X-Spam-*" headers are added.

With this you can pass the "--headers" option to spamc, which returns only
the headers as modified by spamd.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Due to unexpected conditions Windows 2000 will be released
in first quarter of year 1901

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Jude DaShiell <jd...@panix.com>.

I have been getting large spam messages for several years on one of my 
accounts.  Since spamassassin cannot handle them, my only recourse is 
procmail recipes.


-- Twitter: JudeDaShiell


On Sun, 15 Mar 2015, Robert Schetterer wrote:

> Am 15.03.2015 um 12:05 schrieb Reindl Harald:
>>
>> Am 14.03.2015 um 20:17 schrieb Robert Schetterer:
>>> Am 14.03.2015 um 18:11 schrieb Reindl Harald:
>>>> nobody but talks about cut content
>>>>
>>>> we talk about how to pass only a part to spamassassin instead skip large
>>>> messages entirely which in many case would be enough to detect a message
>>>> as spam because the "oversize" are just binary parts
>>>
>>> Ok, but big spam mails are extrem rare, i wouldnt invest time in that
>>
>> you are so terrible wrong
>
> my intention was never to agree with you
>
>>
>> more and more spam messages are coming with a very large image because
>> spammers know the default 256 KB limit which also affects commercial
>> products like from Barracuda Networks, that is not a new trend
>>
>> there is a reason for "-s 5242880" in our setup while i started with "-s
>> 786432" a few months ago
>>
>
> as i wrote this may happen at your site, you should not set your
> experience as ultimate
> everyone has his/its own spam, i dont see any rise in large mail spam here
>
> back to topic i would recommend a two stage spam filtering, if you got
> in trouble with "big spam mail", i.e spamass-milter in front line, then
> "perhaps" combine sieve filters with size/spam matches etc
>
> Best Regards
> MfG Robert Schetterer
>
>

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by "David F. Skoll" <df...@roaringpenguin.com>.

On Sun, 15 Mar 2015 14:19:17 -0500 (CDT)
Dave Funk <db...@engineering.uiowa.edu> wrote:

> However that glue can be intelligent and contain business logic.

And getting back to the original topic... that is why my favorite
milter is MIMEDefang. :)

It does integrate with SpamAssassin, but it also lets you write your own
business logic in Perl which can make for a very powerful combination.

Regards,

David.

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Dave Funk <db...@engineering.uiowa.edu>.

On Sun, 15 Mar 2015, Reindl Harald wrote:

>
>
> Am 15.03.2015 um 19:15 schrieb Axb:
>> On 03/15/2015 07:09 PM, Reindl Harald wrote:
>>> 
[snip..]
>>>> IMO, deciding what chunk of a msg should be scanned should be managed by
>>>> the glue and not by SA.
>>> 
>>> true but if the glue (spamass-milter) would truncate the message it
>>> passes to spamc it would get back that truncated message with the added
>>> headers (which are used to decide reject or pass) and so finally
>>> *deliver* the truncated version
>> 
>> then spamass-milter is the wrong choice
>
> how else should it work?
>
> it hardly can invent the report-headers SA adds by itself which needs to land 
> in the final message, spamc/spamd are doing the message work and the milter 
> is just the glue to bring the MTA and SA together

However that glue can be intelligent and contain business logic.

If the author of the milter knows what they are doing (and cares), this is
a very straightforward thing to do (I know because I did it with 
milterassassin).
In the milter you must take an explicit extra step if you want to mess
with the body of the message (smfi_replacebody). It's actually easier to
just add/replace headers (smfi_addheader/smfi_chgheader) than it is
to mess with the body (not to mention faster & more efficient).

So the logic is: the milter receives a -copy- of the message from sendmail,
passes the 'REPORT' command & the (optionally truncated) message to spamd, and 
gets back a headers-only report. The milter then tells sendmail to add the 
new/modified headers and doesn't mess with the body.
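
The flow above can be sketched in Python as follows. This is not milter code: a real milter would call smfi_addheader/smfi_chgheader instead of rebuilding bytes, and the function names, the scanner callable and the max_scan_bytes parameter are hypothetical stand-ins. The point it illustrates is that only the copy handed to the scanner is shortened, while the delivered body stays byte for byte the same.

```python
def add_scan_headers(original, scan_headers):
    """Append headers to the end of the header block; body is untouched."""
    found = [
        (pos, sep)
        for pos, sep in (
            (original.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")
        )
        if pos >= 0
    ]
    if not found:
        raise ValueError("no header/body separator found")
    pos, sep = min(found)          # earliest blank line ends the headers
    eol = sep[: len(sep) // 2]     # reuse the message's own line ending
    head, body = original[:pos], original[pos + len(sep):]
    extra = b"".join(
        ("%s: %s" % (name, value)).encode("ascii", "replace") + eol
        for name, value in scan_headers
    )
    return head + eol + extra + eol + body


def scan_and_tag(original, scanner, max_scan_bytes):
    """Scan a truncated copy, then tag the full original message."""
    scan_copy = original[:max_scan_bytes]   # only the copy is shortened
    return add_scan_headers(original, scanner(scan_copy))
```

Here scanner is any callable that takes the message bytes and returns a list of (name, value) header pairs, for example a wrapper around a spamd client.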

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Axb <ax...@gmail.com>.

On 03/15/2015 09:27 PM, Reindl Harald wrote:
>
> Am 15.03.2015 um 21:12 schrieb Axb:
>> On 03/15/2015 09:00 PM, Reindl Harald wrote:
>>>>> that could be even a sloppy implementation just truncate after XX
>>>>> bytes
>>>>> and analyze the remaining piece to keep that part simple and fast - at
>>>>> the end it would improve the result with as less as possible overhead
>>>>> and code compared to skip a message
>>>>
>>>> that wheel has been invented... and quite a few do it right. Your
>>>> choice
>>>> of glue is not one of them. And SA shouldn't follow the bad examples
>>>
>>> no - problems in general should be solved at the root cause
>>>
>>> the root cause is that SA is overwhelmed with large mails and so instead
>>> work around that problem in every glue on that planet it just should
>>> only scan the amount of a message it can handle
>>>
>>> you may disagree because bounce that burden to the glue needs no effort
>>> on your side, but that don't make it right
>>
>> if you think SA is "overwhelmed" then it's definitely the wrong tool for
>> you.
>>
>> maybe you should try rspamd (https://rspamd.com/)
>
> come on, stop that attitude
> what is the reason that you feel always attacked and suggest people
> should creep away instead see constructive criticism as positive and
> helpful over the long?

I'm trying to suggest options which may help you, instead of expecting SA 
to bend over backwards just because you think it should.

SA is the most inefficient link in the chain, so it wouldn't be very 
smart to do what you suggest when this should be tackled before SA sees 
any content.

> otherwise the default for spamc of 500k and sa-learn of 256k won't exist
> which are both raised here to much larger values after find out the
> amount of skipped messages for scanning and ignored messages for training

SA is a framework - by default, with conservative settings, it works 
very well for most people; those who prefer other values can and will 
change them.
Many people even fork it and bend it to fit their needs.

RIP horse.. I'm outta here

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Reindl Harald <h....@thelounge.net>.

On 15.03.2015 at 21:12, Axb wrote:
> On 03/15/2015 09:00 PM, Reindl Harald wrote:
>>>> that could be even a sloppy implementation just truncate after XX bytes
>>>> and analyze the remaining piece to keep that part simple and fast - at
>>>> the end it would improve the result with as less as possible overhead
>>>> and code compared to skip a message
>>>
>>> that wheel has been invented... and quite a few do it right. Your choice
>>> of glue is not one of them. And SA shouldn't follow the bad examples
>>
>> no - problems in general should be solved at the root cause
>>
>> the root cause is that SA is overwhelmed with large mails and so instead
>> work around that problem in every glue on that planet it just should
>> only scan the amount of a message it can handle
>>
>> you may disagree because bounce that burden to the glue needs no effort
>> on your side, but that don't make it right
>
> if you think SA is "overwhelmed" then it's definitely the wrong tool for
> you.
>
> maybe you should try rspamd (https://rspamd.com/)

come on, stop that attitude

why do you always feel attacked and suggest that people 
should creep away, instead of seeing constructive criticism as positive and 
helpful over the long run?

it's not me who thinks it is overwhelmed, the SA developers do

otherwise the defaults of 500k for spamc and 256k for sa-learn wouldn't exist, 
both of which are raised here to much larger values after we found out the 
amount of messages skipped for scanning and ignored for training

-s, --max-size size Specify maximum message size, in bytes.
                       [default: 500k]

  --max-size <b>        Skip messages larger than b bytes;
                            defaults to 256 KiB, 0 implies no limit

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Axb <ax...@gmail.com>.

On 03/15/2015 09:00 PM, Reindl Harald wrote:
>
>
> Am 15.03.2015 um 20:35 schrieb Axb:
>> On 03/15/2015 08:22 PM, Reindl Harald wrote:
>>>
>>> Am 15.03.2015 um 19:50 schrieb Martin Gregorie:
>>>> On Sun, 2015-03-15 at 19:23 +0100, Reindl Harald wrote:
>>>>>
>>>>> Am 15.03.2015 um 19:15 schrieb Axb:
>>>>>>> true but if the glue (spamass-milter) would truncate the message it
>>>>>>> passes to spamc it would get back that truncated message with the
>>>>>>> added
>>>>>>> headers (which are used to decide reject or pass) and so finally
>>>>>>> *deliver* the truncated version
>>>>>>
>>>>>> then spamass-milter is the wrong choice
>>>>>
>>>>> how else should it work?
>>>>>
>>>>> it hardly can invent the report-headers SA adds by itself which
>>>>> needs to
>>>>> land in the final message, spamc/spamd are doing the message work and
>>>>> the milter is just the glue to bring the MTA and SA together
>>>>>
>>>> No but, as others have suggested, if the glue shortens the message by
>>>> using MIME-aware code to remove binary attachments, it should be
>>>> easy to
>>>> keep them while spamd scans the shortened message and then put them
>>>> back
>>>> before the message is sent on downstream
>>>
>>> that's error prone and assumes that all mails are 100% valid
>>>
>>> adding headers is a dead safe process
>>> mangle other parts of a mail is not
>>>
>>> hence the only safe and right thing to do is have SA internally work
>>> with a truncated version for analyze transparent to the glue and other
>>> components or just continue with skip messages above a defined size from
>>> scanning at all
>>>
>>> that could be even a sloppy implementation just truncate after XX bytes
>>> and analyze the remaining piece to keep that part simple and fast - at
>>> the end it would improve the result with as less as possible overhead
>>> and code compared to skip a message
>>
>> that wheel has been invented... and quite a few do it right. Your choice
>> of glue is not one of them. And SA shouldn't follow the bad examples
>
> no - problems in general should be solved at the root cause
>
> the root cause is that SA is overwhelmed with large mails and so instead
> work around that problem in every glue on that planet it just should
> only scan the amount of a message it can handle
>
> you may disagree because bounce that burden to the glue needs no effort
> on your side, but that don't make it right

if you think SA is "overwhelmed" then it's definitely the wrong tool for 
you.

maybe you should try rspamd (https://rspamd.com/)

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Reindl Harald <h....@thelounge.net>.


On 15.03.2015 at 20:35, Axb wrote:
> On 03/15/2015 08:22 PM, Reindl Harald wrote:
>>
>> Am 15.03.2015 um 19:50 schrieb Martin Gregorie:
>>> On Sun, 2015-03-15 at 19:23 +0100, Reindl Harald wrote:
>>>>
>>>> Am 15.03.2015 um 19:15 schrieb Axb:
>>>>>> true but if the glue (spamass-milter) would truncate the message it
>>>>>> passes to spamc it would get back that truncated message with the
>>>>>> added
>>>>>> headers (which are used to decide reject or pass) and so finally
>>>>>> *deliver* the truncated version
>>>>>
>>>>> then spamass-milter is the wrong choice
>>>>
>>>> how else should it work?
>>>>
>>>> it hardly can invent the report-headers SA adds by itself which
>>>> needs to
>>>> land in the final message, spamc/spamd are doing the message work and
>>>> the milter is just the glue to bring the MTA and SA together
>>>>
>>> No but, as others have suggested, if the glue shortens the message by
>>> using MIME-aware code to remove binary attachments, it should be easy to
>>> keep them while spamd scans the shortened message and then put them back
>>> before the message is sent on downstream
>>
>> that's error prone and assumes that all mails are 100% valid
>>
>> adding headers is a dead safe process
>> mangle other parts of a mail is not
>>
>> hence the only safe and right thing to do is have SA internally work
>> with a truncated version for analyze transparent to the glue and other
>> components or just continue with skip messages above a defined size from
>> scanning at all
>>
>> that could be even a sloppy implementation just truncate after XX bytes
>> and analyze the remaining piece to keep that part simple and fast - at
>> the end it would improve the result with as less as possible overhead
>> and code compared to skip a message
>
> that wheel has been invented... and quite a few do it right. Your choice
> of glue is not one of them. And SA shouldn't follow the bad examples

no - problems in general should be solved at the root cause

the root cause is that SA is overwhelmed by large mails, so instead of 
working around that problem in every glue on the planet, SA should just 
scan only as much of a message as it can handle

you may disagree because bouncing that burden to the glue needs no effort 
on your side, but that doesn't make it right

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Axb <ax...@gmail.com>.

On 03/15/2015 08:22 PM, Reindl Harald wrote:
>
> Am 15.03.2015 um 19:50 schrieb Martin Gregorie:
>> On Sun, 2015-03-15 at 19:23 +0100, Reindl Harald wrote:
>>>
>>> Am 15.03.2015 um 19:15 schrieb Axb:
>>>>> true but if the glue (spamass-milter) would truncate the message it
>>>>> passes to spamc it would get back that truncated message with the
>>>>> added
>>>>> headers (which are used to decide reject or pass) and so finally
>>>>> *deliver* the truncated version
>>>>
>>>> then spamass-milter is the wrong choice
>>>
>>> how else should it work?
>>>
>>> it hardly can invent the report-headers SA adds by itself which needs to
>>> land in the final message, spamc/spamd are doing the message work and
>>> the milter is just the glue to bring the MTA and SA together
>>>
>> No but, as others have suggested, if the glue shortens the message by
>> using MIME-aware code to remove binary attachments, it should be easy to
>> keep them while spamd scans the shortened message and then put them back
>> before the message is sent on downstream
>
> that's error prone and assumes that all mails are 100% valid
>
> adding headers is a dead safe process
> mangle other parts of a mail is not
>
> hence the only safe and right thing to do is have SA internally work
> with a truncated version for analyze transparent to the glue and other
> components or just continue with skip messages above a defined size from
> scanning at all
>
> that could be even a sloppy implementation just truncate after XX bytes
> and analyze the remaining piece to keep that part simple and fast - at
> the end it would improve the result with as less as possible overhead
> and code compared to skip a message

that wheel has been invented... and quite a few do it right. Your choice 
of glue is not one of them. And SA shouldn't follow the bad examples.

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Reindl Harald <h....@thelounge.net>.

On 15.03.2015 at 19:50, Martin Gregorie wrote:
> On Sun, 2015-03-15 at 19:23 +0100, Reindl Harald wrote:
>>
>> Am 15.03.2015 um 19:15 schrieb Axb:
>>>> true but if the glue (spamass-milter) would truncate the message it
>>>> passes to spamc it would get back that truncated message with the added
>>>> headers (which are used to decide reject or pass) and so finally
>>>> *deliver* the truncated version
>>>
>>> then spamass-milter is the wrong choice
>>
>> how else should it work?
>>
>> it hardly can invent the report-headers SA adds by itself which needs to
>> land in the final message, spamc/spamd are doing the message work and
>> the milter is just the glue to bring the MTA and SA together
>>
> No but, as others have suggested, if the glue shortens the message by
> using MIME-aware code to remove binary attachments, it should be easy to
> keep them while spamd scans the shortened message and then put them back
> before the message is sent on downstream

that's error-prone and assumes that all mails are 100% valid

adding headers is a dead safe process
mangling other parts of a mail is not

hence the only safe and right thing to do is to have SA internally work 
with a truncated version for analysis, transparent to the glue and other 
components, or just continue to skip messages above a defined size from 
scanning entirely

that could even be a sloppy implementation that just truncates after XX bytes 
and analyzes the remaining piece, to keep that part simple and fast - in 
the end it would improve the result with as little overhead 
and code as possible, compared to skipping a message
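
For illustration, a "sloppy" truncation of that kind could look like the Python sketch below. The function name and the decision to cut at the last line boundary are assumptions; nothing like this is claimed to exist in SA. Note that a hard cut can land inside a MIME part and leave the scanned copy without its closing boundary.

```python
def truncate_for_scan(raw, max_bytes):
    """Return at most max_bytes of raw, cut at the last complete line."""
    if len(raw) <= max_bytes:
        return raw
    # Last newline that still fits inside the limit.
    cut = raw.rfind(b"\n", 0, max_bytes)
    if cut == -1:
        # No line break in range: fall back to a plain byte cut.
        return raw[:max_bytes]
    return raw[: cut + 1]
```

Only the copy passed to the scanner would be truncated this way; the message that gets delivered is left alone.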

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Martin Gregorie <ma...@gregorie.org>.

On Sun, 2015-03-15 at 19:23 +0100, Reindl Harald wrote:
> 
> Am 15.03.2015 um 19:15 schrieb Axb:
> > On 03/15/2015 07:09 PM, Reindl Harald wrote:
> >>
> >> Am 15.03.2015 um 19:03 schrieb Axb:
> >>> On 03/15/2015 06:49 PM, Robert Schetterer wrote:
> >>>> Am 15.03.2015 um 18:32 schrieb Robert Schetterer:
> >>>>> tagging is allowed, rejecting is nice but not a must have
> >>>>
> >>>> if you like reject try working in milter chaining with
> >>>> milter-manager http://milter-manager.sourceforge.net/ ( stats
> >>>> included )
> >>>> this gives you option for complex filter scenarios with div milters
> >>>> for size perhaps test combine with milter
> >>>> milter-size
> >>>> http://www.safe-mbox.com/~rgooch/email/index.html
> >>>> and many other milter stuff i.e
> >>>> https://www.milter.org/milter/98
> >>>> MSH Attach Filter
> >>>>
> >>>> not easy to do but should be extrem powerfull and flexible
> >>>> so on topic you dont need to choose a "prefered milter", just chain and
> >>>> combine all milters you like
> >>>
> >>> which makes much more sense thatn bending SA to do stuff it's not
> >>> designed to.
> >>>
> >>> IMO, deciding what chunk of a msg should be scanned should be managed by
> >>> the glue and not by SA.
> >>
> >> true but if the glue (spamass-milter) would truncate the message it
> >> passes to spamc it would get back that truncated message with the added
> >> headers (which are used to decide reject or pass) and so finally
> >> *deliver* the truncated version
> >
> > then spamass-milter is the wrong choice
> 
> how else should it work?
> 
> it hardly can invent the report-headers SA adds by itself which needs to 
> land in the final message, spamc/spamd are doing the message work and 
> the milter is just the glue to bring the MTA and SA together
> 
No, but as others have suggested, if the glue shortens the message by
using MIME-aware code to remove binary attachments, it should be easy to
keep them while spamd scans the shortened message and then put them back
before the message is sent on downstream.
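
A minimal sketch of that idea, in Python with the standard email package: build a separate copy for the scanner with the bodies of all non-text parts emptied, and keep the original bytes for delivery. The function name build_scan_copy is hypothetical, and this is not what any existing milter does; it only shows that headers and MIME boundaries can stay valid while the bulk is dropped.

```python
from email import message_from_bytes, policy


def build_scan_copy(raw: bytes) -> bytes:
    """Return a smaller, still valid MIME copy of `raw` for scanning.

    Every non-text leaf part keeps its headers but loses its body.
    The caller keeps `raw` itself and delivers that, not the copy.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers only hold other parts
        if part.get_content_maintype() != "text":
            part.set_payload("")  # empty body, headers stay in place
    return msg.as_bytes()
```

Because the original bytes are never modified, nothing has to be "put back" afterwards; only the scan verdict has to be carried over to the original message. A malformed message may still parse differently than the sender intended, so real glue code would need more defensive handling than this.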

 
Martin

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Reindl Harald <h....@thelounge.net>.


On 15.03.2015 at 19:15, Axb wrote:
> On 03/15/2015 07:09 PM, Reindl Harald wrote:
>>
>> Am 15.03.2015 um 19:03 schrieb Axb:
>>> On 03/15/2015 06:49 PM, Robert Schetterer wrote:
>>>> Am 15.03.2015 um 18:32 schrieb Robert Schetterer:
>>>>> tagging is allowed, rejecting is nice but not a must have
>>>>
>>>> if you like reject try working in milter chaining with
>>>> milter-manager http://milter-manager.sourceforge.net/ ( stats
>>>> included )
>>>> this gives you option for complex filter scenarios with div milters
>>>> for size perhaps test combine with milter
>>>> milter-size
>>>> http://www.safe-mbox.com/~rgooch/email/index.html
>>>> and many other milter stuff i.e
>>>> https://www.milter.org/milter/98
>>>> MSH Attach Filter
>>>>
>>>> not easy to do but should be extrem powerfull and flexible
>>>> so on topic you dont need to choose a "prefered milter", just chain and
>>>> combine all milters you like
>>>
>>> which makes much more sense thatn bending SA to do stuff it's not
>>> designed to.
>>>
>>> IMO, deciding what chunk of a msg should be scanned should be managed by
>>> the glue and not by SA.
>>
>> true but if the glue (spamass-milter) would truncate the message it
>> passes to spamc it would get back that truncated message with the added
>> headers (which are used to decide reject or pass) and so finally
>> *deliver* the truncated version
>
> then spamass-milter is the wrong choice

How else should it work?

It can hardly invent by itself the report headers that SA adds, and those
need to land in the final message. spamc/spamd do the message work, and
the milter is just the glue that brings the MTA and SA together.

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Axb <ax...@gmail.com>.

On 03/15/2015 07:09 PM, Reindl Harald wrote:
>
> Am 15.03.2015 um 19:03 schrieb Axb:
>> On 03/15/2015 06:49 PM, Robert Schetterer wrote:
>>> Am 15.03.2015 um 18:32 schrieb Robert Schetterer:
>>>> tagging is allowed, rejecting is nice but not a must have
>>>
>>> if you like reject try working in milter chaining with
>>> milter-manager http://milter-manager.sourceforge.net/ ( stats included )
>>> this gives you option for complex filter scenarios with div milters
>>> for size perhaps test combine with milter
>>> milter-size
>>> http://www.safe-mbox.com/~rgooch/email/index.html
>>> and many other milter stuff i.e
>>> https://www.milter.org/milter/98
>>> MSH Attach Filter
>>>
>>> not easy to do but should be extrem powerfull and flexible
>>> so on topic you dont need to choose a "prefered milter", just chain and
>>> combine all milters you like
>>
>> which makes much more sense thatn bending SA to do stuff it's not
>> designed to.
>>
>> IMO, deciding what chunk of a msg should be scanned should be managed by
>> the glue and not by SA.
>
> true but if the glue (spamass-milter) would truncate the message it
> passes to spamc it would get back that truncated message with the added
> headers (which are used to decide reject or pass) and so finally
> *deliver* the truncated version

then spamass-milter is the wrong choice.

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Reindl Harald <h....@thelounge.net>.

On 15.03.2015 at 19:03, Axb wrote:
> On 03/15/2015 06:49 PM, Robert Schetterer wrote:
>> Am 15.03.2015 um 18:32 schrieb Robert Schetterer:
>>> tagging is allowed, rejecting is nice but not a must have
>>
>> if you like reject try working in milter chaining with
>> milter-manager http://milter-manager.sourceforge.net/ ( stats included )
>> this gives you option for complex filter scenarios with div milters
>> for size perhaps test combine with milter
>> milter-size
>> http://www.safe-mbox.com/~rgooch/email/index.html
>> and many other milter stuff i.e
>> https://www.milter.org/milter/98
>> MSH Attach Filter
>>
>> not easy to do but should be extrem powerfull and flexible
>> so on topic you dont need to choose a "prefered milter", just chain and
>> combine all milters you like
>
> which makes much more sense thatn bending SA to do stuff it's not
> designed to.
>
> IMO, deciding what chunk of a msg should be scanned should be managed by
> the glue and not by SA.

True, but if the glue (spamass-milter) truncated the message it
passes to spamc, it would get back that truncated message with the added
headers (which are used to decide between reject and pass), and so it
would finally *deliver* the truncated version.
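
One way around that problem, sketched in Python under assumptions: scan a shortened copy, then copy only the X-Spam-* headers from the scanned result onto the untouched original. The function name graft_spam_headers and the "x-spam-" prefix default are illustrative choices, not spamass-milter behaviour. A real milter would more likely ask the MTA to add the headers through the milter protocol than rewrite the message bytes itself.

```python
from email import message_from_bytes, policy


def graft_spam_headers(original: bytes, scanned: bytes,
                       prefix: str = "x-spam-") -> bytes:
    """Copy scanner headers from `scanned` onto `original`.

    `scanned` is the (possibly truncated) message as returned by the
    scanner; only headers whose name starts with `prefix` are taken.
    The body of `original` is what ends up in the result.
    """
    orig = message_from_bytes(original, policy=policy.SMTP)
    scan = message_from_bytes(scanned, policy=policy.SMTP)
    # Drop stale scanner headers that arrived with the message.
    for name in [k for k in orig.keys() if k.lower().startswith(prefix)]:
        del orig[name]
    for name, value in scan.items():
        if name.lower().startswith(prefix):
            orig[name] = value
    return orig.as_bytes()
```

With this split the reject-or-pass decision still comes from the scanner's headers, while the message that is delivered is the full original rather than the truncated copy.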

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Axb <ax...@gmail.com>.

On 03/16/2015 10:16 AM, Reindl Harald wrote:
>
> Am 16.03.2015 um 03:43 schrieb Dave Warren:
>> On 2015-03-15 17:26, Reindl Harald wrote:
>>>
>>> Am 16.03.2015 um 01:23 schrieb Dave Warren:
>>>> On 2015-03-15 15:01, Reindl Harald wrote:
>>>>> surely, only 5% of incoming spam attempts make it to spamassassin /
>>>>> clamav here, but you need to keep in mind the amount of your regular
>>>>> ham messages in your mailflow which unconditionally touch the content
>>>>> scanners
>>>>
>>>> Why would it? I'd hazard a guess that, on a percentage basis, I run
>>>> less
>>>> ham though SpamAssassin than spam
>>>
>>> than your MTA filters *before* SA just don't work or you have very few
>>> legit mail at all
>>
>> Not at all, I just have comprehensive, adaptive, user-learned
>> whitelisting that catch the vast majority of legitimate mail before it
>> hits SpamAssassin. By whitelisting known-good sources aggressively and
>> automatically, I can cut the false positive rate to near zero, allowing
>> me to filter more aggressively at later stages.
>>
>>> 95% of any delivery attempt is blocked by a sensible
>>> DNSBL/DNSWL/PTR/HELO check on the MTA level and never makes it to
>>> milters at all
>>
>> SpamAssassin need only be responsible for sorting through mail that
>> isn't already known to be good or bad, putting known-good mail through
>> SpamAssassin is wasteful
>
> we are talking about milters and you can't bypass milters with postfix
> other than reject before but nor for whitelisting - period

Yes you can, *IF* they support access tables,
e.g. milter-link.

The action words supported by milter-link are:

OK      Whitelist; bypass one or more tests.
REJECT  Blacklist; reject connection, sender, recipient, etc.
SKIP    Stop lookup and return no result, i.e. continue testing.
DUNNO   Same as SKIP; commonly used by Postfix.

The same goes for all of Snertsoft's other milters.

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Reindl Harald <h....@thelounge.net>.

On 16.03.2015 at 03:43, Dave Warren wrote:
> On 2015-03-15 17:26, Reindl Harald wrote:
>>
>> Am 16.03.2015 um 01:23 schrieb Dave Warren:
>>> On 2015-03-15 15:01, Reindl Harald wrote:
>>>> surely, only 5% of incoming spam attempts make it to spamassassin /
>>>> clamav here, but you need to keep in mind the amount of your regular
>>>> ham messages in your mailflow which unconditionally touch the content
>>>> scanners
>>>
>>> Why would it? I'd hazard a guess that, on a percentage basis, I run less
>>> ham though SpamAssassin than spam
>>
>> than your MTA filters *before* SA just don't work or you have very few
>> legit mail at all
>
> Not at all, I just have comprehensive, adaptive, user-learned
> whitelisting that catch the vast majority of legitimate mail before it
> hits SpamAssassin. By whitelisting known-good sources aggressively and
> automatically, I can cut the false positive rate to near zero, allowing
> me to filter more aggressively at later stages.
>
>> 95% of any delivery attempt is blocked by a sensible
>> DNSBL/DNSWL/PTR/HELO check on the MTA level and never makes it to
>> milters at all
>
> SpamAssassin need only be responsible for sorting through mail that
> isn't already known to be good or bad, putting known-good mail through
> SpamAssassin is wasteful

We are talking about milters, and with Postfix you can't bypass milters:
you can reject before they run, but you can't skip them for whitelisting.
Period.

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Dave Warren <da...@hireahit.com>.

On 2015-03-15 17:26, Reindl Harald wrote:
>
> Am 16.03.2015 um 01:23 schrieb Dave Warren:
>> On 2015-03-15 15:01, Reindl Harald wrote:
>>> surely, only 5% of incoming spam attempts make it to spamassassin /
>>> clamav here, but you need to keep in mind the amount of your regular
>>> ham messages in your mailflow which unconditionally touch the content
>>> scanners
>>
>> Why would it? I'd hazard a guess that, on a percentage basis, I run less
>> ham though SpamAssassin than spam
>
> than your MTA filters *before* SA just don't work or you have very few 
> legit mail at all
>

Not at all. I just have comprehensive, adaptive, user-learned 
whitelisting that catches the vast majority of legitimate mail before it 
hits SpamAssassin. By whitelisting known-good sources aggressively and 
automatically, I can cut the false positive rate to near zero, allowing 
me to filter more aggressively at later stages.

> 95% of any delivery attempt is blocked by a sensible 
> DNSBL/DNSWL/PTR/HELO check on the MTA level and never makes it to 
> milters at all 

SpamAssassin need only be responsible for sorting through mail that 
isn't already known to be good or bad; putting known-good mail through 
SpamAssassin is wasteful.

-- 
Dave Warren
http://www.hireahit.com/
http://ca.linkedin.com/in/davejwarren

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Reindl Harald <h....@thelounge.net>.

On 16.03.2015 at 01:23, Dave Warren wrote:
> On 2015-03-15 15:01, Reindl Harald wrote:
>> surely, only 5% of incoming spam attempts make it to spamassassin /
>> clamav here, but you need to keep in mind the amount of your regular
>> ham messages in your mailflow which unconditionally touch the content
>> scanners
>
> Why would it? I'd hazard a guess that, on a percentage basis, I run less
> ham though SpamAssassin than spam

Then either your MTA filters *before* SA just don't work, or you have
very little legitimate mail at all.

95% of all delivery attempts are blocked by a sensible
DNSBL/DNSWL/PTR/HELO check at the MTA level and never make it to the
milters at all.

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Dave Warren <da...@hireahit.com>.

On 2015-03-15 15:01, Reindl Harald wrote:
> surely, only 5% of incoming spam attempts make it to spamassassin / 
> clamav here, but you need to keep in mind the amount of your regular 
> ham messages in your mailflow which unconditionally touch the content 
> scanners 

Why would it? I'd hazard a guess that, on a percentage basis, I run less 
ham through SpamAssassin than spam.

Obviously comparing the raw numbers will give a different set of 
results, due to the drastically different number of spam attempts vs ham 
attempts.

-- 
Dave Warren
http://www.hireahit.com/
http://ca.linkedin.com/in/davejwarren

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Reindl Harald <h....@thelounge.net>.


On 15.03.2015 at 22:19, Robert Schetterer wrote:
> hypothetical...
>
> spam tagging by spamassassin is "expensive" by design so it should be
> the last step in a long chain of different "antispam" features mostly
> i.e postscreen, clamav-milter, greylisting, rbl filtering, spf dkim
> dmarc checks

Sure, only 5% of incoming spam attempts make it to SpamAssassin /
ClamAV here, but you need to keep in mind the regular ham in your mail
flow, which unconditionally touches the content scanners.

Hence optimizing the resource usage of the content filter makes sense
in any case.

Having clamav-milter before spamass-milter is a good idea in theory,
because ClamAV is much faster. In the real world the problem is that it
rejects only a small amount of junk, and putting spamass-milter before
ClamAV reduces the load because a rejection there bypasses the next
layer. Here too, your ham makes it through both layers anyway.

A few months ago, after looking at the real mail flow, clamav-milter was
moved after spamass-milter here, since it rejected only 1% of the junk
that makes it through to the milters at all, while SA rejects 10% of the
complete mail flow.
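
The ordering argument can be made concrete with a small expected-cost calculation, sketched here in Python. The per-message costs (1 unit for ClamAV, 5 units for SA) are pure assumptions for illustration; only the 1% and 10% figures come from the text above, and they were measured against different bases, so the result is indicative at best. The model also assumes the two rejection rates are independent.

```python
def expected_cost(chain):
    """Expected scanning cost per message for a chain of filters.

    `chain` is a list of (cost_per_message, reject_fraction) tuples in
    the order the filters run. A message rejected by one filter never
    reaches the filters after it.
    """
    total = 0.0
    reaching = 1.0  # fraction of messages that reach the current filter
    for cost, reject in chain:
        total += reaching * cost
        reaching *= 1.0 - reject
    return total


# Assumed costs: ClamAV = 1 unit, SA = 5 units (hypothetical numbers).
clamav = (1.0, 0.01)
spamassassin = (5.0, 0.10)
clamav_first = expected_cost([clamav, spamassassin])
sa_first = expected_cost([spamassassin, clamav])
```

Under this simple model the filter with the lower cost-to-rejection ratio should run first. With the assumed numbers that is SA (5 / 0.10 = 50 against 1 / 0.01 = 100), but the conclusion flips once SA costs more than ten times as much per message as ClamAV.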

> Speculation... big spam mails sourced by hacked big mail providers accounts
> are perhaps most difficult to catch ( cause they pass spf dkim etc
> checks before )
>
> So an idea might be switch those providers in another scan chain as
> other mails by milter-manager conditions, you might use multiple
> instances of spamass-milter and/or spamassassin with different setups.
> Multiple other "switches" may integrated with other milters features
>
> For sure such stuff has to be checked against real world examples
> an log analysis. At the end this should give most flexible chances to
> goal multiple scenarios

That makes the setup more complex and harder to maintain.

Even if you go down that road, optimizing performance inside SA would
improve *both* chains.

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Robert Schetterer <rs...@sys4.de>.

On 15.03.2015 at 19:03, Axb wrote:
> On 03/15/2015 06:49 PM, Robert Schetterer wrote:
>> Am 15.03.2015 um 18:32 schrieb Robert Schetterer:
>>> tagging is allowed, rejecting is nice but not a must have
>>
>> if you like reject try working in milter chaining with
>> milter-manager http://milter-manager.sourceforge.net/ ( stats included )
>> this gives you option for complex filter scenarios with div milters
>> for size perhaps test combine with milter
>> milter-size
>> http://www.safe-mbox.com/~rgooch/email/index.html
>> and many other milter stuff i.e
>> https://www.milter.org/milter/98
>> MSH Attach Filter
>>
>> not easy to do but should be extrem powerfull and flexible
>> so on topic you dont need to choose a "prefered milter", just chain and
>> combine all milters you like
> 
> which makes much more sense thatn bending SA to do stuff it's not
> designed to.
> 
> IMO, deciding what chunk of a msg should be scanned should be managed by
> the glue and not by SA.
> 

Hypothetically...

Spam tagging by SpamAssassin is "expensive" by design, so it should be
the last step in a long chain of different "antispam" features, e.g.
postscreen, clamav-milter, greylisting, RBL filtering, and SPF, DKIM
and DMARC checks.

Speculation: big spam mails sent from hacked accounts at big mail
providers are perhaps the most difficult to catch (because they have
already passed the SPF, DKIM etc. checks).

So one idea might be to route those providers into a different scan
chain than other mail by using milter-manager conditions; you might use
multiple instances of spamass-milter and/or SpamAssassin with different
setups. Other "switches" may be integrated using other milters' features.

Of course such a setup has to be checked against real-world examples
and log analysis. In the end this should give the most flexibility to
cover multiple scenarios.

Best Regards
MfG Robert Schetterer

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64
Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Axb <ax...@gmail.com>.

On 03/15/2015 06:49 PM, Robert Schetterer wrote:
> Am 15.03.2015 um 18:32 schrieb Robert Schetterer:
>> tagging is allowed, rejecting is nice but not a must have
>
> if you like reject try working in milter chaining with
> milter-manager http://milter-manager.sourceforge.net/ ( stats included )
> this gives you option for complex filter scenarios with div milters
> for size perhaps test combine with milter
> milter-size
> http://www.safe-mbox.com/~rgooch/email/index.html
> and many other milter stuff i.e
> https://www.milter.org/milter/98
> MSH Attach Filter
>
> not easy to do but should be extrem powerfull and flexible
> so on topic you dont need to choose a "prefered milter", just chain and
> combine all milters you like

Which makes much more sense than bending SA to do stuff it's not 
designed to do.

IMO, deciding what chunk of a msg should be scanned should be managed by 
the glue and not by SA.

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Robert Schetterer <rs...@sys4.de>.

On 15.03.2015 at 18:32, Robert Schetterer wrote:
> tagging is allowed, rejecting is nice but not a must have

If you want to reject, try milter chaining with
milter-manager http://milter-manager.sourceforge.net/ (stats included).
This gives you the option of complex filter scenarios with various milters.
For size, perhaps test combining it with
milter-size
http://www.safe-mbox.com/~rgooch/email/index.html
and many other milters, e.g.
https://www.milter.org/milter/98
MSH Attach Filter

It is not easy to do, but it should be extremely powerful and flexible.
So, on topic: you don't need to choose a "preferred milter"; just chain
and combine all the milters you like.

Best Regards
MfG Robert Schetterer


Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Robert Schetterer <rs...@sys4.de>.

On 15.03.2015 at 17:53, Reindl Harald wrote:
> 
> Am 15.03.2015 um 17:24 schrieb Robert Schetterer:
>> Am 15.03.2015 um 12:05 schrieb Reindl Harald:
>>>
>>> Am 14.03.2015 um 20:17 schrieb Robert Schetterer:
>>>> Am 14.03.2015 um 18:11 schrieb Reindl Harald:
>>>>> nobody but talks about cut content
>>>>>
>>>>> we talk about how to pass only a part to spamassassin instead skip
>>>>> large
>>>>> messages entirely which in many case would be enough to detect a
>>>>> message
>>>>> as spam because the "oversize" are just binary parts
>>>>
>>>> Ok, but big spam mails are extrem rare, i wouldnt invest time in that
>>>
>>> you are so terrible wrong
>>
>> my intention was never to agree with you
> 
> so what....
> 
>>> more and more spam messages are coming with a very large image because
>>> spammers know the default 256 KB limit which also affects commercial
>>> products like from Barracuda Networks, that is not a new trend
>>>
>>> there is a reason for "-s 5242880" in our setup while i started with "-s
>>> 786432" a few months ago
>>>
>>
>> as i wrote this may happen at your site, you should not set your
>> experience as ultimate
> 
> but you did that with "Ok, but big spam mails are extrem rare"

And I wrote that "I" wouldn't invest time in this; it's nice if you will do it.

> 
>> everyone has his/its own spam, i dont see any rise in large mail spam
>> here
> 
> that may be true for *your* account but hardly in case of a large
> user-base for all users, you just don't notice the bypassed junk

10000 users haven't reported it as something to fix yet.
I don't work on building a perfect world.

> 
>> back to topic i would recommend a two stage spam filtering, if you got
>> in trouble with "big spam mail", i.e spamass-milter in front line, then
>> "perhaps" combine sieve filters with size/spam matches etc
> 
> that can't work beause at that stage you already received the message
> instead reject it and so can't discard it without backscattering
> 

Tagging is allowed; rejecting is nice but not a must-have.


Best Regards
MfG Robert Schetterer


Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Reindl Harald <h....@thelounge.net>.

On 15.03.2015 at 17:24, Robert Schetterer wrote:
> Am 15.03.2015 um 12:05 schrieb Reindl Harald:
>>
>> Am 14.03.2015 um 20:17 schrieb Robert Schetterer:
>>> Am 14.03.2015 um 18:11 schrieb Reindl Harald:
>>>> nobody but talks about cut content
>>>>
>>>> we talk about how to pass only a part to spamassassin instead skip large
>>>> messages entirely which in many case would be enough to detect a message
>>>> as spam because the "oversize" are just binary parts
>>>
>>> Ok, but big spam mails are extrem rare, i wouldnt invest time in that
>>
>> you are so terrible wrong
>
> my intention was never to agree with you

so what....

>> more and more spam messages are coming with a very large image because
>> spammers know the default 256 KB limit which also affects commercial
>> products like from Barracuda Networks, that is not a new trend
>>
>> there is a reason for "-s 5242880" in our setup while i started with "-s
>> 786432" a few months ago
>>
>
> as i wrote this may happen at your site, you should not set your
> experience as ultimate

but you did that with "Ok, but big spam mails are extrem rare"

> everyone has his/its own spam, i dont see any rise in large mail spam here

That may be true for *your* account, but hardly for all users of a
large user base; you just don't notice the junk that bypasses the filter.

> back to topic i would recommend a two stage spam filtering, if you got
> in trouble with "big spam mail", i.e spamass-milter in front line, then
> "perhaps" combine sieve filters with size/spam matches etc

That can't work, because at that stage you have already accepted the
message instead of rejecting it, and so you can't discard it without
backscattering.

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Robert Schetterer <rs...@sys4.de>.

On 15.03.2015 at 12:05, Reindl Harald wrote:
> 
> Am 14.03.2015 um 20:17 schrieb Robert Schetterer:
>> Am 14.03.2015 um 18:11 schrieb Reindl Harald:
>>> nobody but talks about cut content
>>>
>>> we talk about how to pass only a part to spamassassin instead skip large
>>> messages entirely which in many case would be enough to detect a message
>>> as spam because the "oversize" are just binary parts
>>
>> Ok, but big spam mails are extrem rare, i wouldnt invest time in that
> 
> you are so terrible wrong

my intention was never to agree with you

> 
> more and more spam messages are coming with a very large image because
> spammers know the default 256 KB limit which also affects commercial
> products like from Barracuda Networks, that is not a new trend
> 
> there is a reason for "-s 5242880" in our setup while i started with "-s
> 786432" a few months ago
> 

As I wrote, this may happen at your site, but you should not treat your
experience as universal.
Everyone has their own spam; I don't see any rise in large spam mail here.

Back to the topic: if you are in trouble with "big spam mail", I would
recommend two-stage spam filtering, i.e. spamass-milter in the front line,
then "perhaps" sieve filters combined with size/spam matches etc.

Best Regards
MfG Robert Schetterer


Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Reindl Harald <h....@thelounge.net>.

On 14.03.2015 at 20:17, Robert Schetterer wrote:
> Am 14.03.2015 um 18:11 schrieb Reindl Harald:
>> nobody but talks about cut content
>>
>> we talk about how to pass only a part to spamassassin instead skip large
>> messages entirely which in many case would be enough to detect a message
>> as spam because the "oversize" are just binary parts
>
> Ok, but big spam mails are extrem rare, i wouldnt invest time in that

You are so terribly wrong.

More and more spam messages come with a very large image because
spammers know the default 256 KB limit, which also affects commercial
products such as those from Barracuda Networks. That is not a new trend.

There is a reason for "-s 5242880" (5 MiB) in our setup, while I started
with "-s 786432" (768 KiB) a few months ago.

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by "David F. Skoll" <df...@roaringpenguin.com>.

On Sat, 14 Mar 2015 20:45:16 +0100
Robert Schetterer <rs...@sys4.de> wrote:

> In the last ten years i saw a handfull of these, but ok, perhaps
> different at your site.

Mostly they're spams with the payload in a PDF document, a Word
document or an image.  Very occasionally, we see ones where the plain-text
is padded to a couple of megabytes, but those are extremely rare.

We filter mail for quite a lot of people, so we do see even quite rare
events.

Regards,

David.

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Nick Edwards <ni...@gmail.com>.

On 3/15/15, Robert Schetterer <rs...@sys4.de> wrote:
> Am 14.03.2015 um 20:22 schrieb David F. Skoll:
>> On Sat, 14 Mar 2015 20:17:27 +0100
>> Robert Schetterer <rs...@sys4.de> wrote:
>>
>>> Ok, but big spam mails are extrem rare, i wouldnt invest time in that
>>
>> They are quite rare, but common enough IMO that our customers would be
>> annoyed if we didn't scan them.
>>
>> Regards,
>>
>> David.
>>
>
> In the last ten years i saw a handfull of these, but ok, perhaps
> different at your site.



I think we too would know if large spam were a problem, and our active
user count is a fair size: not huge, but not small.

mysql> select count(*) from users where active='1';
+----------+
| count(*) |
+----------+
|  3801914 |
+----------+
1 row in set (0.00 sec)

And we have some bitchy customers that make Reindl look like a Sunday
school kid.

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Robert Schetterer <rs...@sys4.de>.

On 14.03.2015 at 20:22, David F. Skoll wrote:
> On Sat, 14 Mar 2015 20:17:27 +0100
> Robert Schetterer <rs...@sys4.de> wrote:
> 
>> Ok, but big spam mails are extrem rare, i wouldnt invest time in that
> 
> They are quite rare, but common enough IMO that our customers would be
> annoyed if we didn't scan them.
> 
> Regards,
> 
> David.
> 

In the last ten years I have seen a handful of these, but OK, perhaps
it is different at your site.



Best Regards
MfG Robert Schetterer


Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by "David F. Skoll" <df...@roaringpenguin.com>.

On Sat, 14 Mar 2015 20:17:27 +0100
Robert Schetterer <rs...@sys4.de> wrote:

> Ok, but big spam mails are extrem rare, i wouldnt invest time in that

They are quite rare, but common enough IMO that our customers would be
annoyed if we didn't scan them.

Regards,

David.

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Bill Cole <sa...@billmail.scconsult.com>.

On 14 Mar 2015, at 15:17, Robert Schetterer wrote:

[...]
>>> Am 14.03.2015 um 17:55 schrieb David F. Skoll:
[...]
>>>> I can't answer for Kevin, but what we do is this: For oversize
>>>> messages, we remove non text/* attachments.  If they're still
>>>> oversize, we truncate the text/plain parts.  If they're still
>>>> oversize, we truncate the text/html parts.  We do this very 
>>>> carefully
>>>> with MIME::tools to ensure that SpamAssassin always sees a valid 
>>>> MIME
>>>> message and not (for example) one with a missing boundary.
>>>>
>>>> We use MIMEDefang for SpamAssassin integration, so we can play 
>>>> whatever
>>>> tricks we like with the data that gets passed to SpamAssassin 
>>>> without
>>>> actually messing with the original message.
[...]
> Ok, but big spam mails are extrem rare, i wouldnt invest time in that

Not true in all contexts.

The majority of user-reported uncaught spam messages on a system I 
manage in the past 6 months are ones that have bypassed SA filtering 
because they were oversize. I've actually invested some time in 
mitigating this problem because users want a fix more than they want 
anything else about spam-filtering changed on that system. Despite using 
MD I had not thought of David's approach; instead I have used less 
scalable approaches of enforcing other rules on large mail that suit the 
system in question (e.g. varying hard limits on message size based on 
sender domain.) I now intend to supplement that with selective MIME 
dismemberment ahead of filtering.
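
As an illustration of the later stages of the approach quoted above (truncate text/plain parts, then text/html parts, until the message fits), here is a rough Python sketch using the standard email package. It is not the MIME::tools code discussed in this thread; the function name shrink_text_parts and the keep parameter (characters retained per text part) are hypothetical. Emptying non-text attachments would be a separate earlier stage.

```python
from email import message_from_bytes, policy


def shrink_text_parts(raw: bytes, limit: int, keep: int = 4096) -> bytes:
    """Shrink a scan copy by truncating text/plain, then text/html parts.

    Stops as soon as the serialized message is no larger than `limit`.
    Each truncated part is rewritten through the email API, so the
    result stays a valid MIME message with intact boundaries.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    for subtype in ("plain", "html"):
        if len(msg.as_bytes()) <= limit:
            break
        for part in msg.walk():
            if part.get_content_type() == f"text/{subtype}":
                # May raise LookupError on a bogus charset; real glue
                # code would need to handle malformed input.
                text = part.get_content()
                part.set_content(text[:keep], subtype=subtype)
    return msg.as_bytes()
```

The output is meant only as input for the scanner. The original message stays as received, which is what integrating through something like MIMEDefang makes possible according to the quoted description.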

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Robert Schetterer <rs...@sys4.de>.

On 14.03.2015 at 18:11, Reindl Harald wrote:
> 
> 
> Am 14.03.2015 um 18:01 schrieb Robert Schetterer:
>> Am 14.03.2015 um 17:55 schrieb David F. Skoll:
>>> On Sat, 14 Mar 2015 17:08:50 +0100
>>> Reindl Harald <h....@thelounge.net> wrote:
>>>
>>>> Am 14.03.2015 um 17:00 schrieb Kevin A. McGrail:
>>>>> On 3/14/2015 1:14 AM, David B Funk wrote:
>>>>>> truncating a large message and
>>>>>> only passing the first N-KB to SA. As that involves munging MIME
>>>>>> headers it has to be done inside the milter.
>>>>>>
>>>>> I just truncate the message hard and it generally works better than
>>>>> not scanning.  What do you do to truncate?
>>>
>>>> how do you truncate messages for the scan?
>>>
>>> I can't answer for Kevin, but what we do is this: For oversize
>>> messages, we remove non text/* attachments.  If they're still
>>> oversize, we truncate the text/plain parts.  If they're still
>>> oversize, we truncate the text/html parts.  We do this very carefully
>>> with MIME::tools to ensure that SpamAssassin always sees a valid MIME
>>> message and not (for example) one with a missing boundary.
>>>
>>> We use MIMEDefang for SpamAssassin integration, so we can play whatever
>>> tricks we like with the data that gets passed to SpamAssassin without
>>> actually messing with the original message.
>>>
>> define oversize..., cutting mail content may not allowed in many
>> countries, most legal policy, is reject ( at income smtp level ) or pass
>> tag passed mail is allowed if the reciept accepts this
> 
> nobody but talks about cut content
> 
> we talk about how to pass only a part to spamassassin instead skip large
> messages entirely which in many case would be enough to detect a message
> as spam because the "oversize" are just binary parts
> 

OK, but big spam mails are extremely rare; I wouldn't invest time in that.


Best Regards
MfG Robert Schetterer

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64
Franziskanerstraße 15, 81669 München

Registered office: München, District Court of München: HRB 199263
Executive Board: Patrick Ben Koetter, Marc Schiffbauer
Chairman of the Supervisory Board: Florian Kirstein

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Reindl Harald <h....@thelounge.net>.


On 14.03.2015 at 18:01, Robert Schetterer wrote:
> On 14.03.2015 at 17:55, David F. Skoll wrote:
>> On Sat, 14 Mar 2015 17:08:50 +0100
>> Reindl Harald <h....@thelounge.net> wrote:
>>
>>> On 14.03.2015 at 17:00, Kevin A. McGrail wrote:
>>>> On 3/14/2015 1:14 AM, David B Funk wrote:
>>>>> truncating a large message and
>>>>> only passing the first N-KB to SA. As that involves munging MIME
>>>>> headers it has to be done inside the milter.
>>>>>
>>>> I just truncate the message hard and it generally works better than
>>>> not scanning.  What do you do to truncate?
>>
>>> how do you truncate messages for the scan?
>>
>> I can't answer for Kevin, but what we do is this: For oversize
>> messages, we remove non text/* attachments.  If they're still
>> oversize, we truncate the text/plain parts.  If they're still
>> oversize, we truncate the text/html parts.  We do this very carefully
>> with MIME::tools to ensure that SpamAssassin always sees a valid MIME
>> message and not (for example) one with a missing boundary.
>>
>> We use MIMEDefang for SpamAssassin integration, so we can play whatever
>> tricks we like with the data that gets passed to SpamAssassin without
>> actually messing with the original message.
>>
> define oversize..., cutting mail content may not be allowed in many
> countries; most legal policy is reject (at incoming SMTP level) or pass.
> Tagging passed mail is allowed if the recipient accepts it.

but nobody is talking about cutting content

we are talking about how to pass only a part to SpamAssassin instead of
skipping large messages entirely, which in many cases would be enough to
detect a message as spam because the "oversize" parts are just binary parts

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by "David F. Skoll" <df...@roaringpenguin.com>.

On Sat, 14 Mar 2015 18:01:10 +0100
Robert Schetterer <rs...@sys4.de> wrote:

> define oversize...,

It's configurable, obviously.

> cutting mail content may not allowed in many countries,

Ummm... WTF?  We cut what we pass to SpamAssassin.  We don't actually
alter the original message.  That is either accepted, rejected or
quarantined depending on filtering results.
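
The separation described here (shrink only the copy handed to the scanner,
leave the delivered message alone) can be sketched roughly as follows. This is
not the CanIt or MIMEDefang code; it is a minimal illustration that swaps in
Python's standard email package for MIME::tools. The function name scan_copy
and the parameters max_bytes and text_cap are hypothetical, and a real filter
would need more care (unknown charsets, nested messages, how much text to keep).

```python
from email import message_from_bytes, policy


def scan_copy(raw: bytes, max_bytes: int, text_cap: int) -> bytes:
    """Build a reduced copy of `raw` for the scanner; `raw` itself is never changed.

    max_bytes and text_cap are hypothetical knobs: the size above which a
    message counts as oversize, and the characters kept per truncated text part.
    """
    if len(raw) <= max_bytes:
        return raw  # small enough, scan as-is

    msg = message_from_bytes(raw, policy=policy.default)

    # Stage 1: empty every non-text leaf part but keep its headers, so the
    # multipart containers and their boundaries stay intact.
    for part in msg.walk():
        if not part.is_multipart() and part.get_content_maintype() != "text":
            part.set_payload("")

    # Stage 2, then stage 3: shorten text/plain parts and, only if the copy
    # is still oversize, text/html parts.
    for subtype in ("plain", "html"):
        if len(msg.as_bytes()) <= max_bytes:
            break
        for part in msg.walk():
            if part.get_content_type() == f"text/{subtype}":
                text = part.get_content()  # decoded str
                part.set_content(text[:text_cap], subtype=subtype)

    return msg.as_bytes()
```

The scanner only ever sees the returned bytes; the accept, reject or
quarantine decision is then applied to the untouched original.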

Regards,

David.

Re: Handling very large messages (was Re: Which milter do you prefer?)

Posted by Robert Schetterer <rs...@sys4.de>.

On 14.03.2015 at 17:55, David F. Skoll wrote:
> On Sat, 14 Mar 2015 17:08:50 +0100
> Reindl Harald <h....@thelounge.net> wrote:
> 
>> On 14.03.2015 at 17:00, Kevin A. McGrail wrote:
>>> On 3/14/2015 1:14 AM, David B Funk wrote:
>>>> truncating a large message and
>>>> only passing the first N-KB to SA. As that involves munging MIME
>>>> headers it has to be done inside the milter.
>>>>
>>> I just truncate the message hard and it generally works better than
>>> not scanning.  What do you do to truncate?
> 
>> how do you truncate messages for the scan?
> 
> I can't answer for Kevin, but what we do is this: For oversize
> messages, we remove non text/* attachments.  If they're still
> oversize, we truncate the text/plain parts.  If they're still
> oversize, we truncate the text/html parts.  We do this very carefully
> with MIME::tools to ensure that SpamAssassin always sees a valid MIME
> message and not (for example) one with a missing boundary.
> 
> We use MIMEDefang for SpamAssassin integration, so we can play whatever
> tricks we like with the data that gets passed to SpamAssassin without
> actually messing with the original message.
> 
> Regards,
> 
> David.
> 

define oversize..., cutting mail content may not be allowed in many
countries; most legal policy is reject (at incoming SMTP level) or pass.
Tagging passed mail is allowed if the recipient accepts it.


Best regards,
Robert Schetterer

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64
Franziskanerstraße 15, 81669 München

Registered office: München, District Court of München: HRB 199263
Executive Board: Patrick Ben Koetter, Marc Schiffbauer
Chairman of the Supervisory Board: Florian Kirstein