Posted to users@spamassassin.apache.org by Charles Gregory <cg...@hwcn.org> on 2010/06/12 15:50:39 UTC

More large spam....

I got another 1MB spam today.

I still don't want to kill my system by attempting to scan every large 
mail that comes in.

Has there been any progress on an 'option' to scan only text portions of 
mail past a certain size limit and/or scan only the first X bytes? The 
former is preferable because it avoids any issues with incomplete mail, or 
text sections being last....

- Charles

Re: More large spam....

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Sun, 2010-06-13 at 11:35 -0400, Charles Gregory wrote:
> On Sat, 12 Jun 2010, Karsten Bräckelmann wrote:

> > There are just a very few rules "scanning" non-textual parts of a mail.
> > Large-ish binary attachments don't have much of an impact on
> > performance. Large-ish textual attachments potentially do.
> 
> Now THAT is a curious comment. All the usage guidelines I have ever read 
> implied or outright stated that scanning mails over a certain size was a 
> significant degradation to system performance. Am I confusing the 

Well, a large message internally of course needs more memory and
slightly more time for parsing.

However, most RE rules, which account for the bulk of the load, are
operating on headers and rendered textual parts. They won't be run
against images, zip files, etc.


> guidelines for antivirus programs with those for SA? Would it be 'safe' to 
> run SA on messages with larger attachments? Anyone ever tested this?

Mind trying it yourself? If you're using spamc, just save such a message
and feed it to spamc with an appropriately large -s option. Does it take
significantly longer, or is it just about like any other spam?
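
A minimal sketch of that timing test, assuming spamc is installed and a
spamd is running. The helper name time_filter, the file names, and the
2000000-byte limit passed to -s are illustrative assumptions, not taken
from this thread.

```python
import subprocess
import sys
import time


def time_filter(cmd, message_path):
    """Pipe a saved message into cmd on stdin; return (seconds, exit code)."""
    with open(message_path, "rb") as msg:
        start = time.perf_counter()
        # Discard the rewritten message; only the wall-clock time matters here.
        proc = subprocess.run(cmd, stdin=msg, stdout=subprocess.DEVNULL)
        elapsed = time.perf_counter() - start
    return elapsed, proc.returncode


if __name__ == "__main__":
    # Hypothetical usage: python time_spamc.py large-spam.eml typical-spam.eml
    # 2000000 is an arbitrary -s value chosen to exceed a 1MB message.
    for path in sys.argv[1:]:
        seconds, code = time_filter(["spamc", "-s", "2000000"], path)
        print(f"{path}: {seconds:.2f}s (exit {code})")
```

Running it over the large spam, a typical spam, and a few large hams
gives the comparison asked for above.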

Also, do that test with ham. This is important, since, as you said, you
are getting less than one of these a week as spam. How many hams that
size do you get?


As a general thought -- though I believe I stated this before -- how
many messages are affected anyway? Both ham and spam. How many messages
larger than 500k and, say, less than 1M do you get in total? In percent
of your mail stream? Are you really afraid your system cannot cope with
a handful of larger mails per week?
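
One way to get those numbers, sketched under the assumption that the
mail is stored one message per file (for example in a Maildir). The
function names and the directory path in the usage comment are
hypothetical.

```python
import os


def file_sizes(directory):
    """Sizes in bytes of the regular files directly inside directory."""
    return [e.stat().st_size for e in os.scandir(directory) if e.is_file()]


def size_share(sizes, low=500 * 1024, high=1024 * 1024):
    """Return (count, percent) of messages with low < size < high."""
    in_range = sum(1 for s in sizes if low < s < high)
    percent = 100.0 * in_range / len(sizes) if sizes else 0.0
    return in_range, percent


# Hypothetical usage:
#   count, pct = size_share(file_sizes("/path/to/Maildir/cur"))
#   print(f"{count} messages between 500k and 1M ({pct:.2f}% of the total)")
```

Running it separately over a ham folder and a spam folder answers both
halves of the question.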

Or, to put it in other words: Even if processing such a mail does take
twice or three times as long burning your CPU, at the end of the week,
would you even notice the increased load?
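
The arithmetic behind that question can be sketched as follows. The
model is an assumption made for illustration: every normally scanned
message costs one unit, the large messages are currently skipped, and
each would cost `slowdown` units if scanned. The function name and the
example figures are hypothetical.

```python
def extra_load_fraction(total, large, slowdown):
    """Added scan time as a fraction of the current load.

    total    -- all messages received in the period
    large    -- messages currently skipped for being too big
    slowdown -- cost of one large message relative to a normal one
    """
    scanned_now = total - large  # current load, in unit-cost messages
    if scanned_now <= 0:
        raise ValueError("need at least one normally scanned message")
    return large * slowdown / scanned_now


# Hypothetical week: 7000 messages, 5 of them large, each 3 times as costly.
print(f"{extra_load_fraction(7000, 5, 3):.2%}")
```

With those made-up figures the added load is a fraction of a percent
of the weekly total.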


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: More large spam....

Posted by Charles Gregory <cg...@hwcn.org>.
On Sat, 12 Jun 2010, Karsten Bräckelmann wrote:
> Please do not hijack a thread. Please do not hit Reply, if you do not
> intend to reply and contribute to that thread. Removing all quoted text
> and changing the Subject does *not* make it a new thread or post.
> (Hint: In-Reply-To and References headers.)

(grumble grumble) Stupid mail programs.... (grumble grumble)
Yeah.... okay. Not so stupid. I'll comply....

Footnote: and I was refraining from commenting on another thread on how 
people 'complain' about features of SA that don't work in ways that match 
*their* style of thinking.... Oh, the irony.... :)

>> Has there been any progress...
> No changes since this has been asked the last time.

(nod) Alright. So far this is still a less than once a week phenomenon, 
for me personally. I just raise it occasionally to put a data point into 
the archives. If my inquiry had shaken loose a bunch of 'me too' comments, 
it might have led somewhere. But it hasn't, so the issue remains on the 
far back burner.... :)

> There are just a very few rules "scanning" non-textual parts of a mail.
> Large-ish binary attachments don't have much of an impact on
> performance. Large-ish textual attachments potentially do.

Now THAT is a curious comment. All the usage guidelines I have ever read 
implied or outright stated that scanning mails over a certain size was a 
significant degradation to system performance. Am I confusing the 
guidelines for antivirus programs with those for SA? Would it be 'safe' to 
run SA on messages with larger attachments? Anyone ever tested this?

- C

Re: More large spam....

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
Please do not hijack a thread. Please do not hit Reply, if you do not
intend to reply and contribute to that thread. Removing all quoted text
and changing the Subject does *not* make it a new thread or post.

(Hint: In-Reply-To and References headers.)


On Sat, 2010-06-12 at 09:50 -0400, Charles Gregory wrote:
> I got another 1MB spam today.
> 
> I still don't want to kill my system by attempting to scan every large 
> mail that comes in.

How many messages between 500k and 1M do you get per day?

> Has there been any progress on an 'option' to scan only text portions of 
> mail past a certain size limit and/or scan only the first X bytes? The 
> former is preferable because it avoids any issues with incomplete mail, or 
> text sections being last....

No changes since this has been asked the last time. There are features
for this in 3.3, used by Amavis. They are not used by spamc.

There are just a very few rules "scanning" non-textual parts of a mail.
Large-ish binary attachments don't have much of an impact on
performance. Large-ish textual attachments potentially do.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}