Posted to dev@spamassassin.apache.org by "Babu.N" <ba...@intoto.com> on 2007/10/11 09:03:44 UTC

can large emails be scanned in a better way ?

Hi,

http://wiki.apache.org/spamassassin/OutOfMemoryProblems

This link suggests that one should skip sending large emails to 
SpamAssassin (for better performance). It states that "Tests show 
that larger messages are overwhelmingly likely to be non-spam, given 
the economics of spamming". If spammers use botnets to pump spam, is 
this statement still valid ?

With botnet spamming, spammers may well send large emails (the 
bandwidth used belongs to the botnet's hosts, not to the spammer), with 
the topmost portion of the email containing the spam message and the 
rest being padding to bulk up the email.

Would it not be better if SA accepted email of any size and scanned 
only the topmost portion (say, the initial 500KB) of the email content, 
since it may not make sense for spammers to place their advertisement 
in the later portions of the email?
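
To make the proposal concrete, here is a minimal sketch of "scan only the head" in Python. It is not SpamAssassin code: the helper name head_for_scanning and the sample message are assumptions made for illustration, and only the 500KB figure comes from the post above.

```python
import io

# The "initial 500KB" figure from the post; not a SpamAssassin default.
SCAN_LIMIT = 500 * 1024


def head_for_scanning(stream, limit=SCAN_LIMIT):
    """Return at most `limit` bytes from the start of a binary stream.

    Hypothetical helper: the caller would hand this head to the scanner
    instead of skipping a large message altogether.
    """
    return stream.read(limit)


# Example: a message of roughly 2 MB with the pitch at the top and padding after it.
message = b"Subject: buy now\r\n\r\nspam pitch\r\n" + b"x" * (2 * 1024 * 1024)
head = head_for_scanning(io.BytesIO(message))
print(len(head), "of", len(message), "bytes would be scanned")
```

Cutting at a fixed byte offset can split a MIME part in the middle, so a real implementation would need to tolerate a truncated final part.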

Please comment.


Thanks,
Babu




********************************************************************************
This email message (including any attachments) is for the sole use of the intended recipient(s) 
and may contain confidential, proprietary and privileged information. Any unauthorized review, 
use, disclosure or distribution is prohibited. If you are not the intended recipient, 
please immediately notify the sender by reply email and destroy all copies of the original message. 
Thank you.
 
Intoto Inc.

Re: can large emails be scanned in a better way ?

Posted by Mark Martinec <Ma...@ijs.si>.

Babu,

> Digressing slightly from the thread's topic: Today, do we have any
> SpamAssassin plugin which can be used for domain-signature verification ?

Certainly, the DKIM plugin already ships with the package; it just needs
to be enabled: loadplugin Mail::SpamAssassin::Plugin::DKIM

It can be used for reliable whitelisting, for inclusion in rules
to catch phishing on popular domains (e.g. paypal, ebay, amazon),
and to penalize to some degree mail that claims to be from domains
such as yahoo, yahoogroups, and gmail.com but does not really come
from there.
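
The three uses above can be sketched as a small scoring policy. This is a Python illustration only, not how the DKIM plugin is configured: the domain sets, the score values, and the function name dkim_score_adjustment are all hypothetical.

```python
# Hypothetical domain lists and score values, chosen only for illustration.
TRUSTED_SIGNERS = {"paypal.com", "ebay.com", "amazon.com"}
PHISHING_TARGETS = {"paypal.com", "ebay.com", "amazon.com"}
OFTEN_FORGED = {"yahoo.com", "yahoogroups.com", "gmail.com"}


def dkim_score_adjustment(from_domain, signature_verified):
    """Return a score delta based on the From domain and the DKIM result."""
    domain = from_domain.lower()
    if signature_verified:
        # Reliable whitelisting: a verified signature from a trusted domain.
        return -5.0 if domain in TRUSTED_SIGNERS else 0.0
    if domain in PHISHING_TARGETS:
        # Claims a commonly phished domain but carries no valid signature.
        return 5.0
    if domain in OFTEN_FORGED:
        # Penalize "to some degree" mail that only claims such a domain.
        return 1.5
    return 0.0


print(dkim_score_adjustment("paypal.com", True))
print(dkim_score_adjustment("PayPal.com", False))
print(dkim_score_adjustment("gmail.com", False))
```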

  Mark

Re: can large emails be scanned in a better way ?

Posted by "Babu.N" <ba...@intoto.com>.

Mark, thanks for the input.
Digressing slightly from the thread's topic: Today, do we have any 
SpamAssassin plugin which can be used for domain-signature verification ?


Thanks,
Babu

At 02:06 PM 10/11/2007, Mark Martinec wrote:
>Babu.N wrote:
>
> > In case of botnet spamming, spammers may send large emails (as it is
> > the network of the botnet which is used, but not the spammer), ...
>
>I do see some large spam messages on occasion, not many, but still,
>perhaps it's time to start considering it.
>
> > Is it not better if SA takes any-size email & attempts scanning on
> > only the top-most portion (say initial 500KB) of the email content
> > (as it may not make sense for spammers to keep their advertisement in
> > later portions of the email) ?
>
>I have another problem on my mind, unrelated to your concerns,
>but requires the same kind of solution.
>
>To verify DKIM and DomainKeys -signed messages, one needs to
>have access to a complete message. The value of a verified
>signature goes beyond catching phishing in small messages.
>
>In order to do that, signatures currently need to be checked
>twice - by SpamAssassin for its own purpose, and by another
>program to be able to verify all mail, even large ones.
>This is not particularly efficient, and duplicates admin
>work, needing to handle an additional product.
>
>Another application that would benefit from seeing a complete message
>is a FuzzyOCR and similar plugins, which scan pictures in a mail,
>and picture-type spam seems to be the first one to exceed the
>few-hundred kB scanning limit.
>
>What I would like to see is for a caller of a SpamAssassin library
>to be able to pass to SA an open file handle (or a file descriptor)
>holding a complete original message, perhaps in addition to a current
>way of passing an in-memory copy of a (fraction of-) a mail.
>
>This way the SpamAssassin plugins and tools which are fast
>or need access to a complete message can have access to it
>through a file handle, and the rest can work as usual on a
>(possibly truncated) memory copy.
>
>See also http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5521
>for my concerns about wasteful use of in-memory copies of mail in
>various forms, although the PR is just a tip of an iceberg and I didn't
>go into detail there.
>
>   Mark





Re: can large emails be scanned in a better way ?

Posted by Mark Martinec <Ma...@ijs.si>.

Babu.N wrote:

> In case of botnet spamming, spammers may send large emails (as it is
> the network of the botnet which is used, but not the spammer), ...

I do see some large spam messages on occasion, not many, but still,
perhaps it's time to start considering it.

> Is it not better if SA takes any-size email & attempts scanning on
> only the top-most portion (say initial 500KB) of the email content
> (as it may not make sense for spammers to keep their advertisement in
> later portions of the email) ?

I have another problem on my mind, unrelated to your concerns,
but one that requires the same kind of solution.

To verify DKIM- and DomainKeys-signed messages, one needs to
have access to a complete message. The value of a verified
signature goes beyond catching phishing in small messages.

In order to do that, signatures currently need to be checked
twice - by SpamAssassin for its own purpose, and by another
program to be able to verify all mail, even large ones.
This is not particularly efficient, and duplicates admin
work, needing to handle an additional product.

Another application that would benefit from seeing a complete message
is FuzzyOCR and similar plugins, which scan pictures in a mail,
and picture-type spam seems to be the first one to exceed the
few-hundred kB scanning limit.

What I would like to see is for a caller of a SpamAssassin library
to be able to pass to SA an open file handle (or a file descriptor)
holding a complete original message, perhaps in addition to the current
way of passing an in-memory copy of (a fraction of) a mail.

This way the SpamAssassin plugins and tools which are fast
or need access to a complete message can have access to it
through a file handle, and the rest can work as usual on a
(possibly truncated) memory copy.
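
A rough Python sketch of that dual interface follows. It does not reflect any existing SpamAssassin API: the names open_for_scanning and full_message_digest and the 256 kB cap are assumptions, and the SHA-256 pass merely stands in for any check that must read every byte, such as signature verification.

```python
import hashlib
import io

MEMORY_LIMIT = 256 * 1024  # hypothetical cap on the in-memory copy


def open_for_scanning(handle, limit=MEMORY_LIMIT):
    """Return (truncated_copy, handle), with the handle rewound to the start.

    Ordinary rules would work on the truncated copy; checks that need the
    whole message would read from the handle instead.
    """
    handle.seek(0)
    truncated = handle.read(limit)
    handle.seek(0)
    return truncated, handle


def full_message_digest(handle, chunk_size=64 * 1024):
    """Stand-in for a whole-message check: stream the message and hash it."""
    handle.seek(0)
    digest = hashlib.sha256()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


message = b"Subject: test\r\n\r\n" + b"a" * (1024 * 1024)
truncated, handle = open_for_scanning(io.BytesIO(message))
print(len(truncated), "bytes held in memory for ordinary rules")
print("digest over the full message:", full_message_digest(handle))
```

Streaming in fixed-size chunks keeps memory use bounded no matter how large the message is, which is the point of passing a handle rather than another full copy.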

See also http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5521
for my concerns about wasteful use of in-memory copies of mail in
various forms, although the PR is just the tip of the iceberg and I didn't
go into detail there.

  Mark