
Posted to users@spamassassin.apache.org by User for SpamAssassin Mail List <sp...@pcez.com> on 2005/11/30 20:02:07 UTC

What Optional Rules do I really need?

Hello,

We have a mail system that looks at about 30k incoming emails a day. We
have been running SA for about a month (ver 3.0.3). We run this on a
spamass-milter off of sendmail. With the standard rules it has been
running OK but does not stop as much spam as we would like (we do sa-learn
as well). The system has about 1 GB of memory and is pretty fast.

Anyway I just put on "rulesdujour" and got it up and running but what a
big jump in resources.... So what would the common consensus be on what
rules to run to make the biggest dent in incoming spam with the smallest
jump in resources?


Thanks,

Ken Rea

Re: What Optional Rules do I really need?

Posted by jdow <jd...@earthlink.net>.

From: "User for SpamAssassin Mail List" <sp...@pcez.com>
> 
> On Wed, 30 Nov 2005, Matt Kettler wrote:
> 
>> User for SpamAssassin Mail List wrote:
>> > Hello,
>> >
>> > We have a mail system that looks at about 30k incoming emails a day. We
>> > have been running SA for about month (ver 3.03).
>>
>> WARNING: 3.0.3 is subject to a remotely exploitable DoS attack. All an attacker
>> needs to do is send you a bunch of malformed messages.
> 
> Actually it is Debian 3.0.3-2 , so I am assuming that they have taken
> care of the DoS attack problem?
> 
>> Definitely do not use any "large" rule-sets if you don't want to waste a ton of
>> resources. Most especially "BLACKLIST" in RDJ's trusted ruleset.
>>
>> Also, since you're using 3.0.x, don't use antidrug. These rules are built-in on
>> 3.0.0 and higher.
> 
> Well I was looking for the "names" of the rules from the people that
> know... in the RDJ's trusted ruleset. All I can do is an educated guess on
> what might be the best to run it would be far better to tap into the
> experience of the group.

Ken, there is no way we can answer for you very well. You know your
customer base better than we do. If you are an ISP with a very diverse
set of customers you cannot filter as rigidly as I do for a two-person
"nano-ISP" whose users share interests and maintain their own personal rules.

If you are in a corporate setting you need to experiment some. I don't
know if you can create a parallel mail processing path with your setup
or not. I've done it here for various levels of experimentation using
.procmailrc files. I keep one path working as it is and use the other
path for exploring the more rigid rule sets. Then I look for score
differences and any improvements in the false alarm rate and missed
detection rate. But I do not have the personal privacy issues you
might face if you examined messages your current setup passed and the
prospective setup failed to make sure they really were spam if you are
an ISP. If you are in a corporate IT environment this still may be a
problem.
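
A minimal sketch of the score comparison described above, in Python. It
assumes you have already collected, for each path, a mapping from message
ID to the SpamAssassin score that path assigned; how you collect those is
up to your setup. The function name and the data layout are hypothetical.
The 5.0 threshold is SpamAssassin's stock default and should be replaced
with whatever your site actually uses.

```python
def compare_runs(baseline, candidate, threshold=5.0):
    """Compare two scoring runs over the same messages.

    baseline and candidate map message-id -> score (hypothetical layout).
    Returns (newly_flagged, newly_passed): message IDs whose spam verdict
    changed between the current path and the experimental one.
    """
    newly_flagged = []  # candidate says spam, baseline said ham
    newly_passed = []   # candidate says ham, baseline said spam
    # Only messages seen by both paths can be compared.
    for msg_id in sorted(baseline.keys() & candidate.keys()):
        was_spam = baseline[msg_id] >= threshold
        is_spam = candidate[msg_id] >= threshold
        if is_spam and not was_spam:
            newly_flagged.append(msg_id)
        elif was_spam and not is_spam:
            newly_passed.append(msg_id)
    return newly_flagged, newly_passed
```

The newly flagged list is the one that needs a human look, since it holds
both the extra spam caught and any new false alarms.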

If comparing scores and the like is a problem then you may have to
simply inform users of a new, tighter filtering, anti-spam option
that uses a less conservative selection of SARE rule sets. Recommend
that they filter to a spam folder, check the folder briefly, then
discard it themselves. That covers your ass if they miss a critical
message that was mismarked as spam.

Also consider per person Bayes with individual training. One person's
ham is another person's spam so one single system wide Bayes is, IMAO,
juggling with fulminate of mercury.

In other words, examine the shape of your customer base and their potential
needs. An ISP may have some very strict evangelical whazzits who consider
any mention of Frisbees as evil while you also have a person looking for
pointers on how to train his dog to catch Frisbees. This makes the life
of ISP system mail administrators interesting, I bet. If you are in a
corporate environment then it becomes easier to filter out Frisbee talk
if management decides this has nothing to do with their business and
is adamant about preventing personal use of the company computers. So
you'd pick a rule that handles Frisbees (or create one yourself) and
have done with it. But some of the SARE rule sets might be too tight
about your particular product line, say mousetraps that use high
explosives, and filter too much material of business interest.

In all I'd start with a careful assessment of what you NEED and what seems
proper to do given your particular setting. Then I'd read the SARE rule
set descriptions carefully. Finally I'd experiment as much as I could
get away with and offer a selection of filtration levels for those who
use the system. (I bet, with a little creative work and non-colliding
pid file names, you could create three or four spamd invocations that
use different port assignments. I did that with the apcupsd as a proof
of concept. That would work nicely with procmail. It might not work so
nicely with scanners that perform their own daemonization process,
though.)
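
A rough sketch of the multiple-spamd idea above, in Python. It only builds
the command lines; it does not start anything. The level names, config
directories and pid directory are hypothetical, and the long option names
(--daemonize, --port, --pidfile, --siteconfigpath) should be checked
against the spamd man page for your installed version before use.

```python
def spamd_commands(levels, base_port=783, pid_dir="/var/run"):
    """Build one spamd argument list per filtering level.

    levels is a list of (name, config_dir) pairs (hypothetical names).
    Each instance gets its own port and its own pid file so that the
    daemons do not collide with one another.
    """
    commands = []
    for offset, (name, config_dir) in enumerate(levels):
        commands.append([
            "spamd",
            "--daemonize",
            f"--port={base_port + offset}",           # distinct port per level
            f"--pidfile={pid_dir}/spamd-{name}.pid",  # non-colliding pid file
            f"--siteconfigpath={config_dir}",         # that level's rule set
        ])
    return commands
```

Each user's .procmailrc would then point its spamc call at the port for
the level that user picked.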

And all that said - 30,000 messages a day <mumble> that's a milli-ISP
or corporate environment I bet. That's 30 to 100 users at a guess. I
wonder if I even got close. (If you are part of the PCEZ staff 
setting up the ISP's email you have to take the more open approach
with less filtering. Good luck putting it together properly. Per
user Bayes MAY be your best friend here.)

{^_^}

Re: What Optional Rules do I really need?

Posted by Nix <ni...@esperi.org.uk>.

On Fri, 2 Dec 2005, Rob Skedgell announced authoritatively:
> At the moment I have to use a condition in an Exim ACL to exclude 
> HTML.Phishing.* "malware" from being discarded so that it can be 
> filtered and reported.

Indeed.

You can do the same sort of thing if running sendmail and the
clamav-milter, but it's trickier. I use clamav-milter with the patch
below to put the name of the malware into the X-Virus-Infection-Name:
header, turn on --noreject, and then discard with procmail all mails
with that header present without the word `Phishing' in it.

diff -durN 0.87-orig/clamav-milter/clamav-milter.c 0.87/clamav-milter/clamav-milter.c
--- 0.87-orig/clamav-milter/clamav-milter.c	2005-09-15 23:24:41.000000000 +0100
+++ 0.87/clamav-milter/clamav-milter.c	2005-09-17 00:07:28.000000000 +0100
@@ -261,7 +261,7 @@
 static	int	sendtemplate(SMFICTX *ctx, const char *filename, FILE *sendmail, const char *virusname);
 static	int	qfile(struct privdata *privdata, const char *sendmailId, const char *virusname);
 static	int	move(const char *oldfile, const char *newfile);
-static	void	setsubject(SMFICTX *ctx, const char *virusname);
+static	void	setinfected(SMFICTX *ctx, const char *virusname);
 static	int	clamfi_gethostbyname(const char *hostname, struct hostent *hp, char *buf, size_t len);
 static	int	isLocalAddr(in_addr_t addr);
 static	void	clamdIsDown(void);
@@ -747,7 +747,7 @@
 				break;
 			case 'n':	/* don't add X-Virus-Scanned */
 				nflag++;
-				smfilter.xxfi_flags &= ~(SMFIF_ADDHDRS|SMFIF_CHGHDRS);
+				smfilter.xxfi_flags &= ~(SMFIF_CHGHDRS);
 				break;
 			case 'N':	/* Do we reject mail or silently drop it */
 				rejectmail = 0;
@@ -830,26 +830,6 @@
 	}
 	port = argv[optind];
 
-	if(verifyIncomingSocketName(port) < 0) {
-		fprintf(stderr, _("%s: socket-addr (%s) doesn't agree with sendmail.cf\n"), argv[0], port);
-		return EX_CONFIG;
-	}
-	if(strncasecmp(port, "inet:", 5) == 0)
-		if(!lflag) {
-			/*
-			 * Barmy but true. It seems that clamfi_connect will,
-			 * in this case, get the IP address of the machine
-			 * running sendmail, not of the machine sending the
-			 * mail, so the remote end will be a local address so
-			 * we must scan by enabling --local
-			 *
-			 * TODO: this is probably not needed if the remote
-			 * machine is localhost, need to check though
-			 */
-			fprintf(stderr, _("%s: when using inet: connection to sendmail you must enable --local\n"), argv[0]);
-			return EX_USAGE;
-		}
-
 	/*
 	 * Sanity checks on the clamav configuration file
 	 */
@@ -3050,10 +3030,10 @@
 				if(use_syslog)
 					syslog(LOG_DEBUG, "Redirected virus to %s", quarantine);
 				cli_dbgmsg("Redirected virus to %s\n", quarantine);
-				setsubject(ctx, virusname);
+				setinfected(ctx, virusname);
 			}
 		} else if(advisory)
-			setsubject(ctx, virusname);
+			setinfected(ctx, virusname);
 		else if(rejectmail) {
 			if(privdata->discard)
 				rc = SMFIS_DISCARD;
@@ -4240,22 +4220,12 @@
 }
 
 /*
- * Store the name of the virus in the subject of the e-mail
+ * Store the name of the virus in the X-Virus-Infection-Name header
  */
 static void
-setsubject(SMFICTX *ctx, const char *virusname)
+setinfected(SMFICTX *ctx, const char *virusname)
 {
-	struct privdata *privdata = (struct privdata *)smfi_getpriv(ctx);
-	char subject[128];
-
-	if(privdata->subject)
-		smfi_addheader(ctx, "X-Original-Subject", privdata->subject);
-
-	snprintf(subject, sizeof(subject) - 1, _("[Virus] %s"), virusname);
-	if(privdata->subject)
-		smfi_chgheader(ctx, "Subject", 1, subject);
-	else
-		smfi_addheader(ctx, "Subject", subject);
+        smfi_addheader(ctx, "X-Virus-Infection-Name", virusname);
 }
 
 /*
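
The discard decision described before the patch is made in procmail; as
an illustration only, the same logic can be sketched in Python. The
function name is hypothetical. One assumption to flag: if a message
carries several X-Virus-Infection-Name headers, this sketch discards it
when any one of them names something other than a phish.

```python
from email import message_from_string


def should_discard(raw_message):
    """Return True if the message should be silently discarded.

    Discard when an X-Virus-Infection-Name header is present and its
    value does not contain the word "Phishing". Messages without the
    header, or flagged only as phishing, are kept so they can still be
    filtered and reported.
    """
    msg = message_from_string(raw_message)
    names = msg.get_all("X-Virus-Infection-Name", [])
    # Assumption: any non-phishing infection name is enough to discard.
    return any("Phishing" not in name for name in names)
```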


-- 
`Y'know, London's nice at this time of year. If you like your cities
 freezing cold and full of surly gits.' --- David Damerell

Re: What Optional Rules do I really need?

Posted by Rob Skedgell <ro...@nephelococcygia.demon.co.uk>.

On Friday 02 Dec 2005 07:44, Robert Menschel wrote:
> Hello User,
>
> Thursday, December 1, 2005, 4:26:43 PM, you wrote:
>
> UfSML> SARE_FRAUD was suggested but would this be a duplication when
> UfSML> we are running clamd virus scanner on all the mail?
>
> I don't think so.  The fraud rules file is aimed at phishing emails.
> If clamd catches your phishing emails, then yes, it'd be a
> duplication. If clamd doesn't do too good a job on phish, then the
> fraud rules would be worth having.

When ClamAV 0.90 finally comes out it will be possible to disable the 
detection of phishes as malware, so some people may consider SA rule 
sets like SARE_FRAUD a more appropriate detection mechanism than AV 
software.

See <http://www.clamav.net/faq.html#pagestart> (item 13).

At the moment I have to use a condition in an Exim ACL to exclude 
HTML.Phishing.* "malware" from being discarded so that it can be 
filtered and reported.

-- 
Rob Skedgell <ro...@nephelococcygia.demon.co.uk>

Re[3]: What Optional Rules do I really need?

Posted by User for SpamAssassin Mail List <sp...@pcez.com>.


Yes, clamd does a good job on phishing emails.

Thanks,

Ken Rea

On Thu, 1 Dec 2005, Robert Menschel wrote:

> Hello User,
>
> Thursday, December 1, 2005, 4:26:43 PM, you wrote:
>
> UfSML> SARE_FRAUD was suggested but would this be a duplication when
> UfSML> we are running clamd virus scanner on all the mail?
>
> I don't think so.  The fraud rules file is aimed at phishing emails.
> If clamd catches your phishing emails, then yes, it'd be a
> duplication. If clamd doesn't do too good a job on phish, then the
> fraud rules would be worth having.
>
> Bob Menschel
>
>
>

Re[3]: What Optional Rules do I really need?

Posted by Robert Menschel <Ro...@Menschel.net>.

Hello User,

Thursday, December 1, 2005, 4:26:43 PM, you wrote:

UfSML> SARE_FRAUD was suggested but would this be a duplication when
UfSML> we are running clamd virus scanner on all the mail?

I don't think so.  The fraud rules file is aimed at phishing emails.
If clamd catches your phishing emails, then yes, it'd be a
duplication. If clamd doesn't do too good a job on phish, then the
fraud rules would be worth having.

Bob Menschel

Re: What Optional Rules do I really need?

Posted by Kelson <ke...@speed.net>.

User for SpamAssassin Mail List wrote:
> SARE_FRAUD was suggested but would this be a duplication when we are
> running clamd virus scanner on all the mail?

No, it wouldn't.  They're aimed at different targets.

SARE_FRAUD is aimed mainly at advance fee fraud solicitations -- 
variations on the Nigerian or 419 scam, the International Lottery scams, 
etc.

ClamAV's phishing rules are aimed more at, well, phishing.  There may 
still be some overlap with SARE_SPOOF, but as Matthew said, the more 
filters aimed at it, the better.

-- 
Kelson Vibber
SpeedGate Communications <www.speed.net>

Re[2]: What Optional Rules do I really need?

Posted by User for SpamAssassin Mail List <sp...@pcez.com>.

Thanks Bob,


SARE_FRAUD was suggested but would this be a duplication when we are
running clamd virus scanner on all the mail?

Thanks,

Ken Rea



On Wed, 30 Nov 2005, Robert Menschel wrote:

> Wednesday, November 30, 2005, 11:59:23 AM, Matt wrote:
>
> MK> I'm not well versed in picking the "minimalist" set for a low-resource site, but
> MK> I can at least tell you what I know you should avoid.
>
> MK> In general, the bigger the .cf file, the more resource intensive it will likely
> MK> be. Admittedly this is a wildly inaccurate measure because of non-rule content,
> MK> but it's better than nothing. I tend to be wary of .cf files over 128k, and I'd
> MK> keep the total under 256k.
>
> MK> FWIW, I personally like these SARE rulesets:
>
> MK> 70_sare_adult.cf        (SARE_ADULT)
> MK> 70_sare_evilnum0.cf 	(SARE_EVILNUMBERS0)
> MK> 70_sare_evilnum1.cf     (SARE_EVILNUMBERS1)
> MK> 70_sare_genlsubj0.cf  	(SARE_GENLSUBJ0)
> MK> 70_sare_obfu0.cf 	(SARE_OBFU0)
> MK> 70_sare_random.cf   	(SARE_RANDOM)
> MK> 70_sare_specific.cf   	(SARE_SPECIFIC)
> MK> 70_sare_uri0.cf		(SARE_URI0)
> MK> 99_sare_fraud_post25x.cf (SARE_FRAUD)
>
> In addition, I suggest 70_sare_html0.cf -- all the 70_sare_*0.cf rules
> files that I maintain are the ones which during SARE mass-checks hit
> no ham, and hit significant (by our classification) spam.
>
> Read the documentation in those *0.cf files, and you'll be able to
> determine for yourself whether to also use the *1.cf files. If you're
> tight on resources, stay away from 70_sare_obfu1.cf, though it is a
> very powerful file and useful to many systems.
>
> Bob Menschel
>
>
>

Re[2]: What Optional Rules do I really need?

Posted by Robert Menschel <Ro...@Menschel.net>.

Wednesday, November 30, 2005, 11:59:23 AM, Matt wrote:

MK> I'm not well versed in picking the "minimalist" set for a low-resource site, but
MK> I can at least tell you what I know you should avoid.

MK> In general, the bigger the .cf file, the more resource intensive it will likely
MK> be. Admittedly this is a wildly inaccurate measure because of non-rule content,
MK> but it's better than nothing. I tend to be wary of .cf files over 128k, and I'd
MK> keep the total under 256k.

MK> FWIW, I personally like these SARE rulesets:

MK> 70_sare_adult.cf        (SARE_ADULT)
MK> 70_sare_evilnum0.cf 	(SARE_EVILNUMBERS0)
MK> 70_sare_evilnum1.cf     (SARE_EVILNUMBERS1)
MK> 70_sare_genlsubj0.cf  	(SARE_GENLSUBJ0)
MK> 70_sare_obfu0.cf 	(SARE_OBFU0)
MK> 70_sare_random.cf   	(SARE_RANDOM)
MK> 70_sare_specific.cf   	(SARE_SPECIFIC)
MK> 70_sare_uri0.cf		(SARE_URI0)
MK> 99_sare_fraud_post25x.cf (SARE_FRAUD)

In addition, I suggest 70_sare_html0.cf -- all the 70_sare_*0.cf rules
files that I maintain are the ones which during SARE mass-checks hit
no ham, and hit significant (by our classification) spam.

Read the documentation in those *0.cf files, and you'll be able to
determine for yourself whether to also use the *1.cf files. If you're
tight on resources, stay away from 70_sare_obfu1.cf, though it is a
very powerful file and useful to many systems.

Bob Menschel

Re: What Optional Rules do I really need?

Posted by Duncan Findlay <du...@debian.org>.

On Wed, Nov 30, 2005 at 02:59:23PM -0500, Matt Kettler wrote:
> User for SpamAssassin Mail List wrote:
> > Actually it is Debian 3.0.3-2 , so I am assuming that they have taken
> > care of the DoS attack problem?
> 
> Probably, but I don't know what the Debian guys do. I personally am pretty
> strongly opposed to using distro-variant packages because unless I'm heavily
> entrenched in that distro I never know what's going to be in it compared to a
> "standard" version.

You could always look in the changelog...

From
http://packages.debian.org/changelogs/pool/main/s/spamassassin/spamassassin_3.0.3-2/changelog
or /usr/share/doc/spamassassin/changelog.Debian.gz

spamassassin (3.0.3-2) stable-security; urgency=high

   * Security release to fix potential DoS caused by large headers
     (CAN-2005-1266)

 -- Duncan Findlay <du...@debian.org>  Wed, 8 Jun 2005 01:35:45 -0400 

So yes. 3.0.3-2 contains the fix for the DoS.
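
If you would rather check this on the box than read the changelog by eye,
a small Python sketch along these lines should do. The function name is
hypothetical; the path and the advisory ID in the usage note are the ones
quoted above.

```python
import gzip


def changelog_mentions(path, identifier):
    """Return True if a gzip-compressed changelog contains identifier."""
    # Open in text mode; replace undecodable bytes rather than failing,
    # since old changelogs are not always clean UTF-8.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return any(identifier in line for line in handle)
```

For example, calling it with
"/usr/share/doc/spamassassin/changelog.Debian.gz" and "CAN-2005-1266"
tells you whether the installed package's changelog lists that fix.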

-- 
Duncan Findlay

Re: What Optional Rules do I really need?

Posted by Matt Kettler <mk...@evi-inc.com>.

User for SpamAssassin Mail List wrote:
> Actually it is Debian 3.0.3-2 , so I am assuming that they have taken
> care of the DoS attack problem?

Probably, but I don't know what the Debian guys do. I personally am pretty
strongly opposed to using distro-variant packages because unless I'm heavily
entrenched in that distro I never know what's going to be in it compared to a
"standard" version.

>>Definitely do not use any "large" rule-sets if you don't want to waste a ton of
>>resources. Most especially "BLACKLIST" in RDJ's trusted ruleset.
>>
>>Also, since you're using 3.0.x, don't use antidrug. These rules are built-in on
>>3.0.0 and higher.
> 
> 
> Well I was looking for the "names" of the rules from the people that
> know... in the RDJ's trusted ruleset. All I can do is an educated guess on
> what might be the best to run it would be far better to tap into the
> experience of the group.
> 

True, I was just giving you some negative-advice. The blacklist ruleset is well
known on this group to cause problems with excessive resource consumption.

I'm not well versed in picking the "minimalist" set for a low-resource site, but
I can at least tell you what I know you should avoid.

In general, the bigger the .cf file, the more resource intensive it will likely
be. Admittedly this is a wildly inaccurate measure because of non-rule content,
but it's better than nothing. I tend to be wary of .cf files over 128k, and I'd
keep the total under 256k.
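
As a sketch of that rule of thumb, the following Python function sums the
sizes of the rule files in a directory and reports any that exceed the
per-file figure. The function name, the directory you pass in, and the
default glob are assumptions, as is reading "128k" and "256k" as multiples
of 1024 bytes. File size is only the crude proxy described above, not a
measurement of real resource use.

```python
from pathlib import Path

PER_FILE_LIMIT = 128 * 1024  # be wary of any single .cf file above this
TOTAL_LIMIT = 256 * 1024     # keep the combined size under this


def audit_rule_sizes(rules_dir, name_glob="*.cf"):
    """Return (oversized, total_bytes) for rule files in rules_dir.

    oversized maps file name -> size in bytes for every matching file
    larger than PER_FILE_LIMIT; total_bytes is the sum over all matches.
    """
    sizes = {
        path.name: path.stat().st_size
        for path in sorted(Path(rules_dir).glob(name_glob))
    }
    oversized = {name: size for name, size in sizes.items()
                 if size > PER_FILE_LIMIT}
    return oversized, sum(sizes.values())
```

Point it at the directory holding only your add-on rule files and compare
the returned total against TOTAL_LIMIT.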

FWIW, I personally like these SARE rulesets:

70_sare_adult.cf          (SARE_ADULT)
70_sare_evilnum0.cf       (SARE_EVILNUMBERS0)
70_sare_evilnum1.cf       (SARE_EVILNUMBERS1)
70_sare_genlsubj0.cf      (SARE_GENLSUBJ0)
70_sare_obfu0.cf          (SARE_OBFU0)
70_sare_random.cf         (SARE_RANDOM)
70_sare_specific.cf       (SARE_SPECIFIC)
70_sare_uri0.cf           (SARE_URI0)
99_sare_fraud_post25x.cf  (SARE_FRAUD)

Of those, the largest is the specific ruleset.

Re: What Optional Rules do I really need?

Posted by User for SpamAssassin Mail List <sp...@pcez.com>.

On Wed, 30 Nov 2005, Matt Kettler wrote:

> User for SpamAssassin Mail List wrote:
> > Hello,
> >
> > We have a mail system that looks at about 30k incoming emails a day. We
> > have been running SA for about month (ver 3.03).
>
> WARNING: 3.0.3 is subject to a remotely exploitable DoS attack. All an attacker
> needs to do is send you a bunch of malformed messages.

Actually it is Debian 3.0.3-2, so I am assuming that they have taken
care of the DoS attack problem?

> Definitely do not use any "large" rule-sets if you don't want to waste a ton of
> resources. Most especially "BLACKLIST" in RDJ's trusted ruleset.
>
> Also, since you're using 3.0.x, don't use antidrug. These rules are built-in on
> 3.0.0 and higher.

Well I was looking for the "names" of the rules from the people that
know... in the RDJ's trusted ruleset. All I can do is make an educated
guess at what might be the best to run; it would be far better to tap
into the experience of the group.

Thanks,

Ken Rea

Re: What Optional Rules do I really need?

Posted by Matt Kettler <mk...@evi-inc.com>.

User for SpamAssassin Mail List wrote:
> Hello,
> 
> We have a mail system that looks at about 30k incoming emails a day. We
> have been running SA for about month (ver 3.03).

WARNING: 3.0.3 is subject to a remotely exploitable DoS attack. All an attacker
needs to do is send you a bunch of malformed messages.

> We run this on a spamass-milter off of sendmail. With the standard rules it has been
> running OK but does not stop as much spam as we would like (we do sa learn
> as well). The system runs about 1 gig of memory and is pretty fast.
> 
> Anyway I just put on " rulesdujour" and got it up and running but what a
> big jump in resources.... So what would the common consensus be on what
> rules to run to make the biggest dent on incoming spam with a smallest
> jump in resources?

Definitely do not use any "large" rulesets if you don't want to waste a ton of
resources. Most especially "BLACKLIST" in RDJ's trusted ruleset.

Also, since you're using 3.0.x, don't use antidrug. These rules are built-in on
3.0.0 and higher.