Posted to users@spamassassin.apache.org by Rob Blomquist <ro...@verizon.net> on 2004/08/24 07:08:24 UTC

How to know what RuleSets are working, easily?

I am trying to figure out which rulesets are important to me, and which ones 
aren't.

I am probably up to about 90% of my spam being trapped, but still, some very 
significant ones make it through, so I am trying to tune my rulesets. The 
other thing is that the filtering is causing pauses in my use of KMail. I 
would love to shorten or end the pauses.

I am running SA 2.63 in conjunction with my RDJ. Back before SARE existed, my 
RDJ generic rulesets could nail 100% of my spam without mistaken ham being 
marked.

I'm not letting you guys in on what my rulesets are, as I want to tune them 
myself.

Rob
-- 

Linux Desktop user since 2000,
Home networker since shortly after.

Linux User #183693
http://counter.li.org/

Re: How to know what RuleSets are working, easily?

Posted by "Jack L. Stone" <ja...@sage-american.com>.
At 08:26 AM 9.7.2004 -0500, Bob Apthorpe wrote:
>On Mon, 6 Sep 2004 18:01:40 -0700 Rob Blomquist <ro...@verizon.net> wrote:
>
>> On Wednesday 25 August 2004 5:46 am, Jack L. Stone wrote:
>> > At 10:31 PM 8.24.2004 -0700, Loren Wilton wrote:
>> > >> >   #!/bin/sh
>> > >> >   DEFFILES="/etc/mail/spamassassin/*.cf"
>> > >> >   GREPSTR="describe"
>> > >> >
>> > >> >   cat $DEFFILES | egrep ^$GREPSTR  \
>> > >> >
>> > >> >      | awk '{ print "echo `fgrep " $2 " /path/to/spamboxes.* \
>> > >> >      | wc -l` " $2 } ' | sort | uniq | tail +2 | sh | sort -rn
>> > >>
>> > >> $ ./spam-check
>> > >> ./spam-check: line 2: : command not found
>> > >> ./spam-check: line 3: : command not found
>> > >> ./spam-check: line 5: : command not found
>> 
>> I just got back to working on these problems, and a fresh mind seems to have 
>> solved 90% of my problems. They were all due to the leading spaces in each 
>> line, remove them, and the script runs well with one caveat:
>> 
>> grep:  : No such file or directory
>> 
>> I can only figure that is erroring in 2 places, at /etc/mail/spamassassin/*.cf 
>> which has 8 *.cf files in it. Or when looking for my spambox, which is 
>> located at /home/robbo/.Mail/SpamPile/cur/.
>> 
>> The file as it is now running, or not, is:
>> 
>> #! /bin/bash
>> DEFFILES="/etc/mail/spamassassin/*.cf"
>> GREPSTR="describe"
>> cat $DEFFILES | egrep ^$GREPSTR  \
>>     | awk '{ print "echo `fgrep " $2 "/home/robbo/.Mail/SpamPile/cur/ \
>>     | wc -l` " $2 } ' | sort | uniq | tail +2 | sh | sort -rn
>> #EOF
>
>Why not:
>
>#! /bin/bash
>DEFFILES="/etc/mail/spamassassin/*.cf"
>GREPSTR="describe"
>export MAILFOLDER=/home/robbo/.Mail/SpamPile/cur
>egrep -h "^[ 	]*$GREPSTR" $DEFFILES \
>    | awk '{ print "echo `fgrep " $2 " $MAILFOLDER/* | wc -l` " $2 }' \
>    | sort | uniq | tail -n +2 | sh | sort -rn
>#EOF
>
>Notes:
>
>'[ 	]' is '[<space><tab>]' - useful for dealing with leading whitespace.
>If you really need to get rid of leading whitespace, pipe results of the
>egrep through "sed 's/^[ 	]*//'" rather than deleting whitespace from
>the config files.
>
>There's a big difference between /home/robbo/.Mail/SpamPile/cur/ and
>/home/robbo/.Mail/SpamPile/cur/* and that's probably what's tripping you
>up.
>
>Running the code with 'sh -ax script.sh' helps with debugging shell scripts.
>
>hth,
>
>-- Bob
>

I posted this script a couple of weeks ago. As a Bourne shell (sh) script
on FreeBSD with mbox-style mailboxes, it analyzes about 30,000 emails in a
few seconds (4-5?). I've never tried it under bash. I say mailboxes, but
really all of the spams are copies kept in three spam collection boxes that
I fill with a procmail recipe as they arrive: yesterday, today (so far), and
an archive for the month (plus a fourth kind, an archive for each past
month). This lets me watch how the rule hits are changing and which rules
are not hitting at all.

Also, you might consider running the script as one single line if you think
spaces are causing any problem -- shouldn't though.

Best regards,
Jack L. Stone,
Administrator

Sage American
http://www.sage-american.com
jacks@sage-american.com

Re: How to know what RuleSets are working, easily?

Posted by Rob Blomquist <ro...@verizon.net>.
On Tuesday 07 September 2004 6:26 am, Bob Apthorpe wrote:

> > The file as it is now running, or not, is:
> >
> > #! /bin/bash
> > DEFFILES="/etc/mail/spamassassin/*.cf"
> > GREPSTR="describe"
> > cat $DEFFILES | egrep ^$GREPSTR  \
> >
> >     | awk '{ print "echo `fgrep " $2 "/home/robbo/.Mail/SpamPile/cur/ \
> >     | wc -l` " $2 } ' | sort | uniq | tail +2 | sh | sort -rn
> >
> > #EOF
>
> Why not:
>
> #! /bin/bash
> DEFFILES="/etc/mail/spamassassin/*.cf"
> GREPSTR="describe"
> export MAILFOLDER=/home/robbo/.Mail/SpamPile/cur
> egrep -h "^[ 	]*$GREPSTR" $DEFFILES \
>     | awk '{ print "echo `fgrep " $2 " $MAILFOLDER/* | wc -l` " $2 }' \
>     | sort | uniq | tail -n +2 | sh | sort -rn
>
> #EOF
>
> Notes:
>
> '[ 	]' is '[<space><tab>]' - useful for dealing with leading whitespace.
> If you really need to get rid of leading whitespace, pipe results of the
> egrep through "sed 's/^[ 	]*//'" rather than deleting whitespace from
> the config files.
>
> There's a big difference between /home/robbo/.Mail/SpamPile/cur/ and
> /home/robbo/.Mail/SpamPile/cur/* and that's probably what's tripping you
> up.
>
> Running the code with 'sh -ax script.sh' helps with debugging shell
> scripts.

Yep, it sure did. 

I still don't know what is wrong, as I don't understand awk at all. But the 
spaces in the cat line were screwing bash up. I removed all of them between 
the commands and the pipes, and got it all running. But it would only list 
one rule with no hits. Sigh.

I got your spamrulescan.sh running on my machine however, and while it is 
fairly slow, it got the job done, and I was able to parse the rules back to 
the original rule files. That would be a great trick, to have it down to the 
*.cf file that triggered the hit, so everything is numbered by the rule that 
tripped it to make tuning filters easier.
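
Tracing a rule name back to the *.cf file that defines it can be sketched
with grep -l. This is only an illustration: the function name rule_source
is made up, and it assumes each rule has a "describe" line in one of the
files passed in, as the scripts in this thread already rely on.

```sh
#!/bin/sh
# rule_source RULE CF_FILE...
# Print the name of every CF_FILE that holds a "describe RULE ..." line.
# (rule_source is a hypothetical helper name, not part of SpamAssassin.)
rule_source() {
    rule=$1
    shift
    # -l prints only file names; the trailing group stops RULE from
    # matching as a prefix of a longer rule name.
    grep -l -E "^[[:space:]]*describe[[:space:]]+${rule}([[:space:]]|\$)" "$@"
}
```

For example, 'rule_source AM_BODY_PLING /etc/mail/spamassassin/*.cf' would
list the file or files describing that rule.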

Rob


Re: How to know what RuleSets are working, easily?

Posted by Bob Apthorpe <ap...@cynistar.net>.
On Mon, 6 Sep 2004 18:01:40 -0700 Rob Blomquist <ro...@verizon.net> wrote:

> On Wednesday 25 August 2004 5:46 am, Jack L. Stone wrote:
> > At 10:31 PM 8.24.2004 -0700, Loren Wilton wrote:
> > >> >   #!/bin/sh
> > >> >   DEFFILES="/etc/mail/spamassassin/*.cf"
> > >> >   GREPSTR="describe"
> > >> >
> > >> >   cat $DEFFILES | egrep ^$GREPSTR  \
> > >> >
> > >> >      | awk '{ print "echo `fgrep " $2 " /path/to/spamboxes.* \
> > >> >      | wc -l` " $2 } ' | sort | uniq | tail +2 | sh | sort -rn
> > >>
> > >> $ ./spam-check
> > >> ./spam-check: line 2: : command not found
> > >> ./spam-check: line 3: : command not found
> > >> ./spam-check: line 5: : command not found
> 
> I just got back to working on these problems, and a fresh mind seems to have 
> solved 90% of my problems. They were all due to the leading spaces in each 
> line, remove them, and the script runs well with one caveat:
> 
> grep:  : No such file or directory
> 
> I can only figure that is erroring in 2 places, at /etc/mail/spamassassin/*.cf 
> which has 8 *.cf files in it. Or when looking for my spambox, which is 
> located at /home/robbo/.Mail/SpamPile/cur/.
> 
> The file as it is now running, or not, is:
> 
> #! /bin/bash
> DEFFILES="/etc/mail/spamassassin/*.cf"
> GREPSTR="describe"
> cat $DEFFILES | egrep ^$GREPSTR  \
>     | awk '{ print "echo `fgrep " $2 "/home/robbo/.Mail/SpamPile/cur/ \
>     | wc -l` " $2 } ' | sort | uniq | tail +2 | sh | sort -rn
> #EOF

Why not:

#! /bin/bash
DEFFILES="/etc/mail/spamassassin/*.cf"
GREPSTR="describe"
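# Exported so that the "sh" at the end of the pipeline can expand $MAILFOLDER.
export MAILFOLDER=/home/robbo/.Mail/SpamPile/cur
# -h drops the "file:" prefix, so the rule name is always field 2.
egrep -h "^[ 	]*$GREPSTR" $DEFFILES \
    | awk '{ print "echo `fgrep " $2 " $MAILFOLDER/* | wc -l` " $2 }' \
    | sort | uniq | tail -n +2 | sh | sort -rn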
#EOF

Notes:

'[ 	]' is '[<space><tab>]' - useful for dealing with leading whitespace.
If you really need to get rid of leading whitespace, pipe results of the
egrep through "sed 's/^[ 	]*//'" rather than deleting whitespace from
the config files.

There's a big difference between /home/robbo/.Mail/SpamPile/cur/ and
/home/robbo/.Mail/SpamPile/cur/* and that's probably what's tripping you
up.

Running the code with 'sh -ax script.sh' helps with debugging shell scripts.
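
The difference between cur/ and cur/* can be seen in a small sketch. The
function name count_rule_hits is made up for illustration, and it assumes
one message per file in the folder. On the debugging tip: 'sh -x' prints
each command as it runs, and '-a' marks every variable for export.

```sh
#!/bin/sh
# count_rule_hits RULE DIR
# Count the lines mentioning RULE across every message file in DIR.
# (count_rule_hits is a hypothetical helper name.)
count_rule_hits() {
    rule=$1
    dir=$2
    # "$dir"/* expands to the message files. Passing "$dir" alone would
    # hand grep a directory, which it does not search without -r.
    grep -F -h -- "$rule" "$dir"/* 2>/dev/null | wc -l | tr -d ' '
}
```

Something like 'count_rule_hits HTML_MESSAGE /home/robbo/.Mail/SpamPile/cur'
would then print a single number. It counts matching lines, not messages, so
a rule named twice in one message is counted twice.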

hth,

-- Bob

Re: How to know what RuleSets are working, easily?

Posted by Rob Blomquist <ro...@verizon.net>.
On Wednesday 25 August 2004 5:46 am, Jack L. Stone wrote:
> At 10:31 PM 8.24.2004 -0700, Loren Wilton wrote:
> >> >   #!/bin/sh
> >> >   DEFFILES="/etc/mail/spamassassin/*.cf"
> >> >   GREPSTR="describe"
> >> >
> >> >   cat $DEFFILES | egrep ^$GREPSTR  \
> >> >
> >> >      | awk '{ print "echo `fgrep " $2 " /path/to/spamboxes.* \
> >> >      | wc -l` " $2 } ' | sort | uniq | tail +2 | sh | sort -rn
> >>
> >> $ ./spam-check
> >> ./spam-check: line 2: : command not found
> >> ./spam-check: line 3: : command not found
> >> ./spam-check: line 5: : command not found

I just got back to working on these problems, and a fresh mind seems to have 
solved 90% of them. They were all due to the leading spaces in each line; 
remove them and the script runs well, with one caveat:

grep:  : No such file or directory

I can only figure that it is failing in one of two places: at 
/etc/mail/spamassassin/*.cf, which has 8 *.cf files in it, or when looking 
for my spambox, which is located at /home/robbo/.Mail/SpamPile/cur/.

The file as it is now running, or not, is:

#! /bin/bash
DEFFILES="/etc/mail/spamassassin/*.cf"
GREPSTR="describe"
cat $DEFFILES | egrep ^$GREPSTR  \
    | awk '{ print "echo `fgrep " $2 "/home/robbo/.Mail/SpamPile/cur/ \
    | wc -l` " $2 } ' | sort | uniq | tail +2 | sh | sort -rn
#EOF

A little more help if you please?


Re: 'DNA analysis' spots e-mail spam

Posted by Lucas Albers <ad...@cs.montana.edu>.
Wess Bechard said:
> The better news, is the Spam Firewall with insane accuracy.
> http://it.slashdot.org/article.pl?sid=04/08/24/1315216&tid=111
The firewall's accuracy seems too high, IMNSHO.

-- 
Luke Computer Science System Administrator
Security Administrator,College of Engineering
Montana State University-Bozeman,Montana



Re: 'DNA analysis' spots e-mail spam

Posted by Wess Bechard <sp...@eliquid.com>.
This DNA Scanner news was posted on Slashdot a few days ago.
http://slashdot.org/article.pl?sid=04/08/22/1256243&tid=111&tid=1&tid=218

The better news is the Spam Firewall with insane accuracy.
http://it.slashdot.org/article.pl?sid=04/08/24/1315216&tid=111



On Wed, 2004-08-25 at 09:23, Cami wrote:

> http://news.bbc.co.uk/2/hi/technology/3584534.stm

'DNA analysis' spots e-mail spam

Posted by Cami <ca...@mweb.co.za>.
http://news.bbc.co.uk/2/hi/technology/3584534.stm

Re: How to know what RuleSets are working, easily?

Posted by "Jack L. Stone" <ja...@sage-american.com>.
At 10:31 PM 8.24.2004 -0700, Loren Wilton wrote:
>> >   #!/bin/sh
>> >   DEFFILES="/usr/local/etc/mail/spamassassin/*.cf"
>> >   GREPSTR="describe"
>> >
>> >   cat $DEFFILES | egrep ^$GREPSTR  \
>> >
>> >      | awk '{ print "echo `fgrep " $2 " /path/to/spamboxes.* \
>> >      | wc -l` " $2 } ' | sort | uniq | tail +2 | sh | sort -rn
>> >
>>
>> $ ./spam-check
>> ./spam-check: line 2: : command not found
>> ./spam-check: line 3: : command not found
>> ./spam-check: line 5: : command not found
>
>Assuming line 1 really is the #! line, it would be complaining about defines
>failing as commands.  This seems unlikely, so I'd guess one of two things: a)
>you need to use some other shell, or b) you ended up with Windows \r\n line
>endings rather than \n line endings, and the \r's are being treated as
>commands.
>
>See if maybe you have \r characters in there.  They will do nasty stuff like
>that.
>
>        Loren
>

The #! is definitely line 1, and it is a Bourne shell (sh) script. I run it
from csh, the default shell for root on a FreeBSD system. It ought to run
under Linux with some changes to the paths.

You also need to have the other commands available, such as "awk", "fgrep",
etc.

For example, run "which awk" (without the quotes) to see whether awk is on
the system.
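
Checking for all of the needed tools in one go might look like the sketch
below. The function name missing_tools is made up for illustration; it
relies only on the standard 'command -v' shell builtin.

```sh
#!/bin/sh
# missing_tools CMD...
# Print the name of each CMD that cannot be found on the PATH.
# (missing_tools is a hypothetical helper name.)
missing_tools() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}
```

Running 'missing_tools awk egrep fgrep sort uniq tail wc' prints nothing when
everything the script uses is installed.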

Best regards,
Jack L. Stone,
Administrator


Re: How to know what RuleSets are working, easily?

Posted by Loren Wilton <lw...@earthlink.net>.
> >   #!/bin/sh
> >   DEFFILES="/usr/local/etc/mail/spamassassin/*.cf"
> >   GREPSTR="describe"
> >
> >   cat $DEFFILES | egrep ^$GREPSTR  \
> >
> >      | awk '{ print "echo `fgrep " $2 " /path/to/spamboxes.* \
> >      | wc -l` " $2 } ' | sort | uniq | tail +2 | sh | sort -rn
> >
>
> $ ./spam-check
> ./spam-check: line 2: : command not found
> ./spam-check: line 3: : command not found
> ./spam-check: line 5: : command not found

Assuming line 1 really is the #! line, it would be complaining about defines
failing as commands.  This seems unlikely, so I'd guess one of two things: a)
you need to use some other shell, or b) you ended up with Windows \r\n line
endings rather than \n line endings, and the \r's are being treated as
commands.

See if maybe you have \r characters in there.  They will do nasty stuff like
that.

        Loren
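
A quick way to test for stray \r characters, and to strip them, is sketched
below. The function names has_cr and strip_cr are made up for illustration;
the sketch only covers carriage returns, not other invisible characters that
a mail client might paste in.

```sh
#!/bin/sh
# has_cr FILE: succeed if FILE contains at least one carriage return.
# (has_cr and strip_cr are hypothetical helper names.)
has_cr() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# strip_cr FILE: print FILE with every carriage return removed.
strip_cr() {
    tr -d '\r' < "$1"
}
```

For instance: 'has_cr spam-check && strip_cr spam-check > spam-check.unix'.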


Re: How to know what RuleSets are working, easily?

Posted by Rob Blomquist <ro...@verizon.net>.
On Tuesday 24 August 2004 6:32 pm, Jack L. Stone wrote:

> Here's the script -- I grabbed it from this list I think and don't know who
> to give credit for it:
>
>   #!/bin/sh
>   DEFFILES="/usr/local/etc/mail/spamassassin/*.cf"
>   GREPSTR="describe"
>
>   cat $DEFFILES | egrep ^$GREPSTR  \
>
>      | awk '{ print "echo `fgrep " $2 " /path/to/spamboxes.* \
>      | wc -l` " $2 } ' | sort | uniq | tail +2 | sh | sort -rn
>
>   #EOF

I have a little problem with this script, and I am not sure what to do, as I 
am a shell script wannabee.

$ ./spam-check
./spam-check: line 2:  : command not found
./spam-check: line 3:  : command not found
./spam-check: line 5:  : command not found
grep:   : No such file or directory
grep:  : No such file or directory
grep:  : No such file or directory
./spam-check: line 9:  : command not found

What's up? I am running Mandrake 10.0, and I am trying to run it under the 
bash shell. /path/to/spamboxes is replaced with the spambox I have for KMail: 
"~/.Mail/SpamPile/cur/"

Rob

-- 

Mountlake Terrace, WA
USA

Re: How to know what RuleSets are working, easily?

Posted by Chris <cp...@earthlink.net>.
On Thursday 26 August 2004 09:10 am, Bob Apthorpe wrote:

> I expanded on this a little with the attached script. The above is fine
> for use against mbox files but not so helpful if you use mh folders (I
> like "one message, one file" because there's less to corrupt, and
> searching is a bit easier)
>
> Note that analysing large mh folders will take a while: this naive
> searching is O(mn), where m is the number of messages and n is the number
> of rules. It took 14 hours to search 10000 messages for 4400 rules (44
> million file reads) on a dual 800MHz P-III. A better way to do this is
> to parse just the message headers and record found rules in a hash.
>
> hth,
>
> -- Bob

I see you have maildir as ???.  Is there any way this could be made to work 
with maildir format?

-- 
Chris
Registered Linux User 283774 http://counter.li.org


Re: How to know what RuleSets are working, easily?

Posted by Bob Apthorpe <ap...@cynistar.net>.
Hi,

On Tue, 24 Aug 2004 20:32:03 -0500 "Jack L. Stone" <ja...@sage-american.com> wrote:

> This script will give you a list of all rules and times hit in descending
> order. For different analysis, I run it for the month, week & day to see
> the shifts in rules hit, and new ones being hit.
> 
> Here's the script -- I grabbed it from this list I think and don't know who
> to give credit for it:
> 
>   #!/bin/sh
>   DEFFILES="/usr/local/etc/mail/spamassassin/*.cf"
>   GREPSTR="describe"
> 
>   cat $DEFFILES | egrep ^$GREPSTR  \
>      | awk '{ print "echo `fgrep " $2 " /path/to/spamboxes.* \
>      | wc -l` " $2 } ' | sort | uniq | tail +2 | sh | sort -rn

I expanded on this a little with the attached script. The above is fine
for use against mbox files but not so helpful if you use mh folders (I
like "one message, one file" because there's less to corrupt, and
searching is a bit easier).

Note that analysing large mh folders will take a while: this naive
searching is O(mn), where m is the number of messages and n is the number
of rules. It took 14 hours to search 10000 messages for 4400 rules (44
million file reads) on a dual 800MHz P-III. A better way to do this is
to parse just the message headers and record found rules in a hash.
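
A minimal sketch of that header-only idea, using an awk associative array as
the hash, is shown below. It is not the attached script. The function name
tally_rules is made up, and the sketch assumes each message carries an
X-Spam-Status header with a tests=... list that is folded only after commas.
It reads each file once, so the cost no longer grows with the number of
rules. A body line starting with "From " in a one-message-per-file folder
would be mistaken for an mbox separator.

```sh
#!/bin/sh
# tally_rules FILE...
# Read every message once, look only at X-Spam-Status headers, and count
# each rule named after tests= in an awk associative array (a hash).
# (tally_rules is a hypothetical helper name.)
tally_rules() {
    awk '
        # A new file, or an mbox "From " separator, starts a new header block.
        FNR == 1 || /^From /   { record_status(); in_hdr = 1 }
        !in_hdr                { next }                      # skip bodies
        /^$/                   { record_status(); in_hdr = 0; next }
        /^X-Spam-Status:/      { record_status(); status = $0; in_status = 1; next }
        in_status && /^[ \t]/  { status = status $0; next }  # folded header line
        in_status              { record_status() }           # next header began
        END {
            record_status()
            for (rule in hits) print hits[rule], rule
        }

        function record_status(   rest, n, i, t) {
            if (status != "" && match(status, /tests=/)) {
                rest = substr(status, RSTART + RLENGTH)
                gsub(/,[ \t]+/, ",", rest)   # undo folding after commas
                sub(/[ \t].*$/, "", rest)    # keep only the tests= value
                n = split(rest, t, ",")
                for (i = 1; i <= n; i++) {
                    sub(/=.*$/, "", t[i])    # drop a score such as =1.5
                    if (t[i] != "" && t[i] != "none") hits[t[i]]++
                }
            }
            status = ""
            in_status = 0
        }
    ' "$@" | sort -rn
}
```

It works the same way on mbox files and on one-file-per-message folders, for
example 'tally_rules /path/to/folder/*'.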

hth,

-- Bob

Re: How can something so simple be so hard?

Posted by "Jack L. Stone" <ja...@sage-american.com>.
At 05:47 PM 8.25.2004 -0500, Chris wrote:
>On Tuesday 24 August 2004 09:40 pm, you wrote:
>
>>
>> Hmmm... only change I made was to break the script so it wouldn't wrap.
>> You could remove those last 2 backslashes and it would be the same as
>> mine. But, shouldn't matter, although my indents added a couple of spaces
>> too.
>>
>> Like so - just make sure to make it all one line starting at "cat":
>> #!/bin/sh
>> DEFFILES="/usr/local/etc/mail/spamassassin/*.cf"
>> GREPSTR="describe"
>>
>> cat $DEFFILES | egrep ^$GREPSTR | awk '{ print "echo `fgrep " $2 "
>> /path/to/spamboxes.* | wc -l` " $2 } ' | sort | uniq | tail +2 | sh |
>> sort -rn
>>
>
>Jack, I've attached the script, maybe you can see why I'm only getting this 
>now:
>
>[chris@cpollock chris]$ ./rulehits.sh
>0 AM_BODY_PLING
>
>This does work with SA 2.63, doesn't it?
>
>-- 
>Chris

Chris:
I've placed a text file at a URL so we don't have wrapping issues.

Look at your script on top and mine at the bottom. One space is missing in
yours that may make a difference.

Here's the file -- Note the arrow about the space missing.
http://www.antennex.com/tmp/script_compare.txt

HTH.....

Best regards,
Jack L. Stone,
Administrator


Re: How to know what RuleSets are working, easily?

Posted by "Jack L. Stone" <ja...@sage-american.com>.
At 09:14 AM 8.24.2004 -0400, Matt Kettler wrote:
>At 10:08 PM 8/23/2004 -0700, Rob Blomquist wrote:
>>I am trying to figure out which rulesets are important to me, and which ones
>>aren't.
>>
>>I am probably up to about 90% of my spam being trapped, but still, some very
>>significant ones make it through, so I am trying to tune my rulesets. The
>>other thing is that the filtering is causing pauses in my use of KMail. I
>>would love to shorten or end the pauses.
>
>Hmm.. does your setup by any chance log your message statuses anywhere (ie 
>/var/log/maillog)?
>
>Really the quickest way to post-delivery evaluate is to use something like 
>this:
>
>         grep RULE_NAME maillog | wc -l
>
>Repeat for each rule and see who's making the most and the fewest hits.
>
>You could probably do the same thing with kmail's mailbox files, although 
>it would be slower.
>
>However, this won't really tell you which are "important" in the sense of 
>which ones made the difference between a FN and a hit. It will just tell 
>you which ones are getting hit the most. Determining which ones made a 
>difference is more-or-less a by-hand process.. I usually look around for 
>low scoring spam, then look at the rule hits of those..
>

This script will give you a list of all rules and times hit in descending
order. For different analysis, I run it for the month, week & day to see
the shifts in rules hit, and new ones being hit.

Here's the script -- I grabbed it from this list I think and don't know who
to give credit for it:

  #!/bin/sh
  DEFFILES="/usr/local/etc/mail/spamassassin/*.cf"
  GREPSTR="describe"

  cat $DEFFILES | egrep ^$GREPSTR  \
     | awk '{ print "echo `fgrep " $2 " /path/to/spamboxes.* \
     | wc -l` " $2 } ' | sort | uniq | tail +2 | sh | sort -rn

  #EOF
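
The pipeline above writes one 'echo ... fgrep ...' command per rule and
feeds the lot to sh. The same report can be sketched as a plain loop, which
avoids the nested quoting. This is an illustration under assumptions, not a
replacement for the script: the function name rule_hit_report is made up,
and it takes the rules directory and the mailbox files as arguments.

```sh
#!/bin/sh
# rule_hit_report CF_DIR MAILBOX...
# Print "<lines mentioning rule> <rule>" for every rule that has a
# "describe" line in CF_DIR/*.cf, busiest rule first.
# (rule_hit_report is a hypothetical helper name.)
rule_hit_report() {
    [ "$#" -ge 2 ] || { echo "usage: rule_hit_report CF_DIR MAILBOX..." >&2; return 2; }
    cf_dir=$1
    shift
    # -h drops file names from the output; sort -u removes duplicate rules.
    grep -h '^[[:space:]]*describe' "$cf_dir"/*.cf \
        | awk '{ print $2 }' | sort -u \
        | while read -r rule; do
              # cat merges the mailboxes so grep -c yields one total.
              printf '%s %s\n' "$(cat "$@" | grep -c -F -- "$rule")" "$rule"
          done | sort -rn
}
```

For example: 'rule_hit_report /usr/local/etc/mail/spamassassin
/path/to/spamboxes.*'. Like the original, it rereads the mailboxes once per
rule.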

Best regards,
Jack L. Stone,
Administrator


Re: How to know what RuleSets are working, easily?

Posted by Rich Wales <ri...@richw.org>.
Rob Blomquist wrote:

    > I am trying to figure out which rulesets are important
    > to me, and which ones aren't.

One thing that I found helpful in determining what was going on with
mail that wasn't trapped as spam was to modify the format of the
X-Spam-Status header line to include the score for each rule that
was matched.

(This info is, of course, already included in the "content analysis
details" for mail that is identified as spam.)

I did this by putting the following line (a single long line) in my
configuration:

add_header all Status _YESNO_, hits=_HITS_ required=_REQD_ tests=_TESTSSCORES_ autolearn=_AUTOLEARN_ version=_VERSION_

This is the same as the default, except that the default has
tests=_TESTS_ (listing the matched tests but not their scores;
_TESTSSCORES_ includes the scores).

An example of what an X-Spam-Status line looks like with the above
configuration change:

X-Spam-Status: No, hits=1.9 required=5.0 tests=AWL=0.325,HTML_MESSAGE=0.1,
        RCVD_IN_BL_SPAMCOP_NET=1.5 autolearn=no version=2.64
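
With the scores in the header, ranking the rules that fired on one message
becomes a short filter. The sketch below is only an illustration: the
function name rule_scores is made up, and it assumes the header text arrives
on standard input in the tests=NAME=score,... form shown above.

```sh
#!/bin/sh
# rule_scores
# Read one X-Spam-Status header (folded lines allowed) on stdin and print
# "<score> <rule>" for each test, highest score first.
# (rule_scores is a hypothetical helper name.)
rule_scores() {
    awk '
        { hdr = hdr $0 }                     # join folded header lines
        END {
            if (!match(hdr, /tests=/)) exit
            rest = substr(hdr, RSTART + RLENGTH)
            gsub(/,[ \t]+/, ",", rest)       # undo folding after commas
            sub(/[ \t].*$/, "", rest)        # keep only the tests= value
            n = split(rest, t, ",")
            for (i = 1; i <= n; i++) {
                eq = index(t[i], "=")
                if (eq > 0) print substr(t[i], eq + 1), substr(t[i], 1, eq - 1)
            }
        }
    ' | LC_ALL=C sort -rn
}
```

Fed the example header above, it lists RCVD_IN_BL_SPAMCOP_NET first, then
AWL, then HTML_MESSAGE.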

Rich Wales            richw@richw.org            http://www.richw.org

Re: How to know what RuleSets are working, easily?

Posted by Matt Kettler <mk...@comcast.net>.
At 10:08 PM 8/23/2004 -0700, Rob Blomquist wrote:
>I am trying to figure out which rulesets are important to me, and which ones
>aren't.
>
>I am probably up to about 90% of my spam being trapped, but still, some very
>significant ones make it through, so I am trying to tune my rulesets. The
>other thing is that the filtering is causing pauses in my use of KMail. I
>would love to shorten or end the pauses.

Hmm.. does your setup by any chance log your message statuses anywhere (e.g. 
/var/log/maillog)?

Really the quickest way to post-delivery evaluate is to use something like 
this:

         grep RULE_NAME maillog | wc -l

Repeat for each rule and see who's making the most and the fewest hits.

You could probably do the same thing with kmail's mailbox files, although 
it would be slower.

However, this won't really tell you which are "important" in the sense of 
which ones made the difference between an FN (false negative) and a hit. It 
will just tell you which ones are getting hit the most. Determining which 
ones made a difference is more or less a by-hand process. I usually look 
around for low-scoring spam, then look at the rule hits of those.
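
The "repeat for each rule" step can be sketched as a loop. The function name
count_in_log is made up for illustration, the log format is whatever your
setup writes, and -w (whole-word matching, available in GNU and BSD grep)
keeps one rule name from matching inside a longer one.

```sh
#!/bin/sh
# count_in_log LOGFILE RULE...
# For each RULE, print "<matching log lines> <rule>", busiest first.
# (count_in_log is a hypothetical helper name.)
count_in_log() {
    log=$1
    shift
    for rule in "$@"; do
        # -c counts matching lines, -F treats the rule name as plain text.
        printf '%s %s\n' "$(grep -c -F -w -- "$rule" "$log")" "$rule"
    done | sort -rn
}
```

For example: 'count_in_log /var/log/maillog HTML_MESSAGE AWL'.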