Posted to users@spamassassin.apache.org by Adam Moskowitz <ad...@menlo.com> on 2010/07/30 16:58:36 UTC

How to run only certain tests?

Background: SpamAssassin version 3.2.5 running on Perl version 5.8.8 on
CentOS release 5.2 (Final) -- all set up for me by my sysadmin. Everything
works fine when using all the defaults. However . . .

I want to use spamassassin's per-user whitelisting as part of some mail
processing I'm doing. I'm dealing with a lot of messages (potentially
over 100,000), but doing it one-at-a-time (and I can't easily change
that). spamassassin takes a long time to load and run (1.5 - 2 seconds
per message), and it's performing over 50 tests per message even though
for this purpose I need only 1 or 2 of those tests.

Can I arrange to load/run only the tests I need? If so, how?

I've read what I believe are the relevant docs but I can't find what
would let me do this.

I can't (and don't want to) modify the system set-up, but I can create
private, custom versions/copies of config files, rules, rules
directories, whatever; I'm even willing to accept that I may have to
manually apply updates to these private files when the system updates
spamassassin. However, I can't figure out what in these private config
files would be used to say "here's my (pared-down) directory of rules"
or "run only these tests" or however this problem can be solved.

Can someone please help?

Thanks,
Adam

Re: How to run only certain tests?

Posted by John Hardin <jh...@impsec.org>.
On Sat, 31 Jul 2010, Martin Gregorie wrote:

> On Fri, 2010-07-30 at 20:40 -0700, John Hardin wrote:
>> On Sat, 31 Jul 2010, RW wrote:
>>> On Fri, 30 Jul 2010 13:06:43 -0700 (PDT)
>>> John Hardin <jh...@impsec.org> wrote:
>>>> On Fri, 30 Jul 2010, Bowie Bailey wrote:
>>>>
>>>>> service spamd start
>>>>> - run your stuff
>>>>> service spamd stop
>>>>
>>>> I don't think the OP wants to mess around with the global system 
>>>> services, so it's not _quite_ that simple...
>>>
>>> Actually it is. You can run spamd as an ordinary user.
>>
>> "service spamd" executes the system-level init script for the global 
>> spamd. It has nothing to do with running spamd as an ordinary user.
>
> So use "sudo service spamd start|stop|status" - you can even embed it in 
> a shell script. No problem. I do exactly that in my SA rule development 
> rig.

The point is the OP wanted to run custom rulesets against his corpora and
apparently _did not_ have administrative access to the system or 
permission to change the global system config.

I agree with all of the points made about "service spamd", I was just 
pointing out that it wasn't appropriate or relevant to the OP's situation.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Windows Genuine Advantage (WGA) means that now you use your
   computer at the sufferance of Microsoft Corporation. They can
   kill it remotely without your consent at any time for any reason;
   it also shuts down in sympathy when the servers at Microsoft crash.
-----------------------------------------------------------------------
  5 days until the 275th anniversary of John Peter Zenger's acquittal

Re: How to run only certain tests?

Posted by Martin Gregorie <ma...@gregorie.org>.
On Fri, 2010-07-30 at 20:40 -0700, John Hardin wrote:
> On Sat, 31 Jul 2010, RW wrote:
> 
> > On Fri, 30 Jul 2010 13:06:43 -0700 (PDT)
> > John Hardin <jh...@impsec.org> wrote:
> >
> >> On Fri, 30 Jul 2010, Bowie Bailey wrote:
> >>
> >>> service spamd start
> >>> - run your stuff
> >>> service spamd stop
> >>
> >> I don't think the OP wants to mess around with the global system
> >> services, so it's not _quite_ that simple...
> >
> > Actually it is. You can run spamd as an ordinary user.
> 
> "service spamd" executes the system-level init script for the global 
> spamd. It has nothing to do with running spamd as an ordinary user.
> 
So use "sudo service spamd start|stop|status" - you can even embed it in
a shell script. No problem. I do exactly that in my SA rule development
rig.

Martin



Re: How to run only certain tests?

Posted by RW <rw...@googlemail.com>.
On Fri, 30 Jul 2010 20:40:45 -0700 (PDT)
John Hardin <jh...@impsec.org> wrote:

> On Sat, 31 Jul 2010, RW wrote:
> 
> > On Fri, 30 Jul 2010 13:06:43 -0700 (PDT)
> > John Hardin <jh...@impsec.org> wrote:
> >
> >> On Fri, 30 Jul 2010, Bowie Bailey wrote:
> >>
> >>> service spamd start
> >>> - run your stuff
> >>> service spamd stop
> >>
> >> I don't think the OP wants to mess around with the global system
> >> services, so it's not _quite_ that simple...
> >
> > Actually it is. You can run spamd as an ordinary user.
> 
> "service spamd" executes the system-level init script for the global 
> spamd. It has nothing to do with running spamd as an ordinary user.

Quite, but if all you need to do is use spamc in a script, you
don't need to run spamd at the system level; you don't even need to run
it as a daemon. Running it as an ordinary user makes it trivial to
incorporate into a wrapper script.
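A minimal sketch of such a wrapper, assuming spamd is on the PATH, port 1234 is free, and the invoking user may write the PID file. The function and variable names (with_private_spamd, SPAMD_PORT, SPAMD_PIDFILE, SPAMD_CONFIGPATH, SPAMD_STARTUP_WAIT) are hypothetical, made up for this example; only the spamd options -d, -p, --pidfile and --configpath are real.

```shell
# Hypothetical helpers for running a private spamd as an ordinary user.
# The SPAMD_* variables are names invented for this sketch, not
# SpamAssassin settings.
SPAMD_PORT=${SPAMD_PORT:-1234}
SPAMD_PIDFILE=${SPAMD_PIDFILE:-"$HOME/.spamd-private.pid"}
SPAMD_STARTUP_WAIT=${SPAMD_STARTUP_WAIT:-5}

start_private_spamd() {
    # -d: daemonize; --pidfile: record the PID so we can stop it later
    if [ -n "${SPAMD_CONFIGPATH:-}" ]; then
        spamd -d -p "$SPAMD_PORT" --pidfile="$SPAMD_PIDFILE" \
            --configpath="$SPAMD_CONFIGPATH"
    else
        spamd -d -p "$SPAMD_PORT" --pidfile="$SPAMD_PIDFILE"
    fi
}

stop_private_spamd() {
    if [ -f "$SPAMD_PIDFILE" ]; then
        kill "$(cat "$SPAMD_PIDFILE")" 2>/dev/null
        rm -f "$SPAMD_PIDFILE"
    fi
}

# Start spamd, run the given command, then stop spamd regardless of the
# command's result. Returns the command's exit status.
with_private_spamd() {
    start_private_spamd || return 1
    # crude: give spamd time to load its rules before spamc connects
    sleep "$SPAMD_STARTUP_WAIT"
    "$@"
    rc=$?
    stop_private_spamd
    return "$rc"
}
```

For example, set SPAMD_CONFIGPATH=$HOME/sa-rules, then run `with_private_spamd ./process-mail.sh`, where process-mail.sh is a hypothetical script that pipes each message through `spamc -p 1234`. The fixed sleep is the weakest part of this sketch; a slow machine may need a longer wait.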

Re: How to run only certain tests?

Posted by John Hardin <jh...@impsec.org>.
On Sat, 31 Jul 2010, RW wrote:

> On Fri, 30 Jul 2010 13:06:43 -0700 (PDT)
> John Hardin <jh...@impsec.org> wrote:
>
>> On Fri, 30 Jul 2010, Bowie Bailey wrote:
>>
>>> service spamd start
>>> - run your stuff
>>> service spamd stop
>>
>> I don't think the OP wants to mess around with the global system
>> services, so it's not _quite_ that simple...
>
> Actually it is. You can run spamd as an ordinary user.

"service spamd" executes the system-level init script for the global 
spamd. It has nothing to do with running spamd as an ordinary user.


Re: How to run only certain tests?

Posted by RW <rw...@googlemail.com>.
On Fri, 30 Jul 2010 13:06:43 -0700 (PDT)
John Hardin <jh...@impsec.org> wrote:

> On Fri, 30 Jul 2010, Bowie Bailey wrote:
> 
> > service spamd start
> > - run your stuff
> > service spamd stop
> 
> I don't think the OP wants to mess around with the global system 
> services, so it's not _quite_ that simple...

Actually it is. You can run spamd as an ordinary user.

Re: How to run only certain tests?

Posted by John Hardin <jh...@impsec.org>.
On Fri, 30 Jul 2010, Bowie Bailey wrote:

> service spamd start
> - run your stuff
> service spamd stop

I don't think the OP wants to mess around with the global system 
services, so it's not _quite_ that simple...


Re: How to run only certain tests?

Posted by Martin Gregorie <ma...@gregorie.org>.
On Fri, 2010-07-30 at 12:57 -0400, Bowie Bailey wrote:
> On 7/30/2010 12:14 PM, Adam Moskowitz wrote:
> > 3) While I could fire up a non-standard spamd, throw messages at it,
> > then close it down when I'm done, I'd rather not add the complication of
> > managing the start-up/shut-down of a daemon.
> 
Also: if you're not doing your analysis on the main mail 
handling server, install a copy of SA where you're working, configure it
to suit and do as Bowie says to start and stop it. I use a similar setup
to develop rules without interfering with the 'live' SA.

If you have to do your analysis on the main mail server, do the
following to prevent the two copies of spamd from clashing:
  
cd /etc/init.d
cp spamd spamd_nonstd
#
# make changes to spamd_nonstd so it gets configurable values
# from /etc/sysconfig/spamassassin_nonstd rather 
# than /etc/sysconfig/spamassassin, and edit
# /etc/sysconfig/spamassassin_nonstd to suit.
#
# Now use the command:
#
service spamd_nonstd start|stop|status
#
# to control your private non-standard spamd
# 
> I still say the simplest thing to do -- and the thing that would make
> the most difference -- is to use spamd.
>
Agreed. Setting it up this way is fairly easy to do (dead simple if you
can run your SA on a different computer) and, once done, is pretty much
a 'done and dusted' thing.

Martin



Re: How to run only certain tests?

Posted by Bowie Bailey <Bo...@BUC.com>.
 On 7/30/2010 12:14 PM, Adam Moskowitz wrote:
> Earlier today, I wrote:
>> I want to use spamassassin's per-user whitelisting as part of some mail
>> processing I'm doing.
>> . . .
>> spamassassin takes a long time to load and run
>> . . .
>> Can I arrange to load/run only the tests I need? If so, how?
> Sorry, I should have made a few things clear:
>
> 1) This is not part of mail delivery; it's separate post-delivery
> processing I'm running on my mail. Think "analysis," but not quite.
>
> 2) Since all this happens infrequently, I don't want to leave a spamd
> running all the time.
>
> 3) While I could fire up a non-standard spamd, throw messages at it,
> then close it down when I'm done, I'd rather not add the complication of
> managing the start-up/shut-down of a daemon.

service spamd start
- run your stuff
service spamd stop

> 4) Yes, I know running spamassassin on each message is inefficient; in
> this case, that's OK -- but if I can reduce the running time from 1.5
> seconds per message to, say, 0.5 seconds, that's worth it to me.

You say you don't care if it's inefficient, but then you say you need to
reduce the scan time for each message.  Which is it?

If you are getting 1.5 seconds runtime out of calling spamassassin for a
message, I would say you are doing very well.  On the other hand, if
that is the reported scantime, there is probably an additional
second or so actually being used to start spamassassin each time.  If
this is the case, you could probably save a substantial amount of time
by using spamc/spamd even though the reported scantimes would remain the
same.
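To put the per-message figures from this thread in perspective, here is the arithmetic as a quick sketch. estimate_hours is a hypothetical helper; it only multiplies the timings quoted above (1.5-2 s now, 0.5 s as the goal) by the OP's 100,000 messages and measures nothing.

```shell
# Back-of-the-envelope total run time for N messages processed one at a
# time at S seconds each. Pure arithmetic; nothing is measured here.
estimate_hours() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n * s / 3600 }'
}

estimate_hours 100000 2.0   # → 55.6 (upper end of the OP's 1.5-2 s)
estimate_hours 100000 1.5   # → 41.7 (lower end)
estimate_hours 100000 0.5   # → 13.9 (the OP's 0.5 s target)
```

Any start-up cost that a long-running spamd pays only once is multiplied by the full message count when spamassassin is started for every message.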

> 5) Yes, I'm sure lots of people could make arguments for better ways to
> do this -- but given this is being fitted into some existing old and
> crufty code, it's the easiest way *for me* in terms of the amount of
> work I'd have to do.

I still say the simplest thing to do -- and the thing that would make
the most difference -- is to use spamd.  spamc is a drop-in replacement for
spamassassin, so you don't even need to change anything in your existing
code except replacing 'spamassassin' with 'spamc'.  Just make sure you
start the daemon before it runs -- which, as I mentioned above, isn't
really a big deal.

-- 
Bowie

Re: How to run only certain tests?

Posted by Adam Moskowitz <ad...@menlo.com>.
Earlier today, I wrote:
> I want to use spamassassin's per-user whitelisting as part of some mail
> processing I'm doing.
> . . .
> spamassassin takes a long time to load and run
> . . .
> Can I arrange to load/run only the tests I need? If so, how?

Sorry, I should have made a few things clear:

1) This is not part of mail delivery; it's separate post-delivery
processing I'm running on my mail. Think "analysis," but not quite.

2) Since all this happens infrequently, I don't want to leave a spamd
running all the time.

3) While I could fire up a non-standard spamd, throw messages at it,
then close it down when I'm done, I'd rather not add the complication of
managing the start-up/shut-down of a daemon.

4) Yes, I know running spamassassin on each message is inefficient; in
this case, that's OK -- but if I can reduce the running time from 1.5
seconds per message to, say, 0.5 seconds, that's worth it to me.

5) Yes, I'm sure lots of people could make arguments for better ways to
do this -- but given this is being fitted into some existing old and
crufty code, it's the easiest way *for me* in terms of the amount of
work I'd have to do.

Thanks,
Adam

Re: How to run only certain tests?

Posted by John Hardin <jh...@impsec.org>.
On Fri, 30 Jul 2010, Adam Moskowitz wrote:

> I want to use spamassassin's per-user whitelisting as part of some mail 
> processing I'm doing. I'm dealing with a lot of messages (potentially 
> over 100,000), but doing it one-at-a-time (and I can't easily change 
> that). spamassassin takes a long time to load and run (1.5 - 2 seconds 
> per message), and it's performing over 50 tests per message even though 
> for this purpose I need only 1 or 2 of those tests.

How about: run a spamd on a non-default port configured with your rules 
subset, and score your corpora using spamc against that spamd? You'll get 
much better performance than using a foreground spamassassin, and 
potentially you could multithread it.

Using straight spamassassin is a bad idea for performance reasons, as 
you've noticed.
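A minimal sketch of the multithreaded variant. It assumes a private spamd is already listening on port 1234 with enough children (spamd's --max-children), that each message is stored in its own file, and that GNU find/xargs are available (-print0, -0 and -P are not POSIX). score_corpus is a hypothetical helper. `spamc -c` prints only the score/threshold summary for each message.

```shell
# Score every file under a corpus directory with spamc, several at a time.
# Prints one "filename score/threshold" line per message; the order of
# the lines is not guaranteed.
score_corpus() {
    corpus_dir=$1
    port=${2:-1234}   # port of the private spamd (assumed)
    jobs=${3:-4}      # parallel spamc processes; keep <= spamd's children
    find "$corpus_dir" -type f -print0 |
        xargs -0 -n 1 -P "$jobs" sh -c \
            'printf "%s %s\n" "$1" "$(spamc -c -p "$0" < "$1")"' "$port"
}
```

Running more spamc processes than spamd has children gains nothing, because the extra connections simply wait for a free child.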


Re: How to run only certain tests?

Posted by Michael Scheidell <mi...@secnap.com>.
On 7/30/10 10:58 AM, Adam Moskowitz wrote:
> Background: SpamAssassin version 3.2.5 running on Perl version 5.8.8 on
> CentOS release 5.2 (Final) -- all set up for me by my sysadmin. Everything
> works fine when using all the defaults. However . . .
>    
This should get you started. You'll need to write a Perl script:


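#!/usr/bin/perl
use strict;
use warnings;
use Mail::SpamAssassin;

my $debug = 0;                          # set to 1 for debug output
my $msg2  = do { local $/; <STDIN> };   # read one message from stdin

my $spamtest = Mail::SpamAssassin->new(
    {
    rules_filename       => "",
    userprefs_filename   => "",
    username             => "",
    debug                => $debug,
    local_tests_only     => 1,   # skip tests that need network access
    dont_copy_prefs      => 1,   # don't create a user_prefs file if missing
    ignore_site_cf_files => 1,   # ignore *.cf files in the site rules dir
    post_config_text     =>
    '
     skip_rbl_checks 1
     use_dcc 0
     use_bayes 0
     bayes_auto_learn 0
     use_razor2  0
     use_auto_whitelist 0
    ',
    }
);

my $mail   = $spamtest->parse($msg2, 0);
my $status = $spamtest->check($mail);

# comma-separated names of the tests that fired
print $status->get_names_of_tests_hit(), "\n";

$status->finish();
$mail->finish();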


-- 
Michael Scheidell, CTO
o: 561-999-5000
d: 561-948-2259
ISN: 1259*1300
SECNAP Network Security Corporation

    * Certified SNORT Integrator
    * 2008-9 Hot Company Award Winner, World Executive Alliance
    * Five-Star Partner Program 2009, VARBusiness
    * Best in Email Security,2010: Network Products Guide
    * King of Spam Filters, SC Magazine 2008



Re: How to run only certain tests?

Posted by Bowie Bailey <Bo...@BUC.com>.
 On 7/30/2010 10:58 AM, Adam Moskowitz wrote:
> Background: SpamAssassin version 3.2.5 running on Perl version 5.8.8 on
> CentOS release 5.2 (Final) -- all set up for me by my sysadmin. Everything
> works fine when using all the defaults. However . . .
>
> I want to use spamassassin's per-user whitelisting as part of some mail
> processing I'm doing. I'm dealing with a lot of messages (potentially
> over 100,000), but doing it one-at-a-time (and I can't easily change
> that). spamassassin takes a long time to load and run (1.5 - 2 seconds
> per message), and it's performing over 50 tests per message even though
> for this purpose I need only 1 or 2 of those tests.
>
> Can I arrange to load/run only the tests I need? If so, how?
>
> I've read what I believe are the relevant docs but I can't find what
> would let me do this.
>
> I can't (and don't want to) modify the system set-up, but I can create
> private, custom versions/copies of config files, rules, rules
> directories, whatever; I'm even willing to accept that I may have to
> manually apply updates to these private files when the system updates
> spamassassin. However, I can't figure out what in these private config
> files would be used to say "here's my (pared-down) directory of rules"
> or "run only these tests" or however this problem can be solved.
>
> Can someone please help?

How are you running SA?  What is sending mail to it?

If you are calling 'spamassassin' for each message, you should switch to
using spamc/spamd.

If you are using spamc/spamd, you would have to have a separate instance
of spamd for each set of rules.  How many spamd processes do you
currently have?  How much memory is on the machine?

My guess is that it would be counter-productive to try to run multiple
rule sets.  Obviously, you can remove any rules that are not needed to
speed things up a bit.  I would suggest that you analyze your memory
usage and figure out the optimal number of child processes to run
without getting into swap.  Then make sure your MTA or whatever is
sending messages to SA is set up to handle that many processes.  If you
truly are limited to processing one message at a time, then you are
probably already close to your performance wall.  SA runs best when it
can process multiple messages at once.
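The "optimal number of child processes" is simple arithmetic. In this sketch, max_spamd_children is a hypothetical helper, and the memory figures are placeholders you would measure yourself (for example, the RSS of running spamd children as reported by `ps`). The result would be passed to spamd's --max-children (-m) option.

```shell
# Rough upper bound on spamd child processes that fit in RAM without
# swapping. All inputs are in MB and must come from your own measurements.
max_spamd_children() {
    total_mb=$1       # physical memory on the machine
    reserved_mb=$2    # headroom for the OS, the MTA and everything else
    per_child_mb=$3   # resident size of one spamd child
    n=$(( (total_mb - reserved_mb) / per_child_mb ))
    if [ "$n" -lt 1 ]; then
        n=1           # never suggest fewer than one child
    fi
    echo "$n"
}

max_spamd_children 2048 512 60   # → 25, i.e. spamd --max-children=25
```

Integer division deliberately rounds down, which errs on the side of staying out of swap.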

-- 
Bowie


Re: How to run only certain tests?

Posted by RW <rw...@googlemail.com>.
On Fri, 30 Jul 2010 20:05:04 -0400
Adam Moskowitz <ad...@menlo.com> wrote:

> > > Except that I pretty clearly stated I didn't want to use spamd.
> >
> > But you gave the reason that it's too complicated, and I was
> > pointing out that it's trivial to do. 
> 
> You answered a question I didn't ask, even after I specifically
> pointed out that the solution being suggested didn't meet my
> requirements.
> 
> > I thought you might have enough sense to change your mind.
> 
> Fuck you and the horse you rode in on!
> 
> Who are you to decide whether I'm being sensible or not, especially
> when you don't know all my requirements?
> 
> Just go away and shut the hell up, OK?

Classy.

On his consultancy website at menlo.com this guy claims to have 25 years
of experience in programming and system administration, but regards the
few lines of basic shell script required to stop and start a dedicated
spamd as too complicated. I would suggest that anyone Googling for
menlo.com or Menlo Computing read this thread and dodge the bullet.


Re: How to run only certain tests?

Posted by RW <rw...@googlemail.com>.
On Fri, 30 Jul 2010 14:54:40 -0400
Adam Moskowitz <ad...@menlo.com> wrote:

> In the spirit of sharing what I've learned . . .
> 
> My question boiled down to this:
> > Can I arrange to load/run only the tests I need? If so, how?
> 
> The answer is actually quite simple:
> 
>     1) Create a private rules directory, say, $HOME/sa-rules
> 
>     2) Copy the following files from /usr/share/spamassassin to
>        $HOME/sa-rules:
> 
> 	    10_default_prefs.cf
> 	    50_scores.cf
> 	    languages
> 	    user_prefs.template
> 
>        plus the rule(s) you want to run.
> 
>     3) Invoke spamassassin as follows:
> 
> 	    spamassassin --configpath=$HOME/sa-rules ...

It wouldn't be much harder to use spamd

spamd -d -p 1234 --user $USER --configpath=$HOME/sa-rules

and then use

spamc -p 1234

Re: How to run only certain tests?

Posted by Adam Moskowitz <ad...@menlo.com>.
In the spirit of sharing what I've learned . . .

My question boiled down to this:
> Can I arrange to load/run only the tests I need? If so, how?

The answer is actually quite simple:

    1) Create a private rules directory, say, $HOME/sa-rules

    2) Copy the following files from /usr/share/spamassassin to
       $HOME/sa-rules:

	    10_default_prefs.cf
	    50_scores.cf
	    languages
	    user_prefs.template

       plus the rule(s) you want to run.

    3) Invoke spamassassin as follows:

	    spamassassin --configpath=$HOME/sa-rules ...

Done.
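The three steps above can be scripted. This is a minimal sketch that assumes the system rules live in /usr/share/spamassassin, as in the post. make_private_rules, SA_SYSTEM_RULES and SA_PRIVATE_RULES are hypothetical names, and 60_whitelist.cf in the usage comment is only an example of a rule file to keep. Check what your installation ships.

```shell
# Build a private, pared-down rules directory: copy the support files
# listed in the post plus whichever rule files are passed as arguments.
SA_SYSTEM_RULES=${SA_SYSTEM_RULES:-/usr/share/spamassassin}
SA_PRIVATE_RULES=${SA_PRIVATE_RULES:-"$HOME/sa-rules"}

make_private_rules() {
    mkdir -p "$SA_PRIVATE_RULES" || return 1
    for f in 10_default_prefs.cf 50_scores.cf languages user_prefs.template "$@"; do
        # fail loudly if a requested file does not exist
        cp "$SA_SYSTEM_RULES/$f" "$SA_PRIVATE_RULES/" || return 1
    done
}

# Example (the rule file name is illustrative):
#   make_private_rules 60_whitelist.cf
#   spamassassin --configpath="$SA_PRIVATE_RULES" < message
```

Re-running the function after a system SpamAssassin update refreshes the private copies. That covers the manual update step the OP said he was willing to accept.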