Posted to users@spamassassin.apache.org by Robert Menschel <Ro...@Menschel.net> on 2004/09/21 05:30:34 UTC

Nightly Masscheck question

I've managed to get my nightly mass-check run operational.

I still need to work on my weekly --net enabled check -- haven't had one
of those complete yet, but the daily (local) mass-check runs to
completion successfully.

Just checking ... the mass-check process kicks off at 02:10 am PDT. It
runs for about 7 hours, finishing up this morning at 09:03 am PDT (09:13
if you count the rsync of results back to the server).

Is this a usual/acceptable elapsed execution time?  If everyone else's
mass-check runs in only 2-3 hours, then I'd like to know so I can find
out why mine takes so long.  If everyone else's is in the same ballpark,
then I'm happy with what it does.

Thanks for any feedback.

Bob Menschel




Re: Nightly Masscheck question

Posted by Theo Van Dinter <fe...@kluge.net>.
On Tue, Sep 21, 2004 at 12:12:51AM -0700, Robert Menschel wrote:
> 30k messages, 15k ham and 15k spam, -j 2, --restart 500 (I don't bother
> with restarts on my personal mass-checks; do they help?).

Hrm.  Sorta?  ;)   I generally only use restart for net checks because they
sometimes get "persnickety".  Over time, I've found the cause to be DCC getting
wedged.  For some reason, restarting the children periodically helps with that.
Restart is also useful if you find the mass-check children growing really
large, which can happen depending on your input message stream.

> Is Bayes a significant contributor to the nightly results, or would
> things work just as well if I turned it off?  What about the
> auto-whitelist function?

Ah...  to start, I'd just leave them off.  We typically only do score sets 0
and 1 (the ones without Bayes) for the nightly/weekly runs, and the AWL doesn't
really help with that, IMO.

> FYI, I'll be documenting what I've done, and repeating it to make sure it
> works, and then will put this up on the SA Wiki as a how-to for doing
> nightly mass-checks under Cygwin.

great, thanks! :)

-- 
Randomly Generated Tagline:
Just put in another goto, and then it'll be readable.  :-)
              -- Larry Wall in <19...@wall.org>

Re[2]: Nightly Masscheck question

Posted by Robert Menschel <Ro...@Menschel.net>.
Hello Theo,

Monday, September 20, 2004, 8:44:56 PM, you wrote:

TVD> FWIW: I think this is a dev question, not a users question.

TVD> On Mon, Sep 20, 2004 at 08:30:34PM -0700, Robert Menschel wrote:
>> I've managed to get my nightly mass-check run operational.

TVD> :)  yea!

>> I still need to work on my weekly --net enabled check -- haven't had one
>> of those complete yet, but the daily (local) mass-check runs to
>> completion successfully.

TVD> FWIW: I've found I need to disable DCC.  It works fine in my normal workload,
TVD> but dies under heavy mass-check load.

I'll try that for this coming weekend.  I'm also trying to get rbldnsd
working to speed up the SURBL tests.

>> Just checking ... the mass-check process kicks off at 02:10 am PDT. It
>> runs for about 7 hours, finishing up this morning at 09:03 am PDT (09:13
>> if you count the rsync of results back to the server).

TVD> So 2:10 PDT is 9:10 GMT (I think that's right), finishing up at
TVD> ~16:00 GMT.

Correct.
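
The conversion above can be double-checked with a few lines of Python. This is only a sketch: it assumes the run in question was on 2004-09-20 (the date on the file listing further down) and uses fixed UTC offsets for PDT (-7) and EDT (-4) rather than a time zone database. The helper name `to_utc` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, valid for September 2004 when daylight time was in effect.
PDT = timezone(timedelta(hours=-7), "PDT")
EDT = timezone(timedelta(hours=-4), "EDT")


def to_utc(hour, minute, tz, year=2004, month=9, day=20):
    """Convert a local wall-clock time on the assumed run date to UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=tz)
    return local.astimezone(timezone.utc)


start = to_utc(2, 10, PDT)       # mass-check kickoff
finish = to_utc(9, 3, PDT)       # mass-check done, before the rsync
theo_start = to_utc(5, 11, EDT)  # Theo's start time, for comparison

print("start     ", start.strftime("%H:%M UTC"))       # 09:10 UTC
print("finish    ", finish.strftime("%H:%M UTC"))      # 16:03 UTC
print("theo start", theo_start.strftime("%H:%M UTC"))  # 09:11 UTC
print("elapsed   ", finish - start)                    # 6:53:00
```

So the two nightly runs start within a minute of each other in GMT terms, and the 7-hour figure is really 6h53m.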

>> Is this a usual/acceptable elapsed execution time?  If everyone else's
>> mass-check runs in only 2-3 hours, then I'd like to know so I can find
>> out why mine takes so long.  If everyone else's is in the same ballpark,
>> then I'm happy with what it does.

TVD> I think that's fine for a general start time -- I start mine at
TVD> ~5:11 EDT, aka 9:11 GMT.  My run takes ~20 minutes for set0, with 24k
TVD> messages.  So 7 hours is a bit long, but it depends on what type of
TVD> system you're using, how many messages you're processing, etc.

30k messages, 15k ham and 15k spam, -j 2, --restart 500 (I don't bother
with restarts on my personal mass-checks; do they help?).
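
To put the two runs on the same footing, it helps to look at messages per second rather than wall-clock time. A rough sketch, assuming 6h53m (02:10 to 09:03) for the 30k-message run and exactly 20 minutes for the 24k-message run. The two runs are not like-for-like (different hardware, and the 20-minute figure is for set0 only), so treat the ratio as a ballpark.

```python
def throughput(messages, seconds):
    """Average messages processed per second over a whole run."""
    return messages / seconds


# Assumed durations, taken from the times quoted in this thread.
bob_seconds = 6 * 3600 + 53 * 60   # 02:10 -> 09:03 PDT
theo_seconds = 20 * 60             # "~20 minutes for set0"

bob_rate = throughput(30_000, bob_seconds)
theo_rate = throughput(24_000, theo_seconds)

print(f"30k run: {bob_rate:.2f} msgs/sec")
print(f"24k run: {theo_rate:.2f} msgs/sec")
print(f"ratio:   {theo_rate / bob_rate:.1f}x")
```

The 24k run works out to 20 messages per second, and the 30k run to a little over one per second, which is a big enough gap that message count alone doesn't explain it.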

System is a 2.8 GHz P4 (w/hyper-threading, so it acts somewhat like a
dual processor), 1 GB of memory, and a decent-speed disk drive.

This mass-check generally runs stand-alone (nothing else using any
significant resources at that time).

I'm thinking a fair amount of resources may be going into the Bayes
check. svn/spamassassin/masses/spamassassin contains
-rw-------    1 Owner  None  5187584 Sep 20 03:27 bayes_toks.expire2236
-rw-------    1 Owner  None  2668544 Sep 20 09:14 auto-whitelist
-rw-------    1 Owner  None    39984 Sep 20 09:14 bayes_journal
-rw-------    1 Owner  None  5411840 Sep 20 09:14 bayes_toks
-rw-------    1 Owner  None  1344512 Sep 20 09:14 bayes_seen
which tells me it tried to do an expire only 1h10m into the process,
and probably a couple more along the way.
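
The timing can be read straight off the listing. A small sketch that parses the `ls -l` style lines above and reports how many minutes after kickoff each file was last touched. It assumes the 02:10 kickoff is when the run itself started, that the timestamps are in the same local time zone, and that the year is 2004; `minutes_into_run` is a made-up helper name.

```python
from datetime import datetime

LISTING = """\
-rw-------    1 Owner  None  5187584 Sep 20 03:27 bayes_toks.expire2236
-rw-------    1 Owner  None  2668544 Sep 20 09:14 auto-whitelist
-rw-------    1 Owner  None    39984 Sep 20 09:14 bayes_journal
-rw-------    1 Owner  None  5411840 Sep 20 09:14 bayes_toks
-rw-------    1 Owner  None  1344512 Sep 20 09:14 bayes_seen
"""

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

RUN_START = datetime(2004, 9, 20, 2, 10)  # assumed kickoff time


def minutes_into_run(listing, start):
    """Map each file name to minutes elapsed between start and its mtime."""
    offsets = {}
    for line in listing.splitlines():
        fields = line.split()
        month, day, hhmm, name = fields[5], fields[6], fields[7], fields[8]
        hour, minute = (int(part) for part in hhmm.split(":"))
        stamp = datetime(start.year, MONTHS.index(month) + 1, int(day),
                         hour, minute)
        offsets[name] = int((stamp - start).total_seconds() // 60)
    return offsets


for name, minutes in minutes_into_run(LISTING, RUN_START).items():
    print(f"{name:24s} {minutes // 60}h{minutes % 60:02d}m after kickoff")
```

Under those assumptions the expire file dates from 1h17m after kickoff, in line with the rough figure above, and the other four files were last written at the very end of the run.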

Is Bayes a significant contributor to the nightly results, or would
things work just as well if I turned it off?  What about the
auto-whitelist function?

Thanks.

FYI, I'll be documenting what I've done, and repeating it to make sure it
works, and then will put this up on the SA Wiki as a how-to for doing
nightly mass-checks under Cygwin.

Bob Menschel






-- 
Best regards,
 Robert                            mailto:Robert@Menschel.net



Re: Nightly Masscheck question

Posted by Theo Van Dinter <fe...@kluge.net>.
FWIW: I think this is a dev question, not a users question.

On Mon, Sep 20, 2004 at 08:30:34PM -0700, Robert Menschel wrote:
> I've managed to get my nightly mass-check run operational.

:)  yea!

> I still need to work on my weekly --net enabled check -- haven't had one
> of those complete yet, but the daily (local) mass-check runs to
> completion successfully.

FWIW: I've found I need to disable DCC.  It works fine in my normal workload,
but dies under heavy mass-check load.

> Just checking ... the mass-check process kicks off at 02:10 am PDT. It
> runs for about 7 hours, finishing up this morning at 09:03 am PDT (09:13
> if you count the rsync of results back to the server).

So 2:10 PDT is 9:10 GMT (I think that's right), finishing up at ~16:00 GMT.

> Is this a usual/acceptable elapsed execution time?  If everyone else's
> mass-check runs in only 2-3 hours, then I'd like to know so I can find
> out why mine takes so long.  If everyone else's is in the same ballpark,
> then I'm happy with what it does.

I think that's fine for a general start time -- I start mine at ~5:11 EDT, aka
9:11 GMT.  My run takes ~20 minutes for set0, with 24k messages.  So 7 hours is
a bit long, but it depends on what type of system you're using, how many
messages you're processing, etc.

-- 
Randomly Generated Tagline:
Can I yell "movie" in a crowded firehouse?