Posted to ruleqa@spamassassin.apache.org by Axb <ax...@gmail.com> on 2014/02/27 12:15:14 UTC

we need more masscheckers

Guys,

Yesterday one of my masscheck jobs took very, VERY long.

started at 11:45 / finished at 21:04

I was holding 14 days of spam, which came to 767998 messages.
I've had to reduce this corpus to 7 days to avoid the risk of 
falling out of the time slot.

"weekly masschecks" take even longer due to the network tests.

This is running 8 jobs on an Intel 910 SSD. At the moment, reads/writes 
can't get much faster than this:

status: starting scan stage                              now: 2014-02-26 11:45:25 AM
status: completed scan stage, 767998 messages            now: 2014-02-26 11:48:01 AM
status: starting run stage                               now: 2014-02-26 11:48:01 AM
status:  10% ham: 0      spam: 76800  date: 2014-01-30   now: 2014-02-26 12:12:11 PM
status:  20% ham: 0      spam: 153600 date: 2014-02-08   now: 2014-02-26 12:42:57 PM
status:  30% ham: 0      spam: 230400 date: 2014-02-14   now: 2014-02-26 01:10:11 PM
status:  40% ham: 0      spam: 307200 date: 2014-02-17   now: 2014-02-26 01:27:19 PM
status:  50% ham: 0      spam: 384000 date: 2014-02-20   now: 2014-02-26 01:50:15 PM
status:  60% ham: 0      spam: 460800 date: 2014-02-21   now: 2014-02-26 03:19:46 PM
status:  70% ham: 0      spam: 537600 date: 2014-02-21   now: 2014-02-26 04:58:07 PM
status:  80% ham: 0      spam: 614400 date: 2014-02-23   now: 2014-02-26 06:13:55 PM
status:  90% ham: 0      spam: 691200 date: 2014-02-24   now: 2014-02-26 07:49:50 PM
status: completed run stage                              now: 2014-02-26 09:04:19 PM
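
For anyone who wants to put numbers on the status lines above, here is a 
rough sketch that turns them into per-interval throughput. The message 
counts and timestamps are copied from the log; the names CHECKPOINTS, 
segment_rates and overall_rate are made up for this illustration and are 
not part of mass-check. By this arithmetic the run stage averaged about 
23 messages per second, with the second half markedly slower than the 
first.

```python
from datetime import datetime

# (messages processed, wall-clock time) copied from the status lines above.
# The first entry is the start of the run stage.
CHECKPOINTS = [
    (0,      "11:48:01 AM"),
    (76800,  "12:12:11 PM"),
    (153600, "12:42:57 PM"),
    (230400, "01:10:11 PM"),
    (307200, "01:27:19 PM"),
    (384000, "01:50:15 PM"),
    (460800, "03:19:46 PM"),
    (537600, "04:58:07 PM"),
    (614400, "06:13:55 PM"),
    (691200, "07:49:50 PM"),
    (767998, "09:04:19 PM"),
]


def parse(clock):
    """Parse a 12-hour clock time from the log (all entries are 2014-02-26)."""
    return datetime.strptime("2014-02-26 " + clock, "%Y-%m-%d %I:%M:%S %p")


def segment_rates(checkpoints):
    """Return (messages_done, seconds, msgs_per_sec) for each interval."""
    rates = []
    for (n0, t0), (n1, t1) in zip(checkpoints, checkpoints[1:]):
        secs = (parse(t1) - parse(t0)).total_seconds()
        rates.append((n1, secs, (n1 - n0) / secs))
    return rates


def overall_rate(checkpoints):
    """Average messages per second over the whole run stage."""
    (n0, t0), (n1, t1) = checkpoints[0], checkpoints[-1]
    return (n1 - n0) / (parse(t1) - parse(t0)).total_seconds()


if __name__ == "__main__":
    for done, secs, rate in segment_rates(CHECKPOINTS):
        print(f"up to {done:>6} msgs: {secs:6.0f} s  {rate:5.1f} msg/s")
    print(f"overall: {overall_rate(CHECKPOINTS):.1f} msg/s")
```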

Iirc one could do this in some client server mode to be able to split 
the job across several machines.
(I wish Mosix were still actively developed.)

If anybody has time/patience/interest in investigating this or has a 
better idea, I'd be very thankful.

Alex


Re: we need more masscheckers

Posted by Axb <ax...@gmail.com>.
On 02/27/2014 05:15 PM, John Hardin wrote:
> On Thu, 27 Feb 2014, Axb wrote:
>
>> Yesterday one of my masscheck jobs took very, VERY long.
>>
>> Iirc one could do this in some client server mode to be able to split
>> the job across several machines.
>
> You should be able to split your corpora into multiple files and
> allocate those files across multiple boxes and masscheck them in
> parallel - simply treat them as separate corpora with separate result
> sets, each box masschecking one sub-corpus and reporting results when
> completed.

The corpora are in Maildirs, so they're not so simple to split unless 
you go through a ton of scripting.

I'm just reducing the corpora's size/TTL and calling it a day.


thx

Re: we need more masscheckers

Posted by John Hardin <jh...@impsec.org>.
On Thu, 27 Feb 2014, Axb wrote:

> Yesterday one of my masscheck jobs took very, VERY long.
>
> Iirc one could do this in some client server mode to be able to split 
> the job across several machines.

You should be able to split your corpora into multiple files and allocate 
those files across multiple boxes and masscheck them in parallel - simply 
treat them as separate corpora with separate result sets, each box 
masschecking one sub-corpus and reporting results when completed.
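
For a corpus kept in a Maildir rather than in files, the same idea can be 
sketched as below. This is only an illustration under assumptions, not 
anything shipped with mass-check: the function name shard_maildir, the 
shardNN directory layout, and the choice of CRC32 on the message filename 
are all made up here. It symlinks each message into one of N sub-corpus 
directories, so every message always lands in the same shard.

```python
import os
import zlib
from pathlib import Path


def shard_maildir(maildir, out_root, n_shards):
    """Symlink every message under maildir/cur and maildir/new into one of
    n_shards directories below out_root. Returns the message count per shard.

    Hypothetical helper: the shardNN layout is an assumption for this sketch.
    """
    maildir = Path(maildir)
    out_root = Path(out_root)
    counts = [0] * n_shards
    for sub in ("cur", "new"):
        src_dir = maildir / sub
        if not src_dir.is_dir():
            continue
        for entry in os.scandir(src_dir):
            if not entry.is_file():
                continue
            # Hash only the unique part of the name, before any ":2,FLAGS"
            # suffix, so a flag change does not move a message between shards.
            base = entry.name.split(":", 1)[0]
            digest = zlib.crc32(base.encode("utf-8", "surrogateescape"))
            shard = digest % n_shards
            dest_dir = out_root / f"shard{shard:02d}" / sub
            dest_dir.mkdir(parents=True, exist_ok=True)
            link = dest_dir / entry.name
            # lexists() also catches dangling links left by an earlier run.
            if not os.path.lexists(link):
                os.symlink(os.path.abspath(entry.path), link)
            counts[shard] += 1
    return counts
```

Calling shard_maildir("/path/to/Maildir", "/path/to/shards", 4) would 
leave four directories that can each be treated as a separate corpus. 
Symlinks only help when the boxes share the filesystem; to move shards 
to other machines the messages would have to be copied instead.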

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  14 days until Albert Einstein's 135th Birthday