Posted to ruleqa@spamassassin.apache.org by Axb <ax...@gmail.com> on 2014/02/27 12:15:14 UTC
we need more masscheckers
Guys,
Yesterday one of my masscheck jobs took very, VERY long.
It started at 11:45 and finished at 21:04.
I was holding 14 days of spam, which came to 767998 messages.
I've had to reduce this corpus to 7 days to avoid the risk of
falling out of the time slot.
"Weekly masschecks" take even longer due to the network tests.
This is running 8 jobs on an Intel 910 SSD. ATM reads/writes can't get much
faster than this.
status: starting scan stage now: 2014-02-26 11:45:25 AM
status: completed scan stage, 767998 messages now: 2014-02-26 11:48:01 AM
status: starting run stage now: 2014-02-26 11:48:01 AM
status: 10% ham: 0 spam: 76800 date: 2014-01-30 now: 2014-02-26 12:12:11 PM
status: 20% ham: 0 spam: 153600 date: 2014-02-08 now: 2014-02-26 12:42:57 PM
status: 30% ham: 0 spam: 230400 date: 2014-02-14 now: 2014-02-26 01:10:11 PM
status: 40% ham: 0 spam: 307200 date: 2014-02-17 now: 2014-02-26 01:27:19 PM
status: 50% ham: 0 spam: 384000 date: 2014-02-20 now: 2014-02-26 01:50:15 PM
status: 60% ham: 0 spam: 460800 date: 2014-02-21 now: 2014-02-26 03:19:46 PM
status: 70% ham: 0 spam: 537600 date: 2014-02-21 now: 2014-02-26 04:58:07 PM
status: 80% ham: 0 spam: 614400 date: 2014-02-23 now: 2014-02-26 06:13:55 PM
status: 90% ham: 0 spam: 691200 date: 2014-02-24 now: 2014-02-26 07:49:50 PM
status: completed run stage now: 2014-02-26 09:04:19 PM
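
For reference, the status lines above work out to roughly 23 messages per second over the whole run stage, with the rate dropping noticeably once the run passes the 50% mark. A minimal sketch of that arithmetic follows. The checkpoint values are copied from the log (times converted to 24-hour form); the function names are made up for illustration and are not part of mass-check.

```python
from datetime import datetime

# (wall-clock time, messages processed), taken from the run-stage status lines.
CHECKPOINTS = [
    ("11:48:01", 0),
    ("12:12:11", 76800),
    ("12:42:57", 153600),
    ("13:10:11", 230400),
    ("13:27:19", 307200),
    ("13:50:15", 384000),
    ("15:19:46", 460800),
    ("16:58:07", 537600),
    ("18:13:55", 614400),
    ("19:49:50", 691200),
    ("21:04:19", 767998),
]


def _seconds_between(t0, t1):
    """Elapsed seconds between two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)).total_seconds()


def segment_rates(checkpoints):
    """Return (messages_done_at_end, messages_per_second) for each interval."""
    rates = []
    for (t0, n0), (t1, n1) in zip(checkpoints, checkpoints[1:]):
        rates.append((n1, (n1 - n0) / _seconds_between(t0, t1)))
    return rates


def overall_rate(checkpoints):
    """Messages per second from the first checkpoint to the last."""
    (t0, n0), (t1, n1) = checkpoints[0], checkpoints[-1]
    return (n1 - n0) / _seconds_between(t0, t1)


if __name__ == "__main__":
    for done, rate in segment_rates(CHECKPOINTS):
        print(f"up to {done:>6} messages: {rate:5.1f} msg/s")
    print(f"overall: {overall_rate(CHECKPOINTS):.1f} msg/s")
```
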
Iirc one could do this in some client server mode to be able to split
the job across several machines.
(I wish Mosix were still actively developed)
If anybody has time/patience/interest in investigating this or has a
better idea, I'd be very thankful.
Alex
Re: we need more masscheckers
Posted by Axb <ax...@gmail.com>.
On 02/27/2014 05:15 PM, John Hardin wrote:
> On Thu, 27 Feb 2014, Axb wrote:
>
>> Yesterday one of my masscheck jobs took very, VERY long.
>>
>> Iirc one could do this in some client server mode to be able to split
>> the job across several machines.
>
> You should be able to split your corpora into multiple files and
> allocate those files across multiple boxes and masscheck them in
> parallel - simply treat them as separate corpora with separate result
> sets, each box masschecking one sub-corpus and reporting results when
> completed.
The corpora are in Maildirs - not so simple to split unless you go through a
ton of scripting.
I'm just reducing the corpora's size/TTL and calling it a day.
thx
Re: we need more masscheckers
Posted by John Hardin <jh...@impsec.org>.
On Thu, 27 Feb 2014, Axb wrote:
> Yesterday one of my masscheck jobs took very, VERY long.
>
> Iirc one could do this in some client server mode to be able to split
> the job across several machines.
You should be able to split your corpora into multiple files and allocate
those files across multiple boxes and masscheck them in parallel - simply
treat them as separate corpora with separate result sets, each box
masschecking one sub-corpus and reporting results when completed.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
14 days until Albert Einstein's 135th Birthday
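
The splitting suggested in this thread can be sketched for a Maildir corpus without a great deal of scripting. This is only an illustration under stated assumptions: the function name `shard_maildir` and the `shard-K` directory layout are hypothetical, messages are assumed to live in the standard `cur/` and `new/` subdirectories, and the script only builds symlinked sub-Maildirs on the local box. Getting each shard onto its own machine and running mass-check there is left out.

```python
import os
import zlib
from pathlib import Path


def shard_maildir(src, dest_root, shards):
    """Split the Maildir at `src` into `shards` sub-Maildirs under `dest_root`.

    Each message is symlinked (not copied) into dest_root/shard-K/{cur,new}.
    Returns the number of messages assigned to each shard.
    """
    src, dest_root = Path(src), Path(dest_root)
    counts = [0] * shards

    # Create a complete Maildir skeleton (cur/new/tmp) for every shard.
    for k in range(shards):
        for sub in ("cur", "new", "tmp"):
            (dest_root / f"shard-{k}" / sub).mkdir(parents=True, exist_ok=True)

    for sub in ("cur", "new"):
        subdir = src / sub
        if not subdir.is_dir():
            continue
        for entry in os.scandir(subdir):
            if not entry.is_file():
                continue
            # Hash only the unique part of the name (before the ":2,FLAGS"
            # suffix) so a message keeps its shard when its flags change.
            key = entry.name.split(":", 1)[0]
            k = zlib.crc32(key.encode("utf-8", "surrogateescape")) % shards
            target = dest_root / f"shard-{k}" / sub / entry.name
            if not target.exists():
                os.symlink(os.path.abspath(entry.path), target)
            counts[k] += 1
    return counts
```

Calling, say, `shard_maildir("/path/to/spam", "/path/to/shards", 4)` (both paths are placeholders) would leave four sub-corpora that can each be treated as a separate corpus with its own result set. Hashing the file name rather than dealing messages out in order keeps the assignment stable between runs, so a message does not move to a different box when the corpus grows.
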