Posted to dev@spamassassin.apache.org by Robert Menschel <Ro...@Menschel.net> on 2005/07/01 05:30:19 UTC

Re[2]: NOTICE: 3.1.0 rescoring mass-checks

Hello Theo,

Thursday, June 30, 2005, 8:59:47 AM, you wrote:

TVD> On Wed, Jun 29, 2005 at 06:48:02PM -0700, Justin Mason wrote:
>> http://wiki.apache.org/spamassassin/RescoreDetails has all the
>> details.

TVD> Just to note:

TVD> "The --after=1041397200 option tells mass-check to ignore messages older than
TVD> 18 months ago (in this case January 1 2003). This is useful if your corpus has
TVD> older messages intermingled with your newer messages."

TVD> 18 months ago would be Jan 1 2004, not 2003.  We also usually limit to
TVD> 6 months, not 18, but ...

a) For those of us not intimately familiar with the numeric values of
date/time in perl, what --after value would bring us to Jan 1 2005?

b) I am concerned that starting a full rescoring mass-check against a
large corpus will take longer than allowed.  I'll have to abort, and
send in what I have, but "what I have" will be the results generated
for older emails, not newer emails. Would it be appropriate to have
mass-check process emails newest to oldest?

Bob Menschel




Re: NOTICE: 3.1.0 rescoring mass-checks

Posted by Rod Begbie <ro...@gmail.com>.
On 6/30/05, Theo Van Dinter <fe...@apache.org> wrote:
> > a) For those of us not intimately familiar with the numeric values of
> > date/time in perl, what --after value would bring us to Jan 1 2005?
> 
> Hrm.  1041397200 was 1/1/03.  +365 days is 1072933200, which was 1/1/04.
> +366 days is 1104555600, which was 1/1/05. :)

--after "-6 months" works for me.

Rod.

-- 
:: Rod Begbie :: http://groovymother.com/ ::

Re: NOTICE: 3.1.0 rescoring mass-checks

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Jun 30, 2005 at 08:30:19PM -0700, Robert Menschel wrote:
> a) For those of us not intimately familiar with the numeric values of
> date/time in perl, what --after value would bring us to Jan 1 2005?

Hrm.  1041397200 was 1/1/03.  +365 days is 1072933200, which was 1/1/04.
+366 days is 1104555600, which was 1/1/05. :)
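
The arithmetic above can be reproduced with a short Python sketch. It
assumes the quoted values are midnight at UTC-5 (US Eastern standard
time), which is what the three numbers work out to; the helper name
after_value is hypothetical and not part of mass-check.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the thread's values are local midnight at UTC-5.
EASTERN = timezone(timedelta(hours=-5))

def after_value(year, month=1, day=1, tz=EASTERN):
    """Return Unix epoch seconds for local midnight on the given date."""
    return int(datetime(year, month, day, tzinfo=tz).timestamp())

for year in (2003, 2004, 2005):
    # Prints the year and the matching epoch value for January 1.
    print(year, after_value(year))
```

Passing a different tz gives the equivalent value for another zone.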

> b) I am concerned that starting a full rescoring mass-check against a
> large corpus will take longer than allowed.  I'll have to abort, and
> send in what I have, but "what I have" will be the results generated

I'd suggest letting it run for a little bit and estimating how many
messages you can run through in the time allotted.  I'm doing the
same thing.  It's not 100% accurate, but after 15-30 minutes you should
be able to multiply out and determine the # of messages you can run
through (I leave some wiggle room of 1-2 days), then restart the
mass-check with that many messages.
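
A rough sketch of that multiply-out estimate, in Python.  The function
name and the sample figures (1200 messages in 30 minutes, 7 days
available) are hypothetical, chosen only to illustrate the calculation;
substitute your own measurements.

```python
def messages_that_fit(sampled_messages, sampled_minutes,
                      days_available, margin_days=1.5):
    """Extrapolate a short timing sample to the whole time window.

    margin_days is the wiggle room held back (1-2 days in the post).
    """
    rate_per_minute = sampled_messages / sampled_minutes
    usable_minutes = max(days_available - margin_days, 0) * 24 * 60
    return int(rate_per_minute * usable_minutes)

# Hypothetical sample: 1200 messages checked in the first 30 minutes,
# with 7 days until results are due.
print(messages_that_fit(1200, 30, 7))
```

The estimate is only as good as the sample: throughput can drift as the
run moves on to larger messages or slower network tests.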

> for older emails, not newer emails. Would it be appropriate to have
> mass-check process emails newest to oldest?

No.  It needs to go in order, oldest to newest, for Bayes.

-- 
Randomly Generated Tagline:
"I'm not bad, I'm just drawn that way." - Jessica Rabbit

Re[3]: NOTICE: 3.1.0 rescoring mass-checks

Posted by Robert Menschel <Ro...@Menschel.net>.
Thursday, June 30, 2005, 8:30:19 PM, I wrote:

TVD>> "The --after=1041397200 option tells mass-check to ignore messages older than
TVD>> 18 months ago (in this case January 1 2003). This is useful if your corpus has
TVD>> older messages intermingled with your newer messages."

TVD>> 18 months ago would be Jan 1 2004, not 2003.  We also usually limit to
TVD>> 6 months, not 18, but ...

RM> a) For those of us not intimately familiar with the numeric values of
RM> date/time in perl, what --after value would bring us to Jan 1 2005?

Never mind.  I checked an old mass-check log, and found that the last
2004 email generated this line:
> .  0 ./corpus.ham/h041231.ham.2114159 [rule hits here]
>      time=1104563602,mid=...

So I'm using this 1104563602 value as my "starting" time.
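
For anyone wanting to sanity-check a time= value from a log, a small
Python sketch that turns the number back into a readable date.  The
UTC-8 offset is an assumption (US Pacific standard time, guessed from
the -0700 summer offset on the earlier message header); the function
name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def describe(epoch, offset_hours=0):
    """Format epoch seconds as an ISO 8601 string at a fixed UTC offset."""
    tz = timezone(timedelta(hours=offset_hours))
    return datetime.fromtimestamp(epoch, tz=tz).isoformat()

print(describe(1104563602))       # the value as UTC
print(describe(1104563602, -8))   # the same instant at the assumed UTC-8
```

At UTC-8 the value falls late on December 31 2004, which fits a log
entry for the last 2004 message.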

Bob Menschel