Posted to dev@spamassassin.apache.org by Robert Menschel <Ro...@Menschel.net> on 2005/07/01 05:30:19 UTC
Re[2]: NOTICE: 3.1.0 rescoring mass-checks
Hello Theo,
Thursday, June 30, 2005, 8:59:47 AM, you wrote:
TVD> On Wed, Jun 29, 2005 at 06:48:02PM -0700, Justin Mason wrote:
>> http://wiki.apache.org/spamassassin/RescoreDetails has all the
>> details.
TVD> Just to note:
TVD> "The --after=1041397200 option tells mass-check to ignore messages older than
TVD> 18 months ago (in this case January 1 2003). This is useful if your corpus has
TVD> older messages intermingled with your newer messages."
TVD> 18 months ago would be Jan 1 2004, not 2003. We also usually limit to
TVD> 6 months, not 18, but ...
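Decoding the wiki's value shows what Theo is pointing at. A minimal Python sketch (used only for illustration; mass-check itself is Perl). It assumes the wiki author chose local midnight in US Eastern Standard Time, a fixed UTC-5 offset, which is what the 5-hour gap from UTC midnight suggests:

```python
from datetime import datetime, timezone, timedelta

AFTER = 1041397200  # --after value quoted from the wiki page

as_utc = datetime.fromtimestamp(AFTER, tz=timezone.utc)
# Assumption: the value was picked as midnight US Eastern Standard Time (UTC-5).
as_est = datetime.fromtimestamp(AFTER, tz=timezone(timedelta(hours=-5)))

print(as_utc.isoformat())  # 2003-01-01T05:00:00+00:00
print(as_est.isoformat())  # 2003-01-01T00:00:00-05:00

# Whole calendar months between that cutoff and the June 2005 notice.
notice_year, notice_month = 2005, 6
months_back = (notice_year - as_est.year) * 12 + (notice_month - as_est.month)
print(months_back)  # 29
```

So the example cutoff sits about two and a half years before the notice, not 18 months.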
a) For those of us not intimately familiar with the numeric values of
date/time in perl, what --after value would bring us to Jan 1 2005?
b) I am concerned that starting a full rescoring mass-check against a
large corpus will take longer than allowed. I'll have to abort, and
send in what I have, but "what I have" will be the results generated
for older emails, not newer emails. Would it be appropriate to have
mass-check process emails newest to oldest?
Bob Menschel
Re: NOTICE: 3.1.0 rescoring mass-checks
Posted by Rod Begbie <ro...@gmail.com>.
On 6/30/05, Theo Van Dinter <fe...@apache.org> wrote:
> > a) For those of us not intimately familiar with the numeric values of
> > date/time in perl, what --after value would bring us to Jan 1 2005?
>
> Hrm. 1041397200 was 1/1/03. +365 days is 1072933200, which was 1/1/04.
> +366 days is 1104555600, which was 1/1/05. :)
--after "-6 months" works for me.
Rod.
--
:: Rod Begbie :: http://groovymother.com/ ::
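Rod reports that mass-check also accepts a relative value such as "-6 months". How mass-check parses that string isn't covered in this thread, so the following is only a rough Python sketch of the cutoff such a string implies. It assumes plain calendar-month subtraction (clamping the day to the target month's length); the helper name and the run start time are hypothetical:

```python
import calendar
from datetime import datetime, timezone

def months_before(dt: datetime, months: int) -> datetime:
    """Shift dt back by whole calendar months, clamping the day of month
    (e.g. 31 Aug minus 6 months gives 28 Feb in a non-leap year)."""
    index = dt.year * 12 + (dt.month - 1) - months  # months counted from year 0
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return dt.replace(year=year, month=month0 + 1, day=min(dt.day, last_day))

# Hypothetical start time for a rescoring run.
run_start = datetime(2005, 7, 1, tzinfo=timezone.utc)
cutoff = months_before(run_start, 6)
print(cutoff.isoformat())       # 2005-01-01T00:00:00+00:00
print(int(cutoff.timestamp()))  # 1104537600
```

Started around 1 July, a six-month window lands at the Jan 1 2005 cutoff Bob asks about.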
Re: NOTICE: 3.1.0 rescoring mass-checks
Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Jun 30, 2005 at 08:30:19PM -0700, Robert Menschel wrote:
> a) For those of us not intimately familiar with the numeric values of
> date/time in perl, what --after value would bring us to Jan 1 2005?
Hrm. 1041397200 was 1/1/03. +365 days is 1072933200, which was 1/1/04.
+366 days is 1104555600, which was 1/1/05. :)
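Theo's arithmetic works because Unix epoch time counts every day as exactly 86,400 seconds, so stepping forward a year means adding 365 or 366 days' worth. A short Python check, assuming (as with the wiki value) that the intended cutoff is midnight US Eastern Standard Time, UTC-5:

```python
from datetime import datetime, timezone, timedelta

DAY = 86400                          # seconds per day in Unix time
EST = timezone(timedelta(hours=-5))  # assumed zone of the original value

jan_2003 = 1041397200
jan_2004 = jan_2003 + 365 * DAY      # 2003 has 365 days
jan_2005 = jan_2004 + 366 * DAY      # 2004 is a leap year

for ts in (jan_2003, jan_2004, jan_2005):
    print(ts, datetime.fromtimestamp(ts, tz=EST).isoformat())
# 1041397200 2003-01-01T00:00:00-05:00
# 1072933200 2004-01-01T00:00:00-05:00
# 1104555600 2005-01-01T00:00:00-05:00

# The same value computed directly from the calendar date.
print(int(datetime(2005, 1, 1, tzinfo=EST).timestamp()))  # 1104555600
```

For a different zone, change the offset; midnight UTC on Jan 1 2005, for example, is 1104537600.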
> b) I am concerned that starting a full rescoring mass-check against a
> large corpus will take longer than allowed. I'll have to abort, and
> send in what I have, but "what I have" will be the results generated
I'd suggest letting it run for a little while and estimating how many
messages you can run through in the time allotted. I'm doing the
same thing. It's not 100% accurate, but after 15-30 minutes you should be able
to multiply out and determine the number of messages you can run through (I
leave 1-2 days of wiggle room), then restart the mass-check with
that many messages.
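Theo's estimate-then-restart approach is simple rate arithmetic. A minimal Python sketch; the function name, parameters, and sample figures are hypothetical, not taken from the thread:

```python
def message_budget(sample_msgs: int, sample_minutes: float,
                   days_left: float, wiggle_days: float = 2.0) -> int:
    """Estimate how many messages a full run can process before the deadline.

    sample_msgs / sample_minutes is the throughput measured in a short trial
    run; wiggle_days is time held back as a safety margin."""
    rate_per_minute = sample_msgs / sample_minutes
    usable_minutes = max(days_left - wiggle_days, 0) * 24 * 60
    return int(rate_per_minute * usable_minutes)

# Hypothetical trial: 1,500 messages in 20 minutes, 7 days to the
# deadline, 2 days of wiggle room.
print(message_budget(1500, 20, 7, 2))  # 540000
```

The wiggle room absorbs the error that comes from extrapolating a short sample over several days.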
> for older emails, not newer emails. Would it be appropriate to have
> mass-check process emails newest to oldest?
No. It needs to go in order, oldest to newest for Bayes.
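One way to combine Bob's wish to favour newer mail with Theo's oldest-to-newest requirement is to pick the newest N messages and then feed them in chronological order. This is a hypothetical pre-processing sketch in Python, not something the thread says mass-check does; the (timestamp, path) input and the helper name are assumptions:

```python
from typing import List, Tuple

def select_in_order(msgs: List[Tuple[int, str]], after: int,
                    limit: int) -> List[str]:
    """msgs holds (epoch_time, path) pairs in any order. Drop messages older
    than `after`, keep the `limit` newest of the rest, and return their
    paths oldest first."""
    recent = sorted(m for m in msgs if m[0] >= after)  # ascending by time
    kept = recent[-limit:] if limit > 0 else []        # newest `limit` items
    return [path for _, path in kept]

msgs = [
    (1104600000, "ham/c"),
    (1041397300, "ham/a"),
    (1104560000, "ham/b"),
    (1104700000, "spam/d"),
]
print(select_in_order(msgs, after=1104555600, limit=2))  # ['ham/c', 'spam/d']
```

Selection trims from the old end, but the returned list still runs oldest to newest.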
--
Randomly Generated Tagline:
"I'm not bad, I'm just drawn that way." - Jessica Rabbit
Re[3]: NOTICE: 3.1.0 rescoring mass-checks
Posted by Robert Menschel <Ro...@Menschel.net>.
Thursday, June 30, 2005, 8:30:19 PM, I wrote:
TVD>> "The --after=1041397200 option tells mass-check to ignore messages older than
TVD>> 18 months ago (in this case January 1 2003). This is useful if your corpus has
TVD>> older messages intermingled with your newer messages."
TVD>> 18 months ago would be Jan 1 2004, not 2003. We also usually limit to
TVD>> 6 months, not 18, but ...
RM> a) For those of us not intimately familiar with the numeric values of
RM> date/time in perl, what --after value would bring us to Jan 1 2005?
Never mind. I checked an old mass-check log, and found that the last
2004 email generated this line:
> . 0 ./corpus.ham/h041231.ham.2114159 [rule hits here]
> time=1104563602,mid=...
So I'm using this 1104563602 value as my "starting" time.
Bob Menschel
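Bob's lookup can be scripted: pull the time= field out of a mass-check log line and decode it. A minimal Python sketch, assuming the line format quoted above (with the wrapped time= field joined back onto one line and the elided parts left as-is) and a Pacific Standard Time local zone, inferred from the -0700 offset on Bob's summer mail:

```python
import re
from datetime import datetime, timezone, timedelta

# Log line as quoted above, rejoined onto one line; elided parts kept as-is.
line = ". 0 ./corpus.ham/h041231.ham.2114159 [rule hits here] time=1104563602,mid=..."

# \b stops the pattern from matching inside a longer field name ending in "time=".
match = re.search(r"\btime=(\d+)", line)
if match is None:
    raise ValueError("no time= field in log line")
ts = int(match.group(1))

PST = timezone(timedelta(hours=-8))  # assumed local zone (UTC-8)
print(ts)                                                        # 1104563602
print(datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())   # 2005-01-01T07:13:22+00:00
print(datetime.fromtimestamp(ts, tz=PST).isoformat())            # 2004-12-31T23:13:22-08:00
```

Decoded, 1104563602 is 23:13:22 on Dec 31 2004 in Pacific time but already 07:13:22 on Jan 1 2005 in UTC, so where the "last 2004 email" falls depends on which zone you count in.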