Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2005/07/01 01:56:10 UTC

Re: Issues with mass-check ?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Theo Van Dinter writes:
> I was perusing the ham/spam logs while my mass-check run is going, and I'm
> noticing something odd:
> First, results are being listed twice (this is just output from tail).

Not seeing this.  Have you got a target listed twice maybe?

> Second, the two results are usually identical except possibly for the learn= bit.
> Third, learning seems to be happening more frequently than 35% of the time.

Nope, about 35% for me.  My command line was:

nice ./mass-check --progress --bayes --net -j 4 --restart=400 --learn=35 --reuse --after=1041397200  -f ~/ftp/sa/targets.basic

- --j.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD4DBQFCxIaaMJF5cimLx9ARAoZoAJ9NVKGqOeaMb2Sh/0In97/SsJ5aYACXb+LB
egOAchh6XFmS7AC9eNmXUg==
=KWCF
-----END PGP SIGNATURE-----


Re: Issues with mass-check ?

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Jun 30, 2005 at 09:21:44PM -0400, Theo Van Dinter wrote:
> I'll see if I can debug this some more.  Unfortunately it seems like my 1m
> messages is now really only 500k messages. :(

Damn it to the bowels of bloody hell!

PEBKAC on my end.  Sorry to disturb you all.  <loud grumble>

-- 
Randomly Generated Tagline:
If nobody measures up, check your yardstick.

Re: Issues with mass-check ?

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Jun 30, 2005 at 04:56:10PM -0700, Justin Mason wrote:
> Not seeing this.  Have you got a target listed twice maybe?

Hrm!  According to the tmpfile, you're right:

1041607709^@h^@m^@/home/felicity/SA/corpus/ham/buy-me.3322257
1096862483^@s^@f^@/home/felicity/SA/corpus/spam/personal/2004/10/04/d129e92bc1
1096862483^@s^@f^@/home/felicity/SA/corpus/spam/personal/2004/10/04/d129e92bc1
1096862624^@s^@f^@/home/felicity/SA/corpus/spam/personal/2004/10/04/53dfaa00ef
1096862624^@s^@f^@/home/felicity/SA/corpus/spam/personal/2004/10/04/53dfaa00ef
1096862706^@s^@f^@/home/felicity/SA/corpus/spam/spamtrap/2004/10/04/aae952eefb
1096862706^@s^@f^@/home/felicity/SA/corpus/spam/spamtrap/2004/10/04/aae952eefb
1096862783^@s^@f^@/home/felicity/SA/corpus/spam/personal/2004/10/04/87eba8f612
1096862783^@s^@f^@/home/felicity/SA/corpus/spam/personal/2004/10/04/87eba8f612
1041607709^@h^@m^@/home/felicity/SA/corpus/ham/buy-me.3322257

However, I can't see why this happened.  The command I use to generate my
target listing lists each entry once and only once.  My nightly/weekly runs
don't do this.  The commandline is too long to recover via /proc or ps, so as
far as I can tell our code had to do this somehow.
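
For anyone who wants to check a tmpfile like the one above for repeats, here
is a rough sketch.  It is not part of mass-check.  It assumes each record is a
single line of NUL-separated fields (shown as ^@ above) with the message path
as the last field; that layout is inferred only from the listing in this
message, and the function name is made up for illustration.

```python
from collections import Counter


def find_duplicate_entries(lines):
    """Return {path: count} for every path that appears more than once.

    Assumption: each line holds NUL-separated fields and the last field
    is the message path, as in the tmpfile listing quoted in this thread.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        path = line.split("\0")[-1]
        counts[path] += 1
    return {path: n for path, n in counts.items() if n > 1}


# Three records taken from the listing above, with real NUL separators.
sample = [
    "1041607709\0h\0m\0/home/felicity/SA/corpus/ham/buy-me.3322257\n",
    "1096862483\0s\0f\0/home/felicity/SA/corpus/spam/personal/2004/10/04/d129e92bc1\n",
    "1096862483\0s\0f\0/home/felicity/SA/corpus/spam/personal/2004/10/04/d129e92bc1\n",
]

duplicates = find_duplicate_entries(sample)
for path, n in sorted(duplicates.items()):
    print(n, path)
```

To run it against a real file, pass the open file handle in place of `sample`.
Keying on the path alone, rather than the whole record, also catches a message
that was listed twice under different class or format fields.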

I'll see if I can debug this some more.  Unfortunately it seems like my 1m
messages is now really only 500k messages. :(

-- 
Randomly Generated Tagline:
"Integrity is doing the right thing when nobody is watching you."
         - Infonaut on Slashdot