Posted to dev@spamassassin.apache.org by Theo Van Dinter <fe...@apache.org> on 2005/07/01 01:43:48 UTC

Issues with mass-check ?

I was perusing the ham/spam logs while my mass-check run is going, and I'm
noticing something odd:

. -2 /home/felicity/SA/corpus/ham/hamtrap/2004/10/19/4d6c6d53f2
BAYES_00,HTML_50_60,HTML_MESSAGE,__CT,__CTE,__CTYPE_HAS_BOUNDARY,__CTYPE_MULTIPART_ALT,__ENV_AND_HDR_FROM_MATCH,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__HAS_X_MAILER,__HTML_LINK_IMAGE,__MIME_HTML,__MIME_QP,__MIME_VERSION,__MSGID_OK_HOST,__NONEMPTY_BODY,__SANE_MSGID,__TAG_EXISTS_BODY,__TAG_EXISTS_HEAD,__TAG_EXISTS_HTML,__TAG_EXISTS_META,__TOCC_EXISTS,__UNUSABLE_MSGID
time=1098161668,bayes=3.5305036195088e-05,scantime=1,format=f

. -2 /home/felicity/SA/corpus/ham/hamtrap/2004/10/19/4d6c6d53f2
BAYES_00,HTML_50_60,HTML_MESSAGE,__CT,__CTE,__CTYPE_HAS_BOUNDARY,__CTYPE_MULTIPART_ALT,__ENV_AND_HDR_FROM_MATCH,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__HAS_X_MAILER,__HTML_LINK_IMAGE,__MIME_HTML,__MIME_QP,__MIME_VERSION,__MSGID_OK_HOST,__NONEMPTY_BODY,__SANE_MSGID,__TAG_EXISTS_BODY,__TAG_EXISTS_HEAD,__TAG_EXISTS_HTML,__TAG_EXISTS_META,__TOCC_EXISTS,__UNUSABLE_MSGID
learn=ham,time=1098161668,bayes=3.52863946866955e-05,scantime=1,format=f

.  0 /home/felicity/SA/corpus/ham/hamtrap/2004/10/19/7a805c9d6c
BAYES_20,HTML_80_90,HTML_IMAGE_RATIO_04,HTML_LINK_IMAGE_BUG,HTML_MESSAGE,__CT,__CTYPE_HAS_BOUNDARY,__CTYPE_MULTIPART_ALT,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__HTML_LINK_IMAGE,__MIME_HTML,__MIME_VERSION,__MSGID_OK_HEX,__MSGID_OK_HOST,__NONEMPTY_BODY,__SANE_MSGID,__TAG_EXISTS_BODY,__TAG_EXISTS_HEAD,__TAG_EXISTS_HTML,__TOCC_EXISTS,__UNUSABLE_MSGID
time=1098162134,bayes=0.0886094279947974,scantime=0,format=f

.  0 /home/felicity/SA/corpus/ham/hamtrap/2004/10/19/7a805c9d6c
BAYES_20,HTML_80_90,HTML_IMAGE_RATIO_04,HTML_LINK_IMAGE_BUG,HTML_MESSAGE,__CT,__CTYPE_HAS_BOUNDARY,__CTYPE_MULTIPART_ALT,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__HTML_LINK_IMAGE,__MIME_HTML,__MIME_VERSION,__MSGID_OK_HEX,__MSGID_OK_HOST,__NONEMPTY_BODY,__SANE_MSGID,__TAG_EXISTS_BODY,__TAG_EXISTS_HEAD,__TAG_EXISTS_HTML,__TOCC_EXISTS,__UNUSABLE_MSGID
time=1098162134,bayes=0.0886060515307976,scantime=1,format=f

.  0 /home/felicity/SA/corpus/ham/personal/2004/10/19/cb700e59e1
BAYES_50,RAZOR2_CF_RANGE_51_100,SPF_HELO_PASS,SPF_PASS,__CT,__CTYPE_CHARSET_QUOTED,__CT_TEXT_PLAIN,__FRAUD_DBI,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__MIME_VERSION,__NONEMPTY_BODY,__REPTO_OVERQUOTE,__REPTO_QUOTE,__SANE_MSGID,__TOCC_EXISTS,__UNUSABLE_MSGID
time=1098162405,bayes=0.500000021763459,scantime=3,format=f

.  0 /home/felicity/SA/corpus/ham/personal/2004/10/19/cb700e59e1
BAYES_50,RAZOR2_CF_RANGE_51_100,SPF_HELO_PASS,SPF_PASS,__CT,__CTYPE_CHARSET_QUOTED,__CT_TEXT_PLAIN,__FRAUD_DBI,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__MIME_VERSION,__NONEMPTY_BODY,__REPTO_OVERQUOTE,__REPTO_QUOTE,__SANE_MSGID,__TOCC_EXISTS,__UNUSABLE_MSGID
time=1098162405,bayes=0.500000021754118,scantime=1,format=f

. -2 /home/felicity/SA/corpus/ham/personal/2004/10/19/16f89658f9
BAYES_00,__CT,__CTE,__CT_TEXT_PLAIN,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__MIME_VERSION,__MOZILLA_MSGID,__MSGID_OK_HOST,__NONEMPTY_BODY,__SANE_MSGID,__TOCC_EXISTS,__UNUSABLE_MSGID,__USER_AGENT
learn=ham,time=1098167198,bayes=1.66533453693773e-16,scantime=0,format=f

. -2 /home/felicity/SA/corpus/ham/personal/2004/10/19/16f89658f9
BAYES_00,__CT,__CTE,__CT_TEXT_PLAIN,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__MIME_VERSION,__MOZILLA_MSGID,__MSGID_OK_HOST,__NONEMPTY_BODY,__SANE_MSGID,__TOCC_EXISTS,__UNUSABLE_MSGID,__USER_AGENT
learn=ham,time=1098167198,bayes=5.55111512312578e-17,scantime=0,format=f


First, each result is listed twice (the above is unmodified output from tail).
Second, the two copies are usually identical except possibly for the learn= field.
Third, learning seems to be happening more often than the requested 35% of the time.

Is anyone else seeing this?
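
The observations above can be checked mechanically rather than by eye. The following is a minimal Python sketch, not part of mass-check. It assumes each log record has the five-field shape seen in the pasted output (marker, score, path, rule list, key=value metadata) and that a learn= key marks a learned message. The names RECORD and summarize are made up for illustration.

```python
import re
from collections import Counter

# One record: marker, score, path, comma-separated rules, key=value metadata.
# \s+ also matches newlines, so records wrapped over several lines still parse.
RECORD = re.compile(
    r"^(\S)\s+(-?[\d.]+)\s+(\S+)\s+(\S+)\s+(\S+=\S+)$", re.MULTILINE
)


def summarize(log_text):
    """Return (duplicates, learn_fraction) for a mass-check style log.

    duplicates     maps each path seen more than once to its count.
    learn_fraction is the share of distinct paths that have at least
                   one record carrying a learn= key.
    """
    seen = Counter()
    learned = set()
    for _marker, _score, path, _rules, meta in RECORD.findall(log_text):
        seen[path] += 1
        # Split "k1=v1,k2=v2" into a dict; partition() tolerates odd fields.
        fields = {k: v for k, _, v in (kv.partition("=") for kv in meta.split(","))}
        if "learn" in fields:
            learned.add(path)
    duplicates = {p: n for p, n in seen.items() if n > 1}
    learn_fraction = len(learned) / len(seen) if seen else 0.0
    return duplicates, learn_fraction
```

Calling summarize() on the contents of the ham or spam log gives the repeated paths and a learn fraction that can be compared against the --learn value. If metadata values in a real log can themselves contain commas, the split on "," would need adjusting.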

My mass-check command line comes out to:

/usr/bin/perl -w /home/corpus/SA/Mail-SpamAssassin-3.1.0/masses/mass-check --all -c
/home/corpus/SA/Mail-SpamAssassin-3.1.0/rules -j 2 --progress --bayes --net
-j 4 --restart=4000 --learn=35 --reuse --after=1041397200

-- 
Randomly Generated Tagline:
You will be given a post of trust and responsibility.

Re: Issues with mass-check ?

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Jun 30, 2005 at 09:21:44PM -0400, Theo Van Dinter wrote:
> I'll see if I can debug this some more.  Unfortunately it seems like my 1m
> messages is now really only 500k messages. :(

Damn it to the bowels of bloody hell!

PEBKAC on my end.  Sorry to disturb you all.  <loud grumble>

-- 
Randomly Generated Tagline:
If nobody measures up, check your yardstick.

Re: Issues with mass-check ?

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Jun 30, 2005 at 04:56:10PM -0700, Justin Mason wrote:
> Not seeing this.  Have you got a target listed twice maybe?

Hrm!  According to the tmpfile, you're right:

1041607709^@h^@m^@/home/felicity/SA/corpus/ham/buy-me.3322257
1096862483^@s^@f^@/home/felicity/SA/corpus/spam/personal/2004/10/04/d129e92bc1
1096862483^@s^@f^@/home/felicity/SA/corpus/spam/personal/2004/10/04/d129e92bc1
1096862624^@s^@f^@/home/felicity/SA/corpus/spam/personal/2004/10/04/53dfaa00ef
1096862624^@s^@f^@/home/felicity/SA/corpus/spam/personal/2004/10/04/53dfaa00ef
1096862706^@s^@f^@/home/felicity/SA/corpus/spam/spamtrap/2004/10/04/aae952eefb
1096862706^@s^@f^@/home/felicity/SA/corpus/spam/spamtrap/2004/10/04/aae952eefb
1096862783^@s^@f^@/home/felicity/SA/corpus/spam/personal/2004/10/04/87eba8f612
1096862783^@s^@f^@/home/felicity/SA/corpus/spam/personal/2004/10/04/87eba8f612
1041607709^@h^@m^@/home/felicity/SA/corpus/ham/buy-me.3322257

However, I can't see why this happened.  The command I use to generate my
target listing lists each entry once and only once, and my nightly/weekly
runs don't show this duplication.  The command line is too long to recover
via /proc or ps, so as far as I can tell our code had to do this somehow.
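
A quick way to confirm which targets are doubled in a tmpfile like the one above is to count paths. This is a minimal Python sketch under stated assumptions: the ^@ shown above is a NUL byte separating four fields per line, and the last field is the message path. The function name duplicate_targets is hypothetical and not part of mass-check.

```python
from collections import Counter


def duplicate_targets(raw: bytes):
    """Return {path: count} for every path listed more than once.

    Expects newline-separated entries with four NUL-separated fields,
    the last of which is the path, as in the listing above.
    """
    counts = Counter()
    for line in raw.splitlines():
        fields = line.split(b"\0")
        if len(fields) != 4:
            continue  # skip blank lines and anything not in the assumed shape
        counts[fields[3].decode("utf-8", "replace")] += 1
    return {path: n for path, n in counts.items() if n > 1}
```

Reading the tmpfile in binary mode and passing its contents to duplicate_targets() returns an empty dict when every target is listed once. The same function can be pointed at the generated target listing if it is first written out in the same shape.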

I'll see if I can debug this some more.  Unfortunately it seems like my 1m
messages is now really only 500k messages. :(

-- 
Randomly Generated Tagline:
"Integrity is doing the right thing when nobody is watching you."
         - Infonaut on Slashdot