Posted to dev@spamassassin.apache.org by "Kevin A. McGrail" <KM...@PCCC.com> on 2012/08/07 15:37:33 UTC

Script found that is aborting from insufficient ham

I found the script which runs under updatesd via cron on zones 1:

  HAM: 135596 (150000 required)
SPAM: 268096 (150000 required)

Insufficient ham corpus to generate scores; aborting.
Exit Status 8 is not zero for do-nightly-resorce-example

Do we want to lower the limit for ham, perhaps to 135,000?  Or should I tweak the number of days ham is considered good for, so that more ham is included?
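
To make the effect of lowering the limit concrete, here is a rough sketch of the kind of threshold check that produces the output above. It is not the actual script: the function name check_corpus_minimums, its argument order, and the spam-side message are assumptions for illustration. The ham message, the 150000 threshold, and exit status 8 are taken from the cron output.

```shell
#!/bin/sh
# Hypothetical helper: compare corpus sizes against a required minimum.
# Returns 8 on failure to mirror the "Exit Status 8" seen in the cron mail.
check_corpus_minimums() {
    ham=$1
    spam=$2
    required=$3

    echo "HAM: $ham ($required required)"
    echo "SPAM: $spam ($required required)"

    if [ "$ham" -lt "$required" ]; then
        echo "Insufficient ham corpus to generate scores; aborting."
        return 8
    fi
    if [ "$spam" -lt "$required" ]; then
        # Assumed wording; the thread only shows the ham-side message.
        echo "Insufficient spam corpus to generate scores; aborting."
        return 8
    fi
    return 0
}
```

With the counts from the report, `check_corpus_minimums 135596 268096 150000` fails, while the same counts with a limit of 135000 pass.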

Jari, are you seeing your corpus?

Anyone else seeing missing corpora?

Is this possibly a problem where corpora are not being included?

I checked times on both zone servers and that's not the issue.

Regards,
KAM



Re: Script found that is aborting from insufficient ham

Posted by John Hardin <jh...@impsec.org>.
On Wed, 8 Aug 2012, Kevin A. McGrail wrote:

> On 8/8/2012 10:17 AM, John Hardin wrote:
>>  On Wed, 8 Aug 2012, Kevin A. McGrail wrote:
>> > 
>> >  Can you point me out the masscheck page that you are seeing the 
>> >  difference on?
>>
>>  On any masscheck report, it's listed in two places:
>
> Thanks.  Can you confirm the exact url you are visiting for this report.  I 
> want to remove all assumptions from the mix.

I usually start at the latest results for my sandbox:

http://ruleqa.spamassassin.org/?srcpath=jhardin

From there, click on any specific rule name. Sorry I didn't clearly note 
"masscheck report _for a specific rule_" above, I should have.

>>  I'm not uploading logs, I'm uploading the message corpora for centralized
>>  masschecks.
>
> Are you sure?  Are you uploading other than the logs?

I am absolutely NOT uploading logs. I am uploading mbox-format corpora 
files for the nightly central masscheck system. I only run masschecks 
locally when I am getting ready to check in a non-trivial rule change, and 
I don't keep the results around for very long.

> I show masscheck logs like these because you aren't actually uploading the 
> emails (which is correct, I believe):
>
> -rw-r--r--   1 rsync    rsync     391675 Aug  8 09:15 ham-bb-jhardin.log
> -rw-r--r--   1 rsync    rsync     391679 Aug  7 09:16 ham-bb-jhardin.log~
> -rw-r--r--   1 rsync    rsync       1145 Aug  8 09:17 
> ham-bb-jhardin_fraud.log
> -rw-r--r--   1 rsync    rsync       1145 Aug  7 09:19 
> ham-bb-jhardin_fraud.log~
> -rw-r--r--   1 rsync    rsync     419449 Aug  4 09:07 ham-net-bb-jhardin.log
> -rw-r--r--   1 rsync    rsync     420618 Jul 28 09:06 ham-net-bb-jhardin.log~
> -rw-r--r--   1 rsync    rsync       1220 Aug  4 09:09 
> ham-net-bb-jhardin_fraud.log
> -rw-r--r--   1 rsync    rsync       1220 Jul 28 09:08 
> ham-net-bb-jhardin_fraud.log~
> -rw-r--r--   1 rsync    root     4639820 Oct  1  2009 
> ham-rescore-bb-jhardin.log
> -rw-r--r--   1 rsync    rsync     222982 Aug  8 09:15 spam-bb-jhardin.log
> -rw-r--r--   1 rsync    rsync     226858 Aug  7 09:16 spam-bb-jhardin.log~
> -rw-r--r--   1 rsync    rsync      67181 Aug  8 09:17 
> spam-bb-jhardin_fraud.log
> -rw-r--r--   1 rsync    rsync      67181 Aug  7 09:19 
> spam-bb-jhardin_fraud.log~
> -rw-r--r--   1 rsync    rsync     226058 Aug  4 09:07 spam-net-bb-jhardin.log
> -rw-r--r--   1 rsync    rsync     232278 Jul 28 09:06 
> spam-net-bb-jhardin.log~
> -rw-r--r--   1 rsync    rsync      37934 Aug  4 09:09 
> spam-net-bb-jhardin_fraud.log
> -rw-r--r--   1 rsync    rsync      25983 Jul 28 09:08 
> spam-net-bb-jhardin_fraud.log~
> -rw-r--r--   1 rsync    root     2491637 Oct  1  2009 
> spam-rescore-bb-jhardin.log

I assume those are the logs resulting from the central nightly masscheck 
that's being run on one of the zones servers.

One thing I am not doing is downloading my result logs and counting the 
messages in them as a check against what I'm uploading. I may start doing 
that...

>> >  Can you remind me of the issue so I can respond intelligently?
>>
>>  When I run masschecks locally against an up-to-date repo, it is not
>>  setting the message boundary RE properly and gets scads of uninitialized
>>  variable errors trying to parse the corpus mailbox files. Last I looked, I
>>  added some warn() output and it was setting the default RE properly but
>>  then appeared to be resetting it later somewhere.
> 
> Sorry about that.  I've reopened the bug.  I believe I thought that was 
> resolved by the conf changes Mark Martinec made so I dropped it.

Thanks.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Usually Microsoft doesn't develop products, we buy products.
                           -- Arno Edelmann, Microsoft product manager
-----------------------------------------------------------------------
  7 days until the 67th anniversary of the end of World War II

Re: Script found that is aborting from insufficient ham

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/8/2012 10:17 AM, John Hardin wrote:
> On Wed, 8 Aug 2012, Kevin A. McGrail wrote:
>
>> On 8/7/2012 10:14 AM, John Hardin wrote:
>>>  On Tue, 7 Aug 2012, Kevin A. McGrail wrote:
>>>
>>> >  Anyone else seeing missing corpora?
>>> > >  Is this possibly a problem where corpora are not being included?
>>>
>>>  My uploaded corpora are not _missing_, but the number of messages 
>>> reported
>>>  for them in the corpora report on the masscheck results pages are far
>>>  lower than what is being uploaded. I've started rsync back down to 
>>> verify
>>>  and it's apparently not a matter of the upload failing. And I do 
>>> filter by
>>>  date before uploading so it's not a matter of my counting ten thousand
>>>  messages from 2002.
>>
>> Can you point me out the masscheck page that you are seeing the 
>> difference on?
>
> On any masscheck report, it's listed in two places:
>
> (1) in the "set 0, broken down by contributor" you can hover over the 
> hits for spam and ham for every corpus/result set and see the hits and 
> total messages used to calculate the percentage
>
> (2) at the bottom if you expand the "Corpus quality" report and see a 
> more detailed breakdown of the corpus/results contents
>
> Here are my corpora counts at my end (by the number of '^From\s'):
>
> fraud/spam: 5613
> fraud/ham: 0
> public/spam: 7173
> public/ham: 6069
>
> Here are the numbers from the Corpus Quality report:
>
> bb-jhardin_fraud Spam messages    Ham messages
>   TOTAL:              17   (0%)   1   (0%)
>
> bb-jhardin       Spam messages    Ham messages
>   TOTAL:             100   (0%)   235   (0%)
>
> I don't know where the single message in the fraud/ham corpus is from, 
> I may have uploaded a single dummy and forgotten about it.
>
> You can see the other corpora are either being counted/parsed 
> incorrectly or are being filtered somehow.
>
> Strangely enough, the count for the public/spam corpus is different
> between the "set 0" count and the "Corpus quality" report: 67 vs. 100.

Thanks.  Can you confirm the exact URL you are visiting for this 
report?  I want to remove all assumptions from the mix.

> I'm not uploading logs, I'm uploading the message corpora for 
> centralized masschecks.
Are you sure?  Are you uploading anything other than the logs?

I show masscheck logs like these because you aren't actually uploading 
the emails (which is correct, I believe):

-rw-r--r--   1 rsync    rsync     391675 Aug  8 09:15 ham-bb-jhardin.log
-rw-r--r--   1 rsync    rsync     391679 Aug  7 09:16 ham-bb-jhardin.log~
-rw-r--r--   1 rsync    rsync       1145 Aug  8 09:17 
ham-bb-jhardin_fraud.log
-rw-r--r--   1 rsync    rsync       1145 Aug  7 09:19 
ham-bb-jhardin_fraud.log~
-rw-r--r--   1 rsync    rsync     419449 Aug  4 09:07 ham-net-bb-jhardin.log
-rw-r--r--   1 rsync    rsync     420618 Jul 28 09:06 
ham-net-bb-jhardin.log~
-rw-r--r--   1 rsync    rsync       1220 Aug  4 09:09 
ham-net-bb-jhardin_fraud.log
-rw-r--r--   1 rsync    rsync       1220 Jul 28 09:08 
ham-net-bb-jhardin_fraud.log~
-rw-r--r--   1 rsync    root     4639820 Oct  1  2009 
ham-rescore-bb-jhardin.log
-rw-r--r--   1 rsync    rsync     222982 Aug  8 09:15 spam-bb-jhardin.log
-rw-r--r--   1 rsync    rsync     226858 Aug  7 09:16 spam-bb-jhardin.log~
-rw-r--r--   1 rsync    rsync      67181 Aug  8 09:17 
spam-bb-jhardin_fraud.log
-rw-r--r--   1 rsync    rsync      67181 Aug  7 09:19 
spam-bb-jhardin_fraud.log~
-rw-r--r--   1 rsync    rsync     226058 Aug  4 09:07 
spam-net-bb-jhardin.log
-rw-r--r--   1 rsync    rsync     232278 Jul 28 09:06 
spam-net-bb-jhardin.log~
-rw-r--r--   1 rsync    rsync      37934 Aug  4 09:09 
spam-net-bb-jhardin_fraud.log
-rw-r--r--   1 rsync    rsync      25983 Jul 28 09:08 
spam-net-bb-jhardin_fraud.log~
-rw-r--r--   1 rsync    root     2491637 Oct  1  2009 
spam-rescore-bb-jhardin.log


>
>> Can you remind me of the issue so I can respond intelligently?
>
> When I run masschecks locally against an up-to-date repo, it is not 
> setting the message boundary RE properly and gets scads of 
> uninitialized variable errors trying to parse the corpus mailbox 
> files. Last I looked, I added some warn() output and it was setting 
> the default RE properly but then appeared to be resetting it later 
> somewhere.
>
Sorry about that.  I've reopened the bug.  I believe I thought that was 
resolved by the conf changes Mark Martinec made so I dropped it.

Regards,
KAM

Re: Script found that is aborting from insufficient ham

Posted by John Hardin <jh...@impsec.org>.
On Wed, 8 Aug 2012, Kevin A. McGrail wrote:

> On 8/7/2012 10:14 AM, John Hardin wrote:
>>  On Tue, 7 Aug 2012, Kevin A. McGrail wrote:
>> 
>> >  Anyone else seeing missing corpora?
>> > 
>> >  Is this possibly a problem where corpora are not being included?
>>
>>  My uploaded corpora are not _missing_, but the number of messages reported
>>  for them in the corpora report on the masscheck results pages are far
>>  lower than what is being uploaded. I've started rsync back down to verify
>>  and it's apparently not a matter of the upload failing. And I do filter by
>>  date before uploading so it's not a matter of my counting ten thousand
>>  messages from 2002.
>
> Can you point me out the masscheck page that you are seeing the difference 
> on?

On any masscheck report, it's listed in two places:

(1) in the "set 0, broken down by contributor" you can hover over the hits 
for spam and ham for every corpus/result set and see the hits and total 
messages used to calculate the percentage

(2) at the bottom if you expand the "Corpus quality" report and see a more 
detailed breakdown of the corpus/results contents

Here are my corpora counts at my end (by the number of '^From\s'):

fraud/spam: 5613
fraud/ham: 0
public/spam: 7173
public/ham: 6069
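
A count by the number of '^From\s' lines, as described above, can be sketched like this. The function name count_mbox_messages is made up for illustration, and the exact command used for the numbers in this message is not shown in the thread.

```shell
#!/bin/sh
# Count messages in an mbox file by counting "From " separator lines,
# i.e. lines matching the '^From\s' pattern.
count_mbox_messages() {
    # grep -c prints the number of matching lines. It exits with status 1
    # when the count is zero, so mask that for callers using "set -e".
    grep -c '^From[[:space:]]' "$1" || true
}
```

For example, `count_mbox_messages public/spam` (a hypothetical path) prints a single number. Body lines that begin with an unescaped "From " would inflate the count, so this is only as accurate as the mbox escaping.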

Here are the numbers from the Corpus Quality report:

bb-jhardin_fraud Spam messages    Ham messages
   TOTAL:              17   (0%)   1   (0%)

bb-jhardin       Spam messages    Ham messages
   TOTAL:             100   (0%)   235   (0%)

I don't know where the single message in the fraud/ham corpus is from, I 
may have uploaded a single dummy and forgotten about it.

You can see the other corpora are either being counted/parsed incorrectly 
or are being filtered somehow.

Strangely enough, the count for the public/spam corpus is different
between the "set 0" count and the "Corpus quality" report: 67 vs. 100.

>>  How are the messages being counted?
>
> I'm trying to figure that out.
>
>>  Might this be related somehow to the message boundary RE config issue I
>>  reported to you privately a few months back?
>
> I can't see how since you aren't uploading messages just logs.

I'm not uploading logs, I'm uploading the message corpora for centralized 
masschecks.

> Can you remind me of the issue so I can respond intelligently?

When I run masschecks locally against an up-to-date repo, it is not 
setting the message boundary RE properly and gets scads of uninitialized 
variable errors trying to parse the corpus mailbox files. Last I looked, I 
added some warn() output and it was setting the default RE properly but 
then appeared to be resetting it later somewhere.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   So Microsoft's invented the ASCII equivalent to ugly ink spots that
   appear on your letter when your pen is malfunctioning.
          -- Greg Andrews, about Microsoft's way to encode apostrophes
-----------------------------------------------------------------------
  7 days until the 67th anniversary of the end of World War II

Re: Script found that is aborting from insufficient ham

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/7/2012 10:14 AM, John Hardin wrote:
> On Tue, 7 Aug 2012, Kevin A. McGrail wrote:
>
>> Anyone else seeing missing corpora?
>>
>> Is this possibly a problem where corpora are not being included?
>
> My uploaded corpora are not _missing_, but the number of messages 
> reported for them in the corpora report on the masscheck results pages 
> are far lower than what is being uploaded. I've started rsync back 
> down to verify and it's apparently not a matter of the upload failing. 
> And I do filter by date before uploading so it's not a matter of my 
> counting ten thousand messages from 2002.
Can you point me out the masscheck page that you are seeing the 
difference on?
>
> How are the messages being counted?
I'm trying to figure that out.
>
> Might this be related somehow to the message boundary RE config issue 
> I reported to you privately a few months back?

I can't see how since you aren't uploading messages just logs.  Can you 
remind me of the issue so I can respond intelligently?

Re: Script found that is aborting from insufficient ham

Posted by John Hardin <jh...@impsec.org>.
On Tue, 7 Aug 2012, Kevin A. McGrail wrote:

> Anyone else seeing missing corpora?
>
> Is this possibly a problem where corpora are not being included?

My uploaded corpora are not _missing_, but the number of messages reported 
for them in the corpora report on the masscheck results pages are far 
lower than what is being uploaded. I've started rsync back down to verify 
and it's apparently not a matter of the upload failing. And I do filter by 
date before uploading so it's not a matter of my counting ten thousand 
messages from 2002.

How are the messages being counted?

Might this be related somehow to the message boundary RE config issue I 
reported to you privately a few months back?

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Trusting in anti-gun laws to keep you from being shot is like
   refusing to wear your seatbelt because you trust traffic laws to
   keep you from being in a car accident.              -- Erin Palette
-----------------------------------------------------------------------
  8 days until the 67th anniversary of the end of World War II

Re: Script found that is aborting from insufficient ham

Posted by Jari Fredriksson <ja...@iki.fi>.
07.08.2012 16:37, Kevin A. McGrail wrote:
> Jari, are you seeing your corpus? 

I don't quite understand your question. Yes I can see it locally,

jarif@whirlwind:~/masscheckwork/nightly_mass_check/masses$ wc -l *-jarif.log
   11827 ham-jarif.log
    1611 spam-jarif.log
   13438 total

But not in ruleqa.spamassassin.org.
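
Note that wc -l also counts the "#" header lines that mass-check writes at the top of each log. A sketch that counts only the result lines follows. It assumes each non-comment line corresponds to one message, and the function name count_log_results is hypothetical.

```shell
#!/bin/sh
# Count lines in a mass-check log that are not "#" comment/header lines.
# Assumption: every remaining line is one message result.
count_log_results() {
    # -v inverts the match and -c counts, so this counts lines that do
    # not start with "#". "|| true" masks grep's exit status 1 on zero.
    grep -vc '^#' "$1" || true
}
```

Usage would be along the lines of `count_log_results ham-jarif.log`.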

-- 

The man who sets out to carry a cat by its tail learns something that
will always be useful and which never will grow dim or doubtful.
		-- Mark Twain



Re: Script found that is aborting from insufficient ham

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/10/2012 10:23 AM, Kevin Golding wrote:
> Aha, yes - I'd be using 1.7.5 I believe. 
Thanks Kris, great catch.  I've updated the bug to acknowledge the issue.

regards,
KAM

Re: Script found that is aborting from insufficient ham

Posted by Kevin Golding <kp...@caomhin.org>.
On 10 Aug 2012, at 15:21, Kris Deugau wrote:

> Kevin Golding wrote:
>> Okay, that looks depressingly simple I guess... I added in extra statements outside the conditionals and it proved I'm not successfully entering any of them.  It looks like my problem is line 998:
>> 
>>  if (-d "$dir/.svn" || -f "$dir/svninfo.tmp") {
>> 
>> At that point $dir = /usr/home/masscheck/trunk/masses
>> 
>> I have a /usr/home/masscheck/trunk/.svn but no /usr/home/masscheck/trunk/masses/.svn
> 
> Have you upgraded to SVN 1.7?  The working copy structure uses only one
> .svn directory at the root of the working copy for 1.7, so if you've
> upgraded from < 1.7, your working copy at /usr/home/masscheck/trunk will
> not have a /usr/home/masscheck/trunk/masses/.svn directory any more.

Aha, yes - I'd be using 1.7.5 I believe.

Re: Script found that is aborting from insufficient ham

Posted by Kris Deugau <kd...@vianet.ca>.
Kevin Golding wrote:
> Okay, that looks depressingly simple I guess... I added in extra statements outside the conditionals and it proved I'm not successfully entering any of them.  It looks like my problem is line 998:
> 
>   if (-d "$dir/.svn" || -f "$dir/svninfo.tmp") {
> 
> At that point $dir = /usr/home/masscheck/trunk/masses
> 
> I have a /usr/home/masscheck/trunk/.svn but no /usr/home/masscheck/trunk/masses/.svn

Have you upgraded to SVN 1.7?  The working copy structure uses only one
.svn directory at the root of the working copy for 1.7, so if you've
upgraded from < 1.7, your working copy at /usr/home/masscheck/trunk will
not have a /usr/home/masscheck/trunk/masses/.svn directory any more.
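
One way to cope with both layouts is to walk up from the starting directory until a .svn directory is found. The following is only a sketch of that idea, not what the corpus-nightly script does; the function name find_svn_root is hypothetical.

```shell
#!/bin/sh
# Print the nearest directory at or above $1 that contains a .svn
# directory. Returns 1 if none is found before the top of the path.
find_svn_root() {
    dir=$1
    while :; do
        if [ -d "$dir/.svn" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        parent=$(dirname "$dir")
        # dirname returns its input unchanged at "/" or ".", so stop there.
        if [ "$parent" = "$dir" ]; then
            return 1
        fi
        dir=$parent
    done
}
```

With a 1.7 working copy, `find_svn_root /usr/home/masscheck/trunk/masses` would print /usr/home/masscheck/trunk; with a pre-1.7 working copy it would print the masses directory itself.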

-kgd

Re: Script found that is aborting from insufficient ham

Posted by Kevin Golding <kp...@caomhin.org>.
On 9 Aug 2012, at 21:37, Kevin Golding wrote:

> On 9 Aug 2012, at 16:26, Kevin A. McGrail wrote:
> 
>> Can you add a few debug statements and see which scenario in get_current_svn_revsion is being used?
> 
> Well I put print statements in every conditional in that sub and nothing came out. So either I did something very wrong or it's weirder than I thought.

Okay, that looks depressingly simple I guess... I added in extra statements outside the conditionals and it proved I'm not successfully entering any of them.  It looks like my problem is line 998:

  if (-d "$dir/.svn" || -f "$dir/svninfo.tmp") {

At that point $dir = /usr/home/masscheck/trunk/masses

I have a /usr/home/masscheck/trunk/.svn but no /usr/home/masscheck/trunk/masses/.svn and no sign of any svninfo.tmp anywhere.  Weirdly it looks as if the script creates svninfo.tmp inside the test to see if it exists, which means it will logically never get created as far as I can see.

Just an early morning observation.

I may try nuking the svn repo and rebuilding it in a bit.

Kevin

Re: Script found that is aborting from insufficient ham

Posted by Kevin Golding <kp...@caomhin.org>.
On 9 Aug 2012, at 16:26, Kevin A. McGrail wrote:

> On 8/9/2012 7:25 AM, postmaster@caomhin.org wrote:
>>> On 8/8/2012 5:23 PM, Jari Fredriksson wrote:
>>> Kevin, I can see you're submitting logs with SVN Revision: unknown That's
>>> why you are missing, I think.
>> Okay, I think I've hurt my brain looking at this and I'll be honest, a bit
>> lost.
> Are you using the masscheck script that rsyncs the code or the one that svn downloads the code?

svn, it's the trunk/masses/rule-qa/corpus-nightly script

> Can you add a few debug statements and see which scenario in get_current_svn_revsion is being used?

Well I put print statements in every conditional in that sub and nothing came out. So either I did something very wrong or it's weirder than I thought.

> My theory is you are running the svn command line and you might have a broken version of svn.  You might need to compile your own.

I'm on FreeBSD - I compiled it from ports a while back. I shall reinstall for another try in the morning, but running svn info by myself does return the expected data.
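
For comparison with what the script records, the revision can be pulled out of svn info output with a small filter. This is a sketch under the assumption that svn prints its usual English "Revision: N" line; the function name parse_svn_revision is hypothetical, and the "unknown" fallback simply mirrors the value seen in the submitted logs.

```shell
#!/bin/sh
# Read "svn info" output on stdin and print the working copy revision,
# or "unknown" if no "Revision:" line is present.
parse_svn_revision() {
    awk '/^Revision: / { print $2; found = 1; exit }
         END { if (!found) print "unknown" }'
}
```

Usage would be something like `svn info /usr/home/masscheck/trunk | parse_svn_revision`. The "Last Changed Rev:" line is deliberately not matched.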

Kevin

Re: Script found that is aborting from insufficient ham

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/9/2012 7:25 AM, postmaster@caomhin.org wrote:
>> On 8/8/2012 5:23 PM, Jari Fredriksson wrote:
>> Kevin, I can see you're submitting logs with SVN Revision: unknown That's
>> why you are missing, I think.
> Okay, I think I've hurt my brain looking at this and I'll be honest, a bit
> lost.
Are you using the masscheck script that rsyncs the code or the one that 
svn downloads the code?

Can you add a few debug statements and see which scenario in 
get_current_svn_revision is being used?

My theory is you are running the svn command line and you might have a 
broken version of svn.  You might need to compile your own.

Does anyone know if it would break things to have 
get_current_svn_revision print information on stdout?

I think we should add code so that if the revision is unknown, we abort 
the masscheck.  It's not going to get used...
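
A guard of that kind could look roughly like the sketch below. The function name require_known_revision and the wording of the message are assumptions; how the real scripts would hook this in is not decided in this thread.

```shell
#!/bin/sh
# Succeed only if $1 looks like a numeric SVN revision. An empty value,
# "unknown", or anything else non-numeric is rejected.
require_known_revision() {
    case $1 in
        ''|*[!0-9]*)
            echo "SVN revision is '$1' (not numeric); aborting mass-check." >&2
            return 1
            ;;
    esac
    return 0
}
```

A caller would run `require_known_revision "$REVISION" || exit 1` before starting the mass-check, so that logs which would be ignored anyway are never generated.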

I'll add this to a bugzilla ticket as well.

Regards,
KAM


Re: Script found that is aborting from insufficient ham

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/8/2012 5:23 PM, Jari Fredriksson wrote:
> My corpus came online today. Strange, if you have not done something. 
> Or does it use my installed SA ? I had problems with that, but 
> re-built SA from cpan... Maybe that's the key... I was under the 
> impression that it downloads/rsyncs from development version of SA and 
> does its job with it, but maybe I was wrong. 

I'm unsure what is wrong but I can see that there is something revolving 
around the SVN revision.

Kevin, I can see you're submitting logs with "SVN Revision: unknown". 
That's why you are missing, I think.

Digging into this.

KAM

Re: Script found that is aborting from insufficient ham

Posted by Jari Fredriksson <ja...@iki.fi>.
08.08.2012 23:58, Kevin A. McGrail wrote:
> On 8/7/2012 11:58 AM, Axb wrote:
>> -1 but - it spit out even more high scores.
>>
>> > Or I tweak the days Ham is good for to include more ham?
>>
>> +1 This would probably cause less havoc.
> Upped from 72 to 84 with very little change
>>
>> > Jari, are you seeing your corpus?
>> >
>> > Anyone else seeing missing corpora?
>> > Is this possibly a problem where corpora are not being included?
> OK, so here's the issue:
>
> The script running RIGHT now is checking to see if the uploaded mass
> check log contains:
>
> # SVN revision: 1369288
>
> It uses this command:
>
> REVISION=`head corpus/usable-corpus-set${SCORESET}/* | grep "SVN
> revision" | cut -d" " -f4 | sort -rn | head -1`
>
> Short circuiting the script and looking at the corpus set manually, we
> have:
>
> head corpus/usable-corpus-set1/* | grep "SVN revision" |  cut -d" "
> -f4 | sort -nr | uniq
> 1369288
> 1366614
> 1364050
> 1348336
> 1342864
> 1307740
> 1301890
> 1296586
> 1243018
> 1240465
> 1197902
> 1183598
> 1128571
> 1098073
> 1064983
> 1059274
> 1042117
> 831520
> 814117
> unknown
>
> Jari, checking your logs, we have:
>
> ham-net-jarif.log:# SVN revision: 1364050
> spam-net-jarif.log:# SVN revision: 1364050
>
> So your logs are being ignored because they are not being run with the
> "right" version of code.
>
> I'm working to find out why you have the wrong version of the code...
>
> Regards,
> KAM

My corpus came online today. Strange, if you have not done something.

Or does it use my installed SA? I had problems with that, but re-built
SA from CPAN... Maybe that's the key...

I was under the impression that it downloads/rsyncs the development
version of SA and does its job with it, but maybe I was wrong.



-- 

You will visit the Dung Pits of Glive soon.



Re: Script found that is aborting from insufficient ham

Posted by Axb <ax...@gmail.com>.
On 08/08/2012 11:51 PM, Jari Fredriksson wrote:
> 09.08.2012 00:45, Kevin A. McGrail wrote:
>> On 8/8/2012 5:38 PM, Jari Fredriksson wrote:
>>>>>>
>>> # mass-check results from jarif@whirlwind, on ke 8.8.2012 13.55.30 +0000
>>> # M:SA version 3.4.0-r1197259
>>> # SVN revision: 1370707
>>> # Date: 20120808T135530Z
>>> # Perl version: 5.014002 on x86_64-linux-gnu-thread-multi
>>>
>>> This is from my last log, generated late today/yesterday.
>>>
>>> Maybe I have had something wrong.
>>
>> No, 1370707 looks good from other people.  Don't second-guess yourself
>> yet!
>
> Yes, but
>
>> Jari, checking your logs, we have:
>>
>> ham-net-jarif.log:# SVN revision: 1364050
>> spam-net-jarif.log:# SVN revision: 1364050
>>
>> So your logs are being ignored because they are not being run with the
>> "right" version of code.
>>
>> I'm working to find out why you have the wrong version of the code...
>>
>> Regards,
>> KAM
>
> 1364050 does not look good?

KAM needs chocolate:

we're all "N'Sync"

http://ruleqa.spamassassin.org/20120808-r1370707-n


Re: Script found that is aborting from insufficient ham

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
> 1364050 does not look good?

From what I can tell for the time period you were uploading, it was not 
the right version, no.

Your newer uploads look better though and I think we are back on track.

Regards,
KAM

Re: Script found that is aborting from insufficient ham

Posted by Jari Fredriksson <ja...@iki.fi>.
09.08.2012 00:45, Kevin A. McGrail wrote:
> On 8/8/2012 5:38 PM, Jari Fredriksson wrote:
>>> >> 
>> # mass-check results from jarif@whirlwind, on ke 8.8.2012 13.55.30 +0000
>> # M:SA version 3.4.0-r1197259
>> # SVN revision: 1370707
>> # Date: 20120808T135530Z
>> # Perl version: 5.014002 on x86_64-linux-gnu-thread-multi
>>
>> This is from my last log, generated late today/yesterday.
>>
>> Maybe I have had something wrong.
>
> No, 1370707 looks good from other people.  Don't second-guess yourself
> yet!

Yes, but

> Jari, checking your logs, we have:
>
> ham-net-jarif.log:# SVN revision: 1364050
> spam-net-jarif.log:# SVN revision: 1364050
>
> So your logs are being ignored because they are not being run with the
> "right" version of code.
>
> I'm working to find out why you have the wrong version of the code...
>
> Regards,
> KAM

1364050 does not look good?


-- 

You will be reincarnated as a toad; and you will be much happier.



Re: Script found that is aborting from insufficient ham

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/8/2012 5:38 PM, Jari Fredriksson wrote:
> # mass-check results from jarif@whirlwind, on ke 8.8.2012 13.55.30 +0000
> # M:SA version 3.4.0-r1197259
> # SVN revision: 1370707
> # Date: 20120808T135530Z
> # Perl version: 5.014002 on x86_64-linux-gnu-thread-multi
>
> This is from my last log, generated late today/yesterday.
>
> Maybe I have had something wrong.

No, 1370707 looks good from other people.  Don't second-guess yourself yet!

Re: Script found that is aborting from insufficient ham

Posted by Jari Fredriksson <ja...@iki.fi>.
08.08.2012 23:58, Kevin A. McGrail wrote:
> On 8/7/2012 11:58 AM, Axb wrote:
>> -1 but - it spit out even more high scores.
>>
>> > Or I tweak the days Ham is good for to include more ham?
>>
>> +1 This would probably cause less havoc.
> Upped from 72 to 84 with very little change
>>
>> > Jari, are you seeing your corpus?
>> >
>> > Anyone else seeing missing corpora?
>> > Is this possibly a problem where corpora are not being included?
> OK, so here's the issue:
>
> The script running RIGHT now is checking to see if the uploaded mass
> check log contains:
>
> # SVN revision: 1369288
>
> It uses this command:
>
> REVISION=`head corpus/usable-corpus-set${SCORESET}/* | grep "SVN
> revision" | cut -d" " -f4 | sort -rn | head -1`
>
> Short circuiting the script and looking at the corpus set manually, we
> have:
>
> head corpus/usable-corpus-set1/* | grep "SVN revision" |  cut -d" "
> -f4 | sort -nr | uniq
> 1369288
> 1366614
> 1364050
> 1348336
> 1342864
> 1307740
> 1301890
> 1296586
> 1243018
> 1240465
> 1197902
> 1183598
> 1128571
> 1098073
> 1064983
> 1059274
> 1042117
> 831520
> 814117
> unknown
>
> Jari, checking your logs, we have:
>
> ham-net-jarif.log:# SVN revision: 1364050
> spam-net-jarif.log:# SVN revision: 1364050
>
> So your logs are being ignored because they are not being run with the
> "right" version of code.
>
> I'm working to find out why you have the wrong version of the code...
>
> Regards,
> KAM

# mass-check results from jarif@whirlwind, on ke 8.8.2012 13.55.30 +0000
# M:SA version 3.4.0-r1197259
# SVN revision: 1370707
# Date: 20120808T135530Z
# Perl version: 5.014002 on x86_64-linux-gnu-thread-multi

This is from my last log, generated late today/yesterday.

Maybe I have had something wrong.

-- 

AWAKE! FEAR! FIRE! FOES! AWAKE!
	FEAR! FIRE! FOES!
		AWAKE! AWAKE!
		-- J. R. R. Tolkien



Re: Script found that is aborting from insufficient ham

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/8/2012 6:24 PM, Jari Fredriksson wrote:
> 09.08.2012 00:52, John Hardin wrote:
>> On Wed, 8 Aug 2012, Kevin A. McGrail wrote:
>>
>>> OK, so here's the issue:
>>>
>>> The script running RIGHT now is checking to see if the uploaded mass
>>> check log contains:
>>>
>>> # SVN revision: 1369288
>> N.B.: It's probably a very good idea to always do a SVN UP and make
>> before running a local masscheck for upload. Jari, are you doing that?
>>
> No, I'm just running the supplied script.
>
> 00 12  * * * test -x bin/auto-mass-check.sh/usr/bin/nice && nice
> bin/auto-mass-check.sh
>
> I have slightly edited it, but it should not matter, as it is not in the
> automasscheck folder.
>
> The script does not run svn in any way, all it does is rsync stuff from
> apache site.
Either rsync or svn should be fine, as long as you have the latest code.  I 
think you are on track to be fine and we'll know more after this weekend.

I don't know why but there is a run of mass-check based on the weekend 
and the weekday code.  We won't generate new updates until after a 
weekend check hits the minimums.

I think you had issues, though I think you worked them out already.

I added quite a bit of debug code to the generate-new-rules script and 
it's working so I can tell a lot more about what's going on, who failed 
and why.

I should also be able to pinpoint who isn't working correctly and a bit 
more about why!

Regards,
KAM

Re: Script found that is aborting from insufficient ham

Posted by Jari Fredriksson <ja...@iki.fi>.
09.08.2012 00:52, John Hardin wrote:
> On Wed, 8 Aug 2012, Kevin A. McGrail wrote:
>
>> OK, so here's the issue:
>>
>> The script running RIGHT now is checking to see if the uploaded mass
>> check log contains:
>>
>> # SVN revision: 1369288
>
> N.B.: It's probably a very good idea to always do a SVN UP and make
> before running a local masscheck for upload. Jari, are you doing that?
>

No, I'm just running the supplied script.

00 12  * * * test -x bin/auto-mass-check.sh/usr/bin/nice && nice
bin/auto-mass-check.sh

I have slightly edited it, but it should not matter, as it is not in the
automasscheck folder.

The script does not run svn in any way, all it does is rsync stuff from
apache site.

> If the revision number is from a SA file/directory, then adding the
> SVN UP before masscheck might well solve the missing corpora problem
> that Jari et al. are seeing.
>
> Kevin, do you know yet where the scripting is getting the target SVN
> revision from?
>


-- 

You are so boring that when I see you my feet go to sleep.



Re: Script found that is aborting from insufficient ham

Posted by John Hardin <jh...@impsec.org>.
On Wed, 8 Aug 2012, Kevin A. McGrail wrote:

> OK, so here's the issue:
>
> The script running RIGHT now is checking to see if the uploaded mass 
> check log contains:
>
> # SVN revision: 1369288

N.B.: It's probably a very good idea to always do a SVN UP and make before 
running a local masscheck for upload. Jari, are you doing that?

If the revision number is from a SA file/directory, then adding the SVN UP 
before masscheck might well solve the missing corpora problem that Jari 
et al. are seeing.

Kevin, do you know yet where the scripting is getting the target SVN 
revision from?

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   News flash: Lowest Common Denominator down 50 points
-----------------------------------------------------------------------
  7 days until the 67th anniversary of the end of World War II

Re: Script found that is aborting from insufficient ham

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/7/2012 11:58 AM, Axb wrote:
> -1 but - it spit out even more high scores.
>
> > Or I tweak the days Ham is good for to include more ham?
>
> +1 This would probably cause less havoc.
Upped from 72 to 84 with very little change
>
> > Jari, are you seeing your corpus?
> >
> > Anyone else seeing missing corpora?
> > Is this possibly a problem where corpora are not being included?
OK, so here's the issue:

The script running RIGHT now is checking to see if the uploaded mass 
check log contains:

# SVN revision: 1369288

It uses this command:

REVISION=`head corpus/usable-corpus-set${SCORESET}/* | grep "SVN revision" | cut -d" " -f4 | sort -rn | head -1`

Short circuiting the script and looking at the corpus set manually, we have:

head corpus/usable-corpus-set1/* | grep "SVN revision" | cut -d" " -f4 | sort -nr | uniq
1369288
1366614
1364050
1348336
1342864
1307740
1301890
1296586
1243018
1240465
1197902
1183598
1128571
1098073
1064983
1059274
1042117
831520
814117
unknown

Jari, checking your logs, we have:

ham-net-jarif.log:# SVN revision: 1364050
spam-net-jarif.log:# SVN revision: 1364050

So your logs are being ignored because they were not generated with the 
"right" version of the code.

I'm working to find out why you have the wrong version of the code...
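
For anyone who wants to check their own logs the same way, here is a 
rough sketch. It assumes the rule is simply "the revision in the log 
header must equal the highest revision seen in the directory", which is 
my reading of the command above and not a copy of the real script. The 
function names are made up for illustration:

```shell
#!/bin/sh
# Print the revision from a log header line like "# SVN revision: 1369288".
# Same head/grep/cut pipeline as the command above, applied to one file.
log_revision() {
    head "$1" | grep "SVN revision" | cut -d" " -f4
}

# List every file in DIR whose header revision is not the highest one found.
stale_revision_logs() {
    dir=$1
    newest=$(for f in "$dir"/*; do log_revision "$f"; done | sort -rn | head -1)
    for f in "$dir"/*; do
        rev=$(log_revision "$f")
        [ "$rev" = "$newest" ] || echo "$f: ${rev:-none} (want $newest)"
    done
}
```

Run against a directory of uploaded logs, it prints one line per file 
that would be skipped under that assumption.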

Regards,
KAM

Re: Script found that is aborting from insufficient ham

Posted by Jari Fredriksson <ja...@iki.fi>.
08.08.2012 00:30, Kevin A. McGrail wrote:
>
> On 8/7/2012 12:22 PM, Jari Fredriksson wrote:
>> My typical log here (even increased the rsync verbosity to be sure)
>
> I see your logs but one oddity I noticed is a lack of -net logs. Looks
> like that might have stopped on Jul 21st?

Yes, I have had some system failures on weekends.
>
> -rw-r--r--   1 rsync    rsync    17038423 Aug  7 09:43 ham-jarif.log
> -rw-r--r--   1 rsync    rsync    17916771 Jul 21 11:45 ham-net-jarif.log
> -rw-r--r--   1 rsync    rsync    2746479 Aug  7 09:43 spam-jarif.log
> -rw-r--r--   1 rsync    rsync    3119937 Jul 21 11:45 spam-net-jarif.log
>
>
> Regards,
> KAM
>


-- 

"Well - I don't think anyone would succeed to publish a web sight"

Husse Dec 2 2007





Re: Script found that is aborting from insufficient ham

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/7/2012 12:22 PM, Jari Fredriksson wrote:
> My typical log here (even increased the rsync verbosity to be sure)

I see your logs but one oddity I noticed is a lack of -net logs. Looks 
like that might have stopped on Jul 21st?

-rw-r--r--   1 rsync    rsync    17038423 Aug  7 09:43 ham-jarif.log
-rw-r--r--   1 rsync    rsync    17916771 Jul 21 11:45 ham-net-jarif.log
-rw-r--r--   1 rsync    rsync    2746479 Aug  7 09:43 spam-jarif.log
-rw-r--r--   1 rsync    rsync    3119937 Jul 21 11:45 spam-net-jarif.log
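
A quick way to spot logs that have stopped updating, as a sketch only. 
The function name stale_logs and the 7-day default are assumptions for 
illustration:

```shell
#!/bin/sh
# List *.log files under DIR whose last modification is more than DAYS
# days old. find counts age in whole 24-hour periods, so "+7" matches
# files at least 8 full days old.
stale_logs() {
    find "${1:-.}" -type f -name '*.log' -mtime +"${2:-7}" -print
}
```

With the listing above, a run on Aug 7 would report the two -net logs 
from Jul 21 and leave the two fresh logs out.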


Regards,
KAM


Re: Script found that is aborting from insufficient ham

Posted by Jari Fredriksson <ja...@iki.fi>.
07.08.2012 18:58, Axb wrote:
> > Jari, are you seeing your corpus?
> >
> > Anyone else seeing missing corpora?
> > Is this possibly a problem where corpora are not being included?
>
> I'm watching my masscheck logs closely - all there.

My typical log here (even increased the rsync verbosity to be sure)

Removing duplicates from HAM SPAM ... done.
Removing unwanted HAM mail from corpus
0
Removing Maildir/.Confirmed-HAM/cur/1344318666.M953201P4886V0000000000000806I0000000000B6331C_0.whirlwind,S=14993:2,S ... done
Removing unwanted SPAM mail from corpus
Syncing nightly_mass_check
+ ./mass-check --hamlog=ham-jarif.log --spamlog=spam-jarif.log -j 4 --progress --reuse ham:dir:/home/jarif/Maildir/.Confirmed-HAM spam:dir:/home/jarif/Maildir/.Confirmed-SPAM
status: starting scan stage                              now: 2012-08-07 12.01.56
status: completed scan stage, 13485 messages             now: 2012-08-07 12.01.58
status: starting run stage                               now: 2012-08-07 12.01.58
status:  10% ham: 1185   spam: 164    date: 2012-05-09   now: 2012-08-07 12.04.17
status:  20% ham: 2373   spam: 325    date: 2011-04-22   now: 2012-08-07 12.08.12
status:  30% ham: 3560   spam: 487    date: 2011-06-17   now: 2012-08-07 12.12.36
status:  40% ham: 4748   spam: 648    date: 2011-09-05   now: 2012-08-07 12.16.53
status:  50% ham: 5937   spam: 808    date: 2011-11-01   now: 2012-08-07 12.21.11
status:  60% ham: 7125   spam: 969    date: 2012-01-02   now: 2012-08-07 12.25.46
status:  70% ham: 8313   spam: 1130   date: 2012-03-06   now: 2012-08-07 12.30.11
status:  80% ham: 9501   spam: 1291   date: 2012-05-04   now: 2012-08-07 12.34.39
status:  90% ham: 10689  spam: 1452   date: 2012-07-24   now: 2012-08-07 12.39.01
status: completed run stage                              now: 2012-08-07 12.43.01
+ LOGLIST=' ham-jarif.log spam-jarif.log'
+ set +x
rsync -Pcvz  ham-jarif.log spam-jarif.log jarif@rsync.spamassassin.org::corpus/
This is the SpamAssassin Corpus rsync machine.

Modules that are available:

corpus
nightly mass-check result upload area.  It is password protected.
If you would like a password, please send a request to
pmc@spamassassin.apache.org and request a "nightly" username and password.

submit
Score generation mass-check result upload area.  It is password
protected.  If you would like a password, please send a request to
pmc@spamassassin.apache.org and request a "score generation" username
and password.  Generally these are only granted after a mass-check
announcement has been made on the spamassassin developer mailing list.

anoncorpus
mass-check result download area, available via anonymous access.

ham-jarif.log

       32768   0%    0.00kB/s    0:00:00  
     2658417  15%    2.19MB/s    0:00:06  
     5541500  32%    2.44MB/s    0:00:04  
     7794435  45%    2.30MB/s    0:00:03  
     9597411  56%    2.16MB/s    0:00:03  
    11696723  68%    2.08MB/s    0:00:02  
    13594276  79%    1.85MB/s    0:00:01  
    15267861  89%    1.75MB/s    0:00:00  
    17038423 100%    1.96MB/s    0:00:08 (xfer#1, to-check=1/2)
spam-jarif.log

        8280   0%    8.09kB/s    0:05:38  
       11592   0%   11.32kB/s    0:04:01  
     2746479 100%    1.73MB/s    0:00:01 (xfer#2, to-check=0/2)

sent 1021990 bytes  received 34822 bytes  72883.59 bytes/sec
total size is 19784902  speedup is 18.72





Re: Script found that is aborting from insufficient ham

Posted by Axb <ax...@gmail.com>.
On 08/07/2012 03:37 PM, Kevin A. McGrail wrote:
 > I found the script which runs under updatesd via cron on zones 1
 >
 >   HAM: 135596 (150000 required)
 > SPAM: 268096 (150000 required)
 >
 > Insufficient ham corpus to generate scores; aborting.
 > Exit Status 8 is not zero for do-nightly-resorce-example
 >
 > Do we want to lower the limit for ham perhaps to 135?

-1 but - it spit out even more high scores.

 > Or I tweak the days Ham is good for to include more ham?

+1 This would probably cause less havoc.

 > Jari, are you seeing your corpus?
 >
 > Anyone else seeing missing corpora?
 > Is this possibly a problem where corpora are not being included?

I'm watching my masscheck logs closely - all there.


Axb