Posted to dev@spamassassin.apache.org by "Kevin A. McGrail" <KM...@PCCC.com> on 2012/08/07 15:37:33 UTC
Script found that is aborting from insufficient ham
I found the script which runs under updatesd via cron on zones 1
HAM: 135596 (150000 required)
SPAM: 268096 (150000 required)
Insufficient ham corpus to generate scores; aborting.
Exit Status 8 is not zero for do-nightly-resorce-example
Do we want to lower the ham limit, perhaps to 135,000? Or should I tweak the number of days ham is considered good for, to include more ham?
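For readers following along, the abort in the output above amounts to a threshold comparison per corpus. A minimal shell sketch of that kind of check follows. It is an assumption for illustration only: check_corpus_size is a hypothetical helper, not the script that runs under updatesd, and the return value 8 is chosen only to mirror the exit status shown in the log.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the SpamAssassin scripts.
# Usage: check_corpus_size LABEL COUNT REQUIRED
check_corpus_size() {
    label=$1
    count=$2
    required=$3
    # Report the count in the same shape as the log lines above.
    echo "$label: $count ($required required)"
    if [ "$count" -lt "$required" ]; then
        lower=$(printf '%s' "$label" | tr '[:upper:]' '[:lower:]')
        echo "Insufficient $lower corpus to generate scores; aborting." >&2
        return 8   # assumed value, mirrors "Exit Status 8" in the log
    fi
    return 0
}
```

Called as `check_corpus_size HAM 135596 150000`, this reports the shortfall and returns non-zero; lowering the third argument is the equivalent of lowering the limit.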
Jari, are you seeing your corpus?
Anyone else seeing missing corpora?
Is this possibly a problem where corpora are not being included?
I checked times on both zone servers and that's not the issue.
Regards,
KAM
Re: Script found that is aborting from insufficient ham
Posted by John Hardin <jh...@impsec.org>.
On Wed, 8 Aug 2012, Kevin A. McGrail wrote:
> On 8/8/2012 10:17 AM, John Hardin wrote:
>> On Wed, 8 Aug 2012, Kevin A. McGrail wrote:
>> >
>> > Can you point me out the masscheck page that you are seeing the
>> > difference on?
>>
>> On any masscheck report, it's listed in two places:
>
> Thanks. Can you confirm the exact url you are visiting for this report. I
> want to remove all assumptions from the mix.
I usually start at the latest results for my sandbox:
http://ruleqa.spamassassin.org/?srcpath=jhardin
From there, click on any specific rule name. Sorry I didn't clearly note
"masscheck report _for a specific rule_" above, I should have.
>> I'm not uploading logs, I'm uploading the message corpora for centralized
>> masschecks.
>
> Are you sure? Are you uploading other than the logs?
I am absolutely NOT uploading logs. I am uploading mbox-format corpora
files for the nightly central masscheck system. I only run masschecks
locally when I am getting ready to check in a non-trivial rule change, and
I don't keep the results around for very long.
> I show masscheck logs like these because you aren't actually uploading the
> emails (which is correct, I believe):
>
> -rw-r--r-- 1 rsync rsync 391675 Aug 8 09:15 ham-bb-jhardin.log
> -rw-r--r-- 1 rsync rsync 391679 Aug 7 09:16 ham-bb-jhardin.log~
> -rw-r--r-- 1 rsync rsync 1145 Aug 8 09:17 ham-bb-jhardin_fraud.log
> -rw-r--r-- 1 rsync rsync 1145 Aug 7 09:19 ham-bb-jhardin_fraud.log~
> -rw-r--r-- 1 rsync rsync 419449 Aug 4 09:07 ham-net-bb-jhardin.log
> -rw-r--r-- 1 rsync rsync 420618 Jul 28 09:06 ham-net-bb-jhardin.log~
> -rw-r--r-- 1 rsync rsync 1220 Aug 4 09:09 ham-net-bb-jhardin_fraud.log
> -rw-r--r-- 1 rsync rsync 1220 Jul 28 09:08 ham-net-bb-jhardin_fraud.log~
> -rw-r--r-- 1 rsync root 4639820 Oct 1 2009 ham-rescore-bb-jhardin.log
> -rw-r--r-- 1 rsync rsync 222982 Aug 8 09:15 spam-bb-jhardin.log
> -rw-r--r-- 1 rsync rsync 226858 Aug 7 09:16 spam-bb-jhardin.log~
> -rw-r--r-- 1 rsync rsync 67181 Aug 8 09:17 spam-bb-jhardin_fraud.log
> -rw-r--r-- 1 rsync rsync 67181 Aug 7 09:19 spam-bb-jhardin_fraud.log~
> -rw-r--r-- 1 rsync rsync 226058 Aug 4 09:07 spam-net-bb-jhardin.log
> -rw-r--r-- 1 rsync rsync 232278 Jul 28 09:06 spam-net-bb-jhardin.log~
> -rw-r--r-- 1 rsync rsync 37934 Aug 4 09:09 spam-net-bb-jhardin_fraud.log
> -rw-r--r-- 1 rsync rsync 25983 Jul 28 09:08 spam-net-bb-jhardin_fraud.log~
> -rw-r--r-- 1 rsync root 2491637 Oct 1 2009 spam-rescore-bb-jhardin.log
I assume those are the logs resulting from the central nightly masscheck
that's being run on one of the zones servers.
One thing I am not doing is downloading my result logs and counting the
messages in them as a check against what I'm uploading. I may start doing
that...
>> > Can you remind me of the issue so I can respond intelligently?
>>
>> When I run masschecks locally against an up-to-date repo, it is not
>> setting the message boundary RE properly and gets scads of uninitialized
>> variable errors trying to parse the corpus mailbox files. Last I looked, I
>> added some warn() output and it was setting the default RE properly but
>> then appeared to be resetting it later somewhere.
>
> Sorry about that. I've reopened the bug. I believe I thought that was
> resolved by the conf changes Mark Martinec made so I dropped it.
Thanks.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Usually Microsoft doesn't develop products, we buy products.
-- Arno Edelmann, Microsoft product manager
-----------------------------------------------------------------------
7 days until the 67th anniversary of the end of World War II
Re: Script found that is aborting from insufficient ham
Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/8/2012 10:17 AM, John Hardin wrote:
> On Wed, 8 Aug 2012, Kevin A. McGrail wrote:
>
>> On 8/7/2012 10:14 AM, John Hardin wrote:
>>> On Tue, 7 Aug 2012, Kevin A. McGrail wrote:
>>>
>>> > Anyone else seeing missing corpora?
>>> > > Is this possibly a problem where corpora are not being included?
>>>
>>> My uploaded corpora are not _missing_, but the number of messages reported
>>> for them in the corpora report on the masscheck results pages are far
>>> lower than what is being uploaded. I've started rsync back down to verify
>>> and it's apparently not a matter of the upload failing. And I do filter by
>>> date before uploading so it's not a matter of my counting ten thousand
>>> messages from 2002.
>>
>> Can you point me out the masscheck page that you are seeing the
>> difference on?
>
> On any masscheck report, it's listed in two places:
>
> (1) in the "set 0, broken down by contributor" you can hover over the
> hits for spam and ham for every corpus/result set and see the hits and
> total messages used to calculate the percentage
>
> (2) at the bottom if you expand the "Corpus quality" report and see a
> more detailed breakdown of the corpus/results contents
>
> Here are my corpora counts at my end (by the number of '^From\s'):
>
> fraud/spam: 5613
> fraud/ham: 0
> public/spam: 7173
> public/ham: 6069
>
> Here are the numbers from the Corpus Quality report:
>
> bb-jhardin_fraud Spam messages Ham messages
> TOTAL: 17 (0%) 1 (0%)
>
> bb-jhardin Spam messages Ham messages
> TOTAL: 100 (0%) 235 (0%)
>
> I don't know where the single message in the fraud/ham corpus is from,
> I may have uploaded a single dummy and forgotten about it.
>
> You can see the other corpora are either being counted/parsed
> incorrectly or are being filtered somehow.
>
> Strangely enough, the count for the public/spam corpus is different
> between the "set 0" count and the "Corpus quality" report: 67 vs. 100.
Thanks. Can you confirm the exact URL you are visiting for this
report? I want to remove all assumptions from the mix.
> I'm not uploading logs, I'm uploading the message corpora for
> centralized masschecks.
Are you sure? Are you uploading anything other than the logs?
I show masscheck logs like these because you aren't actually uploading
the emails (which is correct, I believe):
-rw-r--r-- 1 rsync rsync 391675 Aug 8 09:15 ham-bb-jhardin.log
-rw-r--r-- 1 rsync rsync 391679 Aug 7 09:16 ham-bb-jhardin.log~
-rw-r--r-- 1 rsync rsync 1145 Aug 8 09:17 ham-bb-jhardin_fraud.log
-rw-r--r-- 1 rsync rsync 1145 Aug 7 09:19 ham-bb-jhardin_fraud.log~
-rw-r--r-- 1 rsync rsync 419449 Aug 4 09:07 ham-net-bb-jhardin.log
-rw-r--r-- 1 rsync rsync 420618 Jul 28 09:06 ham-net-bb-jhardin.log~
-rw-r--r-- 1 rsync rsync 1220 Aug 4 09:09 ham-net-bb-jhardin_fraud.log
-rw-r--r-- 1 rsync rsync 1220 Jul 28 09:08 ham-net-bb-jhardin_fraud.log~
-rw-r--r-- 1 rsync root 4639820 Oct 1 2009 ham-rescore-bb-jhardin.log
-rw-r--r-- 1 rsync rsync 222982 Aug 8 09:15 spam-bb-jhardin.log
-rw-r--r-- 1 rsync rsync 226858 Aug 7 09:16 spam-bb-jhardin.log~
-rw-r--r-- 1 rsync rsync 67181 Aug 8 09:17 spam-bb-jhardin_fraud.log
-rw-r--r-- 1 rsync rsync 67181 Aug 7 09:19 spam-bb-jhardin_fraud.log~
-rw-r--r-- 1 rsync rsync 226058 Aug 4 09:07 spam-net-bb-jhardin.log
-rw-r--r-- 1 rsync rsync 232278 Jul 28 09:06 spam-net-bb-jhardin.log~
-rw-r--r-- 1 rsync rsync 37934 Aug 4 09:09 spam-net-bb-jhardin_fraud.log
-rw-r--r-- 1 rsync rsync 25983 Jul 28 09:08 spam-net-bb-jhardin_fraud.log~
-rw-r--r-- 1 rsync root 2491637 Oct 1 2009 spam-rescore-bb-jhardin.log
>
>> Can you remind me of the issue so I can respond intelligently?
>
> When I run masschecks locally against an up-to-date repo, it is not
> setting the message boundary RE properly and gets scads of
> uninitialized variable errors trying to parse the corpus mailbox
> files. Last I looked, I added some warn() output and it was setting
> the default RE properly but then appeared to be resetting it later
> somewhere.
>
Sorry about that. I've reopened the bug. I believe I thought that was
resolved by the conf changes Mark Martinec made so I dropped it.
Regards,
KAM
Re: Script found that is aborting from insufficient ham
Posted by John Hardin <jh...@impsec.org>.
On Wed, 8 Aug 2012, Kevin A. McGrail wrote:
> On 8/7/2012 10:14 AM, John Hardin wrote:
>> On Tue, 7 Aug 2012, Kevin A. McGrail wrote:
>>
>> > Anyone else seeing missing corpora?
>> >
>> > Is this possibly a problem where corpora are not being included?
>>
>> My uploaded corpora are not _missing_, but the number of messages reported
>> for them in the corpora report on the masscheck results pages are far
>> lower than what is being uploaded. I've started rsync back down to verify
>> and it's apparently not a matter of the upload failing. And I do filter by
>> date before uploading so it's not a matter of my counting ten thousand
>> messages from 2002.
>
> Can you point me out the masscheck page that you are seeing the difference
> on?
On any masscheck report, it's listed in two places:
(1) in the "set 0, broken down by contributor" you can hover over the hits
for spam and ham for every corpus/result set and see the hits and total
messages used to calculate the percentage
(2) at the bottom if you expand the "Corpus quality" report and see a more
detailed breakdown of the corpus/results contents
Here are my corpora counts at my end (by the number of '^From\s'):
fraud/spam: 5613
fraud/ham: 0
public/spam: 7173
public/ham: 6069
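A count like the one above can be reproduced with a few lines of shell. This is a minimal sketch, assuming the corpora are plain mbox files; count_mbox_messages is a hypothetical name, and the pattern used here is '^From ' (a literal space) rather than '^From\s', which behaves the same for ordinary mbox separator lines but is portable across grep implementations.

```shell
#!/bin/sh
# Hypothetical helper: print "FILE: N" for each mbox file given,
# where N is the number of lines starting with "From " (mbox separators).
count_mbox_messages() {
    for f in "$@"; do
        # grep -c prints 0 when nothing matches; its non-zero exit status
        # is ignored inside the command substitution.
        printf '%s: %s\n' "$f" "$(grep -c '^From ' "$f")"
    done
}
```

Lines escaped as ">From " inside message bodies are not counted, which is the point of anchoring the pattern at the start of the line.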
Here are the numbers from the Corpus Quality report:
bb-jhardin_fraud Spam messages Ham messages
TOTAL: 17 (0%) 1 (0%)
bb-jhardin Spam messages Ham messages
TOTAL: 100 (0%) 235 (0%)
I don't know where the single message in the fraud/ham corpus is from, I
may have uploaded a single dummy and forgotten about it.
You can see the other corpora are either being counted/parsed incorrectly
or are being filtered somehow.
Strangely enough, the count for the public/spam corpus is different
between the "set 0" count and the "Corpus quality" report: 67 vs. 100.
>> How are the messages being counted?
>
> I'm trying to figure that out.
>
>> Might this be related somehow to the message boundary RE config issue I
>> reported to you privately a few months back?
>
> I can't see how since you aren't uploading messages just logs.
I'm not uploading logs, I'm uploading the message corpora for centralized
masschecks.
> Can you remind me of the issue so I can respond intelligently?
When I run masschecks locally against an up-to-date repo, it is not
setting the message boundary RE properly and gets scads of uninitialized
variable errors trying to parse the corpus mailbox files. Last I looked, I
added some warn() output and it was setting the default RE properly but
then appeared to be resetting it later somewhere.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
So Microsoft's invented the ASCII equivalent to ugly ink spots that
appear on your letter when your pen is malfunctioning.
-- Greg Andrews, about Microsoft's way to encode apostrophes
-----------------------------------------------------------------------
7 days until the 67th anniversary of the end of World War II
Re: Script found that is aborting from insufficient ham
Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/7/2012 10:14 AM, John Hardin wrote:
> On Tue, 7 Aug 2012, Kevin A. McGrail wrote:
>
>> Anyone else seeing missing corpora?
>>
>> Is this possibly a problem where corpora are not being included?
>
> My uploaded corpora are not _missing_, but the number of messages
> reported for them in the corpora report on the masscheck results pages
> are far lower than what is being uploaded. I've started rsync back
> down to verify and it's apparently not a matter of the upload failing.
> And I do filter by date before uploading so it's not a matter of my
> counting ten thousand messages from 2002.
Can you point me to the masscheck page on which you are seeing the
difference?
>
> How are the messages being counted?
I'm trying to figure that out.
>
> Might this be related somehow to the message boundary RE config issue
> I reported to you privately a few months back?
I can't see how, since you aren't uploading messages, just logs. Can you
remind me of the issue so I can respond intelligently?
Re: Script found that is aborting from insufficient ham
Posted by John Hardin <jh...@impsec.org>.
On Tue, 7 Aug 2012, Kevin A. McGrail wrote:
> Anyone else seeing missing corpora?
>
> Is this possibly a problem where corpora are not being included?
My uploaded corpora are not _missing_, but the number of messages reported
for them in the corpora report on the masscheck results pages is far
lower than what is being uploaded. I've started rsync back down to verify
and it's apparently not a matter of the upload failing. And I do filter by
date before uploading so it's not a matter of my counting ten thousand
messages from 2002.
How are the messages being counted?
Might this be related somehow to the message boundary RE config issue I
reported to you privately a few months back?
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Trusting in anti-gun laws to keep you from being shot is like
refusing to wear your seatbelt because you trust traffic laws to
keep you from being in a car accident. -- Erin Palette
-----------------------------------------------------------------------
8 days until the 67th anniversary of the end of World War II
Re: Script found that is aborting from insufficient ham
Posted by Jari Fredriksson <ja...@iki.fi>.
07.08.2012 16:37, Kevin A. McGrail wrote:
> Jari, are you seeing your corpus?
I don't quite understand your question. Yes I can see it locally,
jarif@whirlwind:~/masscheckwork/nightly_mass_check/masses$ wc -l *-jarif.log
11827 ham-jarif.log
1611 spam-jarif.log
13438 total
But not in ruleqa.spamassassin.org.
--
The man who sets out to carry a cat by its tail learns something that
will always be useful and which never will grow dim or doubtful.
-- Mark Twain
Re: Script found that is aborting from insufficient ham
Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/10/2012 10:23 AM, Kevin Golding wrote:
> Aha, yes - I'd be using 1.7.5 I believe.
Thanks Kris, great catch. I've updated the bug to acknowledge the issue.
Regards,
KAM
Re: Script found that is aborting from insufficient ham
Posted by Kevin Golding <kp...@caomhin.org>.
On 10 Aug 2012, at 15:21, Kris Deugau wrote:
> Kevin Golding wrote:
>> Okay, that looks depressingly simple I guess... I added in extra statements outside the conditionals and it proved I'm not successfully entering any of them. It looks like my problem is line 998:
>>
>> if (-d "$dir/.svn" || -f "$dir/svninfo.tmp") {
>>
>> At that point $dir = /usr/home/masscheck/trunk/masses
>>
>> I have a /usr/home/masscheck/trunk/.svn but no /usr/home/masscheck/trunk/masses/.svn
>
> Have you upgraded to SVN 1.7? The working copy structure uses only one
> .svn directory at the root of the working copy for 1.7, so if you've
> upgraded from < 1.7, your working copy at /usr/home/masscheck/trunk will
> not have a /usr/home/masscheck/trunk/masses/.svn directory any more.
Aha, yes - I'd be using 1.7.5 I believe.
Re: Script found that is aborting from insufficient ham
Posted by Kris Deugau <kd...@vianet.ca>.
Kevin Golding wrote:
> Okay, that looks depressingly simple I guess... I added in extra statements outside the conditionals and it proved I'm not successfully entering any of them. It looks like my problem is line 998:
>
> if (-d "$dir/.svn" || -f "$dir/svninfo.tmp") {
>
> At that point $dir = /usr/home/masscheck/trunk/masses
>
> I have a /usr/home/masscheck/trunk/.svn but no /usr/home/masscheck/trunk/masses/.svn
Have you upgraded to SVN 1.7? The working copy structure uses only one
.svn directory at the root of the working copy for 1.7, so if you've
upgraded from < 1.7, your working copy at /usr/home/masscheck/trunk will
not have a /usr/home/masscheck/trunk/masses/.svn directory any more.
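One way to cope with the single top-level .svn directory is to walk up from the directory being tested until a .svn directory is found. The following is only a shell sketch of that idea, not the Perl code in the masscheck script, and find_wc_root is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the nearest directory at or above DIR that
# contains a .svn directory; return 1 if none is found.
find_wc_root() {
    dir=$(cd "$1" 2>/dev/null && pwd -P) || return 1
    while :; do
        if [ -d "$dir/.svn" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        # Stop once the filesystem root has been checked.
        [ "$dir" = "/" ] && return 1
        dir=$(dirname "$dir")
    done
}
```

With a 1.7 working copy at /usr/home/masscheck/trunk, calling this on /usr/home/masscheck/trunk/masses would print the trunk directory; with a pre-1.7 working copy it would print the masses directory itself.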
-kgd
Re: Script found that is aborting from insufficient ham
Posted by Kevin Golding <kp...@caomhin.org>.
On 9 Aug 2012, at 21:37, Kevin Golding wrote:
> On 9 Aug 2012, at 16:26, Kevin A. McGrail wrote:
>
>> Can you add a few debug statements and see which scenario in get_current_svn_revision is being used?
>
> Well I put print statements in every conditional in that sub and nothing came out. So either I did something very wrong or it's weirder than I thought.
Okay, that looks depressingly simple I guess... I added in extra statements outside the conditionals and it proved I'm not successfully entering any of them. It looks like my problem is line 998:
if (-d "$dir/.svn" || -f "$dir/svninfo.tmp") {
At that point $dir = /usr/home/masscheck/trunk/masses
I have a /usr/home/masscheck/trunk/.svn but no /usr/home/masscheck/trunk/masses/.svn and no sign of any svninfo.tmp anywhere. Weirdly it looks as if the script creates svninfo.tmp inside the test to see if it exists, which means it will logically never get created as far as I can see.
Just an early morning observation.
I may try nuking the svn repo and rebuilding it in a bit.
Kevin
Re: Script found that is aborting from insufficient ham
Posted by Kevin Golding <kp...@caomhin.org>.
On 9 Aug 2012, at 16:26, Kevin A. McGrail wrote:
> On 8/9/2012 7:25 AM, postmaster@caomhin.org wrote:
>>> On 8/8/2012 5:23 PM, Jari Fredriksson wrote:
>>> Kevin, I can see your submitting logs with SVN Revision: unknown That's
>>> why you are missing, I think.
>> Okay, I think I've hurt my brain looking at this and I'll be honest, a bit
>> lost.
> Are you using the masscheck script that rsyncs the code or the one that svn downloads the code?
svn, it's the trunk/masses/rule-qa/corpus-nightly script
> Can you add a few debug statements and see which scenario in get_current_svn_revision is being used?
Well I put print statements in every conditional in that sub and nothing came out. So either I did something very wrong or it's weirder than I thought.
> My theory is you are running the svn command line and you might have a broken version of svn. You might need to compile your own.
I'm on FreeBSD - I compiled it from ports a while back. I shall reinstall for another try in the morning, but running svn info by myself does return the expected data.
Kevin
Re: Script found that is aborting from insufficient ham
Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/9/2012 7:25 AM, postmaster@caomhin.org wrote:
>> On 8/8/2012 5:23 PM, Jari Fredriksson wrote:
>> Kevin, I can see your submitting logs with SVN Revision: unknown That's
>> why you are missing, I think.
> Okay, I think I've hurt my brain looking at this and I'll be honest, a bit
> lost.
Are you using the masscheck script that rsyncs the code or the one that
svn downloads the code?
Can you add a few debug statements and see which scenario in
get_current_svn_revision is being used?
My theory is you are running the svn command line and you might have a
broken version of svn. You might need to compile your own.
Does anyone know if it would break things to have
get_current_svn_revision print information on stdout?
I think we should add code so that if the revision is unknown, we abort the
masscheck. It's not going to get used...
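A guard of that kind could look roughly like the following. This is a minimal shell sketch under assumptions: require_known_revision is a hypothetical name, the real check would live in the Perl masscheck code, and "known" is taken here to mean "all digits".

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the given revision string is numeric.
# Usage: require_known_revision REVISION
require_known_revision() {
    case $1 in
        ''|*[!0-9]*)
            # Empty, "unknown", or anything else that is not all digits.
            echo "SVN revision is '${1:-missing}'; aborting masscheck." >&2
            return 1
            ;;
    esac
    return 0
}
```

A wrapper script could call `require_known_revision "$rev" || exit 1` before starting the run, so that logs stamped "unknown" are never generated in the first place.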
I'll add this to a bugzilla ticket as well.
Regards,
KAM
Re: Script found that is aborting from insufficient ham
Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/8/2012 5:23 PM, Jari Fredriksson wrote:
> My corpus came online today. Strange, if you have not done something.
> Or does it use my installed SA ? I had problems with that, but
> re-built SA from cpan... Maybe that's the key... I was in the
> impression that it downloads/rsyncs from development version of SA and
> does it's job with it, but maybe I was wrong.
I'm unsure what is wrong but I can see that there is something revolving
around the SVN revision.
Kevin, I can see you're submitting logs with SVN Revision: unknown. That's
why you are missing, I think.
Digging into this.
KAM
Re: Script found that is aborting from insufficient ham
Posted by Jari Fredriksson <ja...@iki.fi>.
08.08.2012 23:58, Kevin A. McGrail wrote:
> On 8/7/2012 11:58 AM, Axb wrote:
>> -1 but - it spit out even more high scores.
>>
>> > Or I tweak the days Ham is good for to include more ham?
>>
>> +1 This would probably cause less havoc.
> Upped from 72 to 84 with very little change
>>
>> > Jari, are you seeing your corpus?
>> >
>> > Anyone else seeing missing corpora?
>> > Is this possibly a problem where corpora are not being included?
> OK, so here's the issue:
>
> The script running RIGHT now is checking to see if the uploaded mass
> check log contains:
>
> # SVN revision: 1369288
>
> It uses this command:
>
> REVISION=`head corpus/usable-corpus-set${SCORESET}/* | grep "SVN revision" | cut -d" " -f4 | sort -rn | head -1`
>
> Short circuiting the script and looking at the corpus set manually, we
> have:
>
> head corpus/usable-corpus-set1/* | grep "SVN revision" | cut -d" " -f4 | sort -nr | uniq
> 1369288
> 1366614
> 1364050
> 1348336
> 1342864
> 1307740
> 1301890
> 1296586
> 1243018
> 1240465
> 1197902
> 1183598
> 1128571
> 1098073
> 1064983
> 1059274
> 1042117
> 831520
> 814117
> unknown
>
> Jari, checking your logs, we have:
>
> ham-net-jarif.log:# SVN revision: 1364050
> spam-net-jarif.log:# SVN revision: 1364050
>
> So your logs are being ignored because they are not being run with the
> "right" version of code.
>
> I'm working to find out why you have the wrong version of the code...
>
> Regards,
> KAM
My corpus came online today. Strange, if you have not done anything.
Or does it use my installed SA? I had problems with that, but re-built
SA from CPAN... Maybe that's the key...
I was under the impression that it downloads/rsyncs the development
version of SA and does its job with it, but maybe I was wrong.
--
You will visit the Dung Pits of Glive soon.
Re: Script found that is aborting from insufficient ham
Posted by Axb <ax...@gmail.com>.
On 08/08/2012 11:51 PM, Jari Fredriksson wrote:
> 09.08.2012 00:45, Kevin A. McGrail wrote:
>> On 8/8/2012 5:38 PM, Jari Fredriksson wrote:
>>>>>>
>>> # mass-check results from jarif@whirlwind, on ke 8.8.2012 13.55.30 +0000
>>> # M:SA version 3.4.0-r1197259
>>> # SVN revision: 1370707
>>> # Date: 20120808T135530Z
>>> # Perl version: 5.014002 on x86_64-linux-gnu-thread-multi
>>> This is from my last log, generated late today/yesterday.
>>> Maybe I have had something wrong.
>>
>> No, 1370707 looks good from other people. Don't second-guess yourself
>> yet!
>
> Yes, but
>
>> Jari, checking your logs, we have:
>>
>> ham-net-jarif.log:# SVN revision: 1364050
>> spam-net-jarif.log:# SVN revision: 1364050
>>
>> So your logs are being ignored because they are not being run with the
>> "right" version of code.
>>
>> I'm working to find out why you have the wrong version of the code...
>>
>> Regards,
>> KAM
>
> 1364050 does not look good?
KAM needs chocolate:
we're all "N'Sync"
http://ruleqa.spamassassin.org/20120808-r1370707-n
Re: Script found that is aborting from insufficient ham
Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
1364050 does not look good?
From what I can tell for the time period you were uploading, it was not
the right version, no.
Your newer uploads look better though and I think we are back on track.
Regards,
KAM
Re: Script found that is aborting from insufficient ham
Posted by Jari Fredriksson <ja...@iki.fi>.
09.08.2012 00:45, Kevin A. McGrail wrote:
> On 8/8/2012 5:38 PM, Jari Fredriksson wrote:
>>> >>
>> # mass-check results from jarif@whirlwind, on ke 8.8.2012 13.55.30 +0000
>> # M:SA version 3.4.0-r1197259
>> # SVN revision: 1370707
>> # Date: 20120808T135530Z
>> # Perl version: 5.014002 on x86_64-linux-gnu-thread-multi
>> This is from my last log, generated late today/yesterday.
>> Maybe I have had something wrong.
>
> No, 1370707 looks good from other people. Don't second-guess yourself
> yet!
Yes, but
> Jari, checking your logs, we have:
>
> ham-net-jarif.log:# SVN revision: 1364050
> spam-net-jarif.log:# SVN revision: 1364050
>
> So your logs are being ignored because they are not being run with the
> "right" version of code.
>
> I'm working to find out why you have the wrong version of the code...
>
> Regards,
> KAM
1364050 does not look good?
--
You will be reincarnated as a toad; and you will be much happier.
Re: Script found that is aborting from insufficient ham
Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/8/2012 5:38 PM, Jari Fredriksson wrote:
> # mass-check results from jarif@whirlwind, on ke 8.8.2012 13.55.30 +0000
> # M:SA version 3.4.0-r1197259
> # SVN revision: 1370707
> # Date: 20120808T135530Z
> # Perl version: 5.014002 on x86_64-linux-gnu-thread-multi
> This is from my last log, generated late today/yesterday.
> Maybe I have had something wrong.
No, 1370707 looks good from other people. Don't second-guess yourself yet!
Re: Script found that is aborting from insufficient ham
Posted by Jari Fredriksson <ja...@iki.fi>.
08.08.2012 23:58, Kevin A. McGrail wrote:
> On 8/7/2012 11:58 AM, Axb wrote:
>> -1 but - it spit out even more high scores.
>>
>> > Or I tweak the days Ham is good for to include more ham?
>>
>> +1 This would probably cause less havoc.
> Upped from 72 to 84 with very little change
>>
>> > Jari, are you seeing your corpus?
>> >
>> > Anyone else seeing missing corpora?
>> > Is this possibly a problem where corpora are not being included?
> OK, so here's the issue:
>
> The script running RIGHT now is checking to see if the uploaded mass
> check log contains:
>
> # SVN revision: 1369288
>
> It uses this command:
>
> REVISION=`head corpus/usable-corpus-set${SCORESET}/* | grep "SVN revision" | cut -d" " -f4 | sort -rn | head -1`
>
> Short circuiting the script and looking at the corpus set manually, we
> have:
>
> head corpus/usable-corpus-set1/* | grep "SVN revision" | cut -d" " -f4 | sort -nr | uniq
> 1369288
> 1366614
> 1364050
> 1348336
> 1342864
> 1307740
> 1301890
> 1296586
> 1243018
> 1240465
> 1197902
> 1183598
> 1128571
> 1098073
> 1064983
> 1059274
> 1042117
> 831520
> 814117
> unknown
>
> Jari, checking your logs, we have:
>
> ham-net-jarif.log:# SVN revision: 1364050
> spam-net-jarif.log:# SVN revision: 1364050
>
> So your logs are being ignored because they are not being run with the
> "right" version of code.
>
> I'm working to find out why you have the wrong version of the code...
>
> Regards,
> KAM
# mass-check results from jarif@whirlwind, on ke 8.8.2012 13.55.30 +0000
# M:SA version 3.4.0-r1197259
# SVN revision: 1370707
# Date: 20120808T135530Z
# Perl version: 5.014002 on x86_64-linux-gnu-thread-multi
This is from my last log, generated late today/yesterday.
Maybe I have had something wrong.
--
AWAKE! FEAR! FIRE! FOES! AWAKE!
FEAR! FIRE! FOES!
AWAKE! AWAKE!
-- J. R. R. Tolkien
Re: Script found that is aborting from insufficient ham
Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/8/2012 6:24 PM, Jari Fredriksson wrote:
> 09.08.2012 00:52, John Hardin wrote:
>> On Wed, 8 Aug 2012, Kevin A. McGrail wrote:
>>
>>> OK, so here's the issue:
>>>
>>> The script running RIGHT now is checking to see if the uploaded mass
>>> check log contains:
>>>
>>> # SVN revision: 1369288
>> N.B.: It's probably a very good idea to always do a SVN UP and make
>> before running a local masscheck for upload. Jari, are you doing that?
>>
> No, I'm just running the supplied script.
>
> 00 12 * * * test -x bin/auto-mass-check.sh/usr/bin/nice && nice
> bin/auto-mass-check.sh
>
> I have sligtly edited it, but it should not matter, as it is not in the
> automasscheck folder.
>
> The script does not run svn in any way, all it does is rsync stuff from
> apache site.
rsync or svn should be fine, as long as you have the latest code. I
think you are on track to be fine and we'll know more after this weekend.
I don't know why but there is a run of mass-check based on the weekend
and the weekday code. We won't generate new updates until after a
weekend check hits the minimums.
I think you had issues, though I think you worked them out already.
I added quite a bit of debug code to the generate-new-rules script and
it's working so I can tell a lot more about what's going on, who failed
and why.
I should also be able to pinpoint who isn't working correctly and a bit
more about why!
Regards,
KAM
Re: Script found that is aborting from insufficient ham
Posted by Jari Fredriksson <ja...@iki.fi>.
09.08.2012 00:52, John Hardin wrote:
> On Wed, 8 Aug 2012, Kevin A. McGrail wrote:
>
>> OK, so here's the issue:
>>
>> The script running RIGHT now is checking to see if the uploaded mass
>> check log contains:
>>
>> # SVN revision: 1369288
>
> N.B.: It's probably a very good idea to always do a SVN UP and make
> before running a local masscheck for upload. Jari, are you doing that?
>
No, I'm just running the supplied script.
00 12 * * * test -x bin/auto-mass-check.sh/usr/bin/nice && nice
bin/auto-mass-check.sh
I have slightly edited it, but it should not matter, as it is not in the
automasscheck folder.
The script does not run svn in any way; all it does is rsync stuff from
the Apache site.
> If the revision number is from a SA file/directory, then adding the
> SVN UP before masscheck might well solve the missing corpora problem
> that Jari et. al. are seeing.
>
> Kevin, do you know yet where the scripting is getting the target SVN
> revision from?
>
--
You are so boring that when I see you my feet go to sleep.
Re: Script found that is aborting from insufficient ham
Posted by John Hardin <jh...@impsec.org>.
On Wed, 8 Aug 2012, Kevin A. McGrail wrote:
> OK, so here's the issue:
>
> The script running RIGHT now is checking to see if the uploaded mass
> check log contains:
>
> # SVN revision: 1369288
N.B.: It's probably a very good idea to always do a SVN UP and make before
running a local masscheck for upload. Jari, are you doing that?
If the revision number is from a SA file/directory, then adding the SVN UP
before masscheck might well solve the missing corpora problem that Jari
et al. are seeing.
Kevin, do you know yet where the scripting is getting the target SVN
revision from?
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
News flash: Lowest Common Denominator down 50 points
-----------------------------------------------------------------------
7 days until the 67th anniversary of the end of World War II
Re: Script found that is aborting from insufficient ham
Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/7/2012 11:58 AM, Axb wrote:
> -1 but - it spit out even more high scores.
>
> > Or I tweak the days Ham is good for to include more ham?
>
> +1 This would probably cause less havoc.
Upped from 72 to 84 with very little change
>
> > Jari, are you seeing your corpus?
> >
> > Anyone else seeing missing corpora?
> > Is this possibly a problem where corpora are not being included?
OK, so here's the issue:
The script running RIGHT now is checking to see if the uploaded mass
check log contains:
# SVN revision: 1369288
It uses this command:
REVISION=`head corpus/usable-corpus-set${SCORESET}/* | grep "SVN revision" | cut -d" " -f4 | sort -rn | head -1`
Short circuiting the script and looking at the corpus set manually, we have:
head corpus/usable-corpus-set1/* | grep "SVN revision" | cut -d" " -f4 | sort -nr | uniq
1369288
1366614
1364050
1348336
1342864
1307740
1301890
1296586
1243018
1240465
1197902
1183598
1128571
1098073
1064983
1059274
1042117
831520
814117
unknown
Jari, checking your logs, we have:
ham-net-jarif.log:# SVN revision: 1364050
spam-net-jarif.log:# SVN revision: 1364050
So your logs are being ignored because they are not being run with the
"right" version of code.
I'm working to find out why you have the wrong version of the code...
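For illustration only, the filtering described above can be sketched in Python. This is not the script that runs on the zone; the names `read_svn_revision` and `partition_logs` are made up for this example, the 10-line header window is an assumption based on the default of `head`, and `ham-other.log` is a hypothetical file name. Non-numeric values such as "unknown" are simply skipped when picking the target revision.

```python
import re
from itertools import islice

# Matches the header line written at the top of a mass-check log.
REVISION_RE = re.compile(r"^# SVN revision: (\S+)")


def read_svn_revision(lines, header_lines=10):
    """Return the revision string found in the first header_lines lines, or None."""
    for line in islice(lines, header_lines):
        match = REVISION_RE.match(line)
        if match:
            return match.group(1)
    return None


def partition_logs(headers):
    """Split logs into those at the highest revision and those that are ignored.

    headers maps a log name to an iterable of its lines.
    Returns (target_revision, used_names, ignored_names).
    """
    revisions = {name: read_svn_revision(lines) for name, lines in headers.items()}
    numeric = {
        name: int(rev)
        for name, rev in revisions.items()
        if rev is not None and rev.isdigit()
    }
    target = max(numeric.values()) if numeric else None
    used = sorted(name for name, rev in numeric.items() if rev == target)
    ignored = sorted(name for name in revisions if name not in used)
    return target, used, ignored


example = {
    "ham-net-jarif.log": ["# SVN revision: 1364050"],
    "spam-net-jarif.log": ["# SVN revision: 1364050"],
    "ham-other.log": ["# SVN revision: 1369288"],  # hypothetical contributor
}
target, used, ignored = partition_logs(example)
print(target, used, ignored)
```

With the two revisions quoted above, the logs at 1364050 land in the ignored list because only the highest revision seen is accepted.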
Regards,
KAM
Re: Script found that is aborting from insufficient ham
Posted by Jari Fredriksson <ja...@iki.fi>.
08.08.2012 00:30, Kevin A. McGrail kirjoitti:
>
> On 8/7/2012 12:22 PM, Jari Fredriksson wrote:
>> My typical log here (I even increased the rsync verbosity to be sure)
>
> I see your logs but one oddity I noticed is a lack of -net logs. Looks
> like that might have stopped on Jul 21st?
Yes, I have had some system failures on weekends.
>
> -rw-r--r-- 1 rsync rsync 17038423 Aug 7 09:43 ham-jarif.log
> -rw-r--r-- 1 rsync rsync 17916771 Jul 21 11:45 ham-net-jarif.log
> -rw-r--r-- 1 rsync rsync 2746479 Aug 7 09:43 spam-jarif.log
> -rw-r--r-- 1 rsync rsync 3119937 Jul 21 11:45 spam-net-jarif.log
>
>
> Regards,
> KAM
>
--
"Well - I don't think anyone would succeed to publish a web sight"
Husse Dec 2 2007
Re: Script found that is aborting from insufficient ham
Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/7/2012 12:22 PM, Jari Fredriksson wrote:
> My typical log here (I even increased the rsync verbosity to be sure)
I see your logs but one oddity I noticed is a lack of -net logs. Looks
like that might have stopped on Jul 21st?
-rw-r--r-- 1 rsync rsync 17038423 Aug 7 09:43 ham-jarif.log
-rw-r--r-- 1 rsync rsync 17916771 Jul 21 11:45 ham-net-jarif.log
-rw-r--r-- 1 rsync rsync 2746479 Aug 7 09:43 spam-jarif.log
-rw-r--r-- 1 rsync rsync 3119937 Jul 21 11:45 spam-net-jarif.log
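A gap like this could be flagged automatically. The following is a minimal Python sketch, not something the rsync server is known to do; `stale_logs` is a hypothetical helper and the 7-day cutoff is an arbitrary assumption. The timestamps are copied from the listing above, with the year taken from the date of this thread.

```python
from datetime import datetime, timedelta


def stale_logs(mtimes, now, max_age=timedelta(days=7)):
    """Return the names of logs whose modification time is older than max_age."""
    return sorted(name for name, mtime in mtimes.items() if now - mtime > max_age)


listing = {
    "ham-jarif.log": datetime(2012, 8, 7, 9, 43),
    "ham-net-jarif.log": datetime(2012, 7, 21, 11, 45),
    "spam-jarif.log": datetime(2012, 8, 7, 9, 43),
    "spam-net-jarif.log": datetime(2012, 7, 21, 11, 45),
}

# Prints the two -net logs, which were last uploaded on Jul 21.
print(stale_logs(listing, now=datetime(2012, 8, 7, 12, 0)))
```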
Regards,
KAM
Re: Script found that is aborting from insufficient ham
Posted by Jari Fredriksson <ja...@iki.fi>.
07.08.2012 18:58, Axb kirjoitti:
> > Jari, are you seeing your corpus?
> >
> > Anyone else seeing missing corpora?
> > Is this possibly a problem where corpora are not being included?
>
> I'm watching my masscheck logs closely - all there.
My typical log here (I even increased the rsync verbosity to be sure)
Removing duplicates from HAM SPAM ... done.
Removing unwanted HAM mail from corpus
0
Removing Maildir/.Confirmed-HAM/cur/1344318666.M953201P4886V0000000000000806I0000000000B6331C_0.whirlwind,S=14993:2,S ... done
Removing unwanted SPAM mail from corpus
Syncing nightly_mass_check
+ ./mass-check --hamlog=ham-jarif.log --spamlog=spam-jarif.log -j 4 --progress --reuse ham:dir:/home/jarif/Maildir/.Confirmed-HAM spam:dir:/home/jarif/Maildir/.Confirmed-SPAM
status: starting scan stage now: 2012-08-07 12.01.56
status: completed scan stage, 13485 messages now: 2012-08-07 12.01.58
status: starting run stage now: 2012-08-07 12.01.58
status: 10% ham: 1185 spam: 164 date: 2012-05-09 now: 2012-08-07 12.04.17
status: 20% ham: 2373 spam: 325 date: 2011-04-22 now: 2012-08-07 12.08.12
status: 30% ham: 3560 spam: 487 date: 2011-06-17 now: 2012-08-07 12.12.36
status: 40% ham: 4748 spam: 648 date: 2011-09-05 now: 2012-08-07 12.16.53
status: 50% ham: 5937 spam: 808 date: 2011-11-01 now: 2012-08-07 12.21.11
status: 60% ham: 7125 spam: 969 date: 2012-01-02 now: 2012-08-07 12.25.46
status: 70% ham: 8313 spam: 1130 date: 2012-03-06 now: 2012-08-07 12.30.11
status: 80% ham: 9501 spam: 1291 date: 2012-05-04 now: 2012-08-07 12.34.39
status: 90% ham: 10689 spam: 1452 date: 2012-07-24 now: 2012-08-07 12.39.01
status: completed run stage now: 2012-08-07 12.43.01
+ LOGLIST=' ham-jarif.log spam-jarif.log'
+ set +x
rsync -Pcvz ham-jarif.log spam-jarif.log jarif@rsync.spamassassin.org::corpus/
This is the SpamAssassin Corpus rsync machine.
Modules that are available:
corpus
nightly mass-check result upload area. It is password protected.
If you would like a password, please send a request to
pmc@spamassassin.apache.org and request a "nightly" username and password.
submit
Score generation mass-check result upload area. It is password
protected. If you would like a password, please send a request to
pmc@spamassassin.apache.org and request a "score generation" username
and password. Generally these are only granted after a mass-check
announcement has been made on the spamassassin developer mailing list.
anoncorpus
mass-check result download area, available via anonymous access.
ham-jarif.log
32768 0% 0.00kB/s 0:00:00
2658417 15% 2.19MB/s 0:00:06
5541500 32% 2.44MB/s 0:00:04
7794435 45% 2.30MB/s 0:00:03
9597411 56% 2.16MB/s 0:00:03
11696723 68% 2.08MB/s 0:00:02
13594276 79% 1.85MB/s 0:00:01
15267861 89% 1.75MB/s 0:00:00
17038423 100% 1.96MB/s 0:00:08 (xfer#1, to-check=1/2)
spam-jarif.log
8280 0% 8.09kB/s 0:05:38
11592 0% 11.32kB/s 0:04:01
2746479 100% 1.73MB/s 0:00:01 (xfer#2, to-check=0/2)
sent 1021990 bytes received 34822 bytes 72883.59 bytes/sec
total size is 19784902 speedup is 18.72
Re: Script found that is aborting from insufficient ham
Posted by Axb <ax...@gmail.com>.
On 08/07/2012 03:37 PM, Kevin A. McGrail wrote:
> I found the script which runs under updatesd via cron on zones 1
>
> HAM: 135596 (150000 required)
> SPAM: 268096 (150000 required)
>
> Insufficient ham corpus to generate scores; aborting.
> Exit Status 8 is not zero for do-nightly-resorce-example
>
> Do we want to lower the limit for ham perhaps to 135?
-1 but - it spit out even more high scores.
> Or I tweak the days Ham is good for to include more ham?
+1 This would probably cause less havoc.
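To make the two options concrete, here is a minimal Python sketch of the sufficiency check implied by the cron output. It is not the real rescore script; `check_corpus` is a hypothetical name, and reading "135" as 135000 messages is an assumption.

```python
REQUIRED = 150_000  # per-class minimum quoted in the cron output


def check_corpus(ham, spam, required=REQUIRED):
    """Return the classes ("ham", "spam") that fall short of the minimum."""
    counts = {"ham": ham, "spam": spam}
    return [name for name, count in counts.items() if count < required]


# Counts from the cron output: ham falls short, spam does not.
for name in check_corpus(ham=135_596, spam=268_096):
    print(f"Insufficient {name} corpus to generate scores; aborting.")

# Option 1: lower the limit (assumed to mean 135000); nothing is short any more.
print(check_corpus(ham=135_596, spam=268_096, required=135_000))
```

Widening the ham age window instead leaves the limit alone and raises the ham count that is fed into the same check.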
> Jari, are you seeing your corpus?
>
> Anyone else seeing missing corpora?
> Is this possibly a problem where corpora are not being included?
I'm watching my masscheck logs closely - all there.
Axb