Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2012/08/08 17:09:20 UTC

[Bug 6821] New: Masscheck is not including logs for everyone who is uploading

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6821

          Priority: P2
            Bug ID: 6821
          Assignee: dev@spamassassin.apache.org
           Summary: Masscheck is not including logs for everyone who is
                    uploading
          Severity: normal
    Classification: Unclassified
                OS: Windows 7
          Reporter: kmcgrail@pccc.com
          Hardware: PC
            Status: NEW
           Version: SVN Trunk (Latest Devel Version)
         Component: Building & Packaging
           Product: Spamassassin

At a minimum, there appear to be issues with some log files not being included
in the masscheck analysis, for example those from Jari and Kevin Golding.

There does not appear to be an upload problem.

If I look at the server, I see Jari's files and Kevin Golding's.

However, the list I see contains only: axb-coi-bulk axb-fraud axb-generic
axb-sa-users axb-woas bb-guenther_fraud bb-jhardin bb-jhardin_fraud bb-jm
bernie-fsf bernie-it_batt bernie-mix danmcdonald darxus grenier.

I believe that shows that Jari's uploads, for example, are not being used.
Correct?

I'm trying to see if I have any logs that show Jari's vs. Axb's upload files.

OK, going through the cron output, I see this, which is good:

Your "cron" job on spamassassin.zones.apache.org
bash
/export/home/updatesd/svn/mkupdates-with-scores/do-stable-update-with-scores

produced the following output:

Running do-nightly-rescore-example...

At revision 1370633.
Wed Aug  8 02:25:01 GMT 2012
[ rsyncing logs locally ]
building file list ... done
ham-axb-coi-bulk.log
ham-axb-fraud.log
ham-axb-generic.log
ham-axb-sa-users.log
ham-axb-woas.log
ham-bb-guenther_fraud.log
ham-bb-jhardin.log
ham-bb-jhardin_fraud.log
ham-bb-jm.log
ham-bernie-fsf.log
ham-bernie-it_batt.log
ham-bernie-mix.log
ham-danmcdonald.log
ham-darxus.log
ham-grenier.log
ham-jarif.log
ham-kgolding.log
ham-llanga.log
spam-axb-coi-bulk.log
spam-axb-fraud.log
spam-axb-generic.log
spam-axb-sa-users.log
spam-axb-woas.log
spam-bb-guenther_fraud.log
spam-bb-jhardin.log
spam-bb-jhardin_fraud.log
spam-bb-jm.log
spam-bernie-fsf.log
spam-bernie-it_batt.log
spam-bernie-mix.log
spam-danmcdonald.log
spam-darxus.log
spam-grenier.log
spam-jarif.log
spam-kgolding.log
spam-llanga.log

sent 838170324 bytes  received 740 bytes  42983131.49 bytes/sec
total size is 6085599473  speedup is 7.26


but I never really see any more information that lets me deduce which
files are being used.
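
A minimal sketch of one way to make that comparison by hand, assuming the
rsynced logs follow the ham-NAME.log / spam-NAME.log naming shown above and
that the names actually used have been saved one per line in a text file.
The function name missing_uploaders and both arguments are hypothetical and
not part of the existing scripts.

```shell
#!/bin/sh
# Hypothetical helper: print uploaders that have logs on disk but do not
# appear in the list of names that were actually used.
#   $1 = directory holding ham-NAME.log / spam-NAME.log files (assumed layout)
#   $2 = file with one used uploader name per line (assumed format)
missing_uploaders() {
  log_dir=$1
  used_list=$2
  for f in "$log_dir"/ham-*.log "$log_dir"/spam-*.log; do
    [ -e "$f" ] || continue            # skip a glob pattern that matched nothing
    name=${f##*/}                      # strip the directory part
    case $name in                      # strip the ham-/spam- type prefix
      ham-*)  name=${name#ham-} ;;
      spam-*) name=${name#spam-} ;;
    esac
    name=${name%.log}                  # strip the extension
    printf '%s\n' "$name"
  done | sort -u | while IFS= read -r name; do
    # -x: whole-line match, -F: fixed string, -q: no output
    grep -qxF -- "$name" "$used_list" || printf '%s\n' "$name"
  done
}
```

Fed the file names from the rsync output above and the fifteen names I see on
the list, this should print jarif, kgolding and llanga.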

I believe 
/export/home/updatesd/svn/mkupdates-with-scores/do-stable-update-with-scores
and all the scripts it calls are in need of some debug statements.

I'm going to start with generate-new-scores.

I'm adding this:

svn diff
Index: generate-new-scores
===================================================================
--- generate-new-scores (revision 1353252)
+++ generate-new-scores (working copy)
@@ -59,24 +59,28 @@
   do
     FILE=`echo $FILE | cut -d"/" -f2-`
     ln corpus/$FILE corpus/usable-corpus-set${SCORESET}/$FILE || exit $?
+    echo "Linked $FILE to corpus/usable-corpus-set${SCORESET}/$FILE"
   done
 elif [ $SCORESET -eq 2 ]; then
   for FILE in `find corpus -type f -name "*am-bayes-*" | grep -v net-`;
   do
     FILE=`echo $FILE | cut -d"/" -f2-`
     ln corpus/$FILE corpus/usable-corpus-set${SCORESET}/$FILE || exit $?
+    echo "Linked $FILE to corpus/usable-corpus-set${SCORESET}/$FILE"
   done
 elif [ $SCORESET -eq 1 ]; then
   for FILE in `find corpus -type f -name "*am-net-*"`;
   do
     FILE=`echo $FILE | cut -d"/" -f2-`
     ln corpus/$FILE corpus/usable-corpus-set${SCORESET}/$FILE || exit $?
+    echo "Linked $FILE to corpus/usable-corpus-set${SCORESET}/$FILE"
   done
 elif [ $SCORESET -eq 0 ]; then
   for FILE in `find corpus -type f -name "*am-*" | grep -v net- | grep -v bayes-`;
   do
     FILE=`echo $FILE | cut -d"/" -f2-`
     ln corpus/$FILE corpus/usable-corpus-set${SCORESET}/$FILE || exit $?
+    echo "Linked $FILE to corpus/usable-corpus-set${SCORESET}/$FILE"
   done
 else
   echo "Unknown score set: $SCORESET"
@@ -94,7 +98,8 @@

 for FILE in `find corpus/usable-corpus-set$SCORESET -type f`;
 do
-  head $FILE | grep "SVN revision: $REVISION" || rm $FILE
+  echo "Checking $FILE for SVN $REVISION..."
+  head $FILE | grep "SVN revision: $REVISION" || (rm $FILE; echo "$FILE does not meet the requirements")
 done

 date

Hopefully, this will show more information when cron fires off next.
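
In the meantime, a dry-run sketch along these lines could list which corpus
files each score set would pick up without linking or removing anything.  It
reuses the find/grep patterns visible in the diff for score sets 0, 1 and 2.
The pattern for the first branch is not shown in the diff, so it is left out.
The function name list_scoreset_files and the optional corpus argument are
hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: print the corpus files a score set would select, using the
# patterns from the diff. Nothing is linked or deleted.
#   $1 = score set number (0, 1 or 2 only; other values are not covered)
#   $2 = corpus directory (assumed default "corpus", as in the diff)
list_scoreset_files() {
  scoreset=$1
  corpus=${2:-corpus}
  case $scoreset in
    2) find "$corpus" -type f -name '*am-bayes-*' | grep -v 'net-' ;;
    1) find "$corpus" -type f -name '*am-net-*' ;;
    0) find "$corpus" -type f -name '*am-*' | grep -v 'net-' | grep -v 'bayes-' ;;
    *) echo "Score set $scoreset is not covered by this sketch" >&2
       return 1 ;;
  esac
}
```

Running list_scoreset_files 0 from the directory that contains corpus/ prints
one path per line.  Note that the grep -v filters match against the whole
path, not just the file name, exactly as they do in the original loops.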

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6821] Masscheck is not including logs for everyone who is uploading

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6821

--- Comment #3 from Kevin A. McGrail <km...@pccc.com> ---
As noted on the list, we appear to have people with the wrong SVN version or
with an SVN revision such as "unknown".

When this occurs, their logs are ignored.
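
A rough sketch of how an uploader could check this before submitting,
assuming the log header carries a line containing "SVN revision: VALUE"
within the first ten lines (that is what the grep in generate-new-scores
looks for; the exact header layout is an assumption).  The function name
report_log_revisions is hypothetical.

```shell
#!/bin/sh
# Sketch: report the "SVN revision:" value found near the top of each log so
# that mismatched or "unknown" revisions are visible at a glance.
#   $1     = expected revision
#   $2...  = log files to inspect
report_log_revisions() {
  expected=$1
  shift
  for f in "$@"; do
    # Take the first token after "SVN revision:" from the first ten lines.
    rev=$(head "$f" | sed -n 's/.*SVN revision: *\([^ ]*\).*/\1/p' | head -n 1)
    if [ "$rev" = "$expected" ]; then
      echo "OK      $f (revision $rev)"
    else
      echo "IGNORED $f (revision ${rev:-missing}, expected $expected)"
    fi
  done
}
```

Unlike the substring grep in generate-new-scores, this compares the revision
exactly, so it is slightly stricter than the real check.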

I also find that generate-new-scores runs for BOTH the net (weekend) and
non-net (daily) runs every day.  So right now, the scripts exit because last
weekend's checks did not have enough corpora, though we are MUCH closer for the
daily checks.

I've added a bit of debugging and hopefully we'll know a LOT more this weekend.

[Bug 6821] Masscheck is not including logs for everyone who is uploading

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6821

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #4 from Kevin A. McGrail <km...@pccc.com> ---
I found this is a known issue and we need a better plan of attack.  I will add
an idea to bug 6753.

*** This bug has been marked as a duplicate of bug 6753 ***

[Bug 6821] Masscheck is not including logs for everyone who is uploading

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6821

--- Comment #2 from Kevin A. McGrail <km...@pccc.com> ---
(In reply to comment #1)
>  svn commit -m 'adding some debug to  generate-new-scores for bug 6821'
> Sending        rule-update-score-gen/generate-new-scores
> Transmitting file data .
> Committed revision 1370799.

It turns out spamassassin zones1 is using the scripts in
rulesrc/sandbox/dos/new-rule-score-gen/, not the ones here.

I'm not reverting the other one.

svn commit -m 'updating the correct generate-new-scores file'
Sending        generate-new-scores
Transmitting file data .
Committed revision 1370938.

Added a warning though:

svn commit -m 'Added warning that these dir is possibly replaced by
rulesrc/sandbox/dos/new-rule-score-gen/'
Adding         rule-update-score-gen/IMPORTANT
Transmitting file data .
Committed revision 1370940.

Getting closer...

[Bug 6821] Masscheck is not including logs for everyone who is uploading

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6821

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kmcgrail@pccc.com

--- Comment #1 from Kevin A. McGrail <km...@pccc.com> ---
 svn commit -m 'adding some debug to  generate-new-scores for bug 6821'
Sending        rule-update-score-gen/generate-new-scores
Transmitting file data .
Committed revision 1370799.

[Bug 6821] Masscheck is not including logs for everyone who is uploading

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6821

--- Comment #7 from Kevin A. McGrail <km...@pccc.com> ---
Please edit bug 6753 for this issue.

[Bug 6821] Masscheck is not including logs for everyone who is uploading

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6821

--- Comment #6 from Kevin A. McGrail <km...@pccc.com> ---
(In reply to comment #5)
> Created attachment 5082 [details]
> Temp fix
> 
> The core problem in my case is that get_current_svn_revision() fails
> miserably on my install.  The problem it has is the checks performed on line
> 998 are both FALSE for me.  The included patch checks ${TOPDIR} instead of
> ${TOPDIR}/masses for the .svn directory - which then writes a svninfo.tmp
> file and gives me a revision number. Just confirmed on a masscheck and I now
> have the correct header.
> 
> Not a longterm solution but it certainly shows what's happening for me.

True, but this might not get the right revision.  If a log doesn't have the
same SVN revision as the other masscheckers' logs, the results are not used.

However, this is great feedback and puts a fire under our need to improve and
unify the mass check script.

[Bug 6821] Masscheck is not including logs for everyone who is uploading

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6821

--- Comment #5 from Kevin Golding <ca...@gmail.com> ---
Created attachment 5082
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5082&action=edit
Temp fix

The core problem in my case is that get_current_svn_revision() fails miserably
on my install.  The problem it has is the checks performed on line 998 are both
FALSE for me.  The included patch checks ${TOPDIR} instead of ${TOPDIR}/masses
for the .svn directory - which then writes a svninfo.tmp file and gives me a
revision number. Just confirmed on a masscheck and I now have the correct
header.

Not a longterm solution but it certainly shows what's happening for me.
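
For illustration only, the idea can be sketched as below.  This is not the
attached patch and not the real get_current_svn_revision(); find_svn_dir and
current_svn_revision are hypothetical names, and trying ${TOPDIR}/masses
first and then ${TOPDIR} is an assumption about how a fallback could work.
The only svn feature relied on is the "Revision:" line printed by svn info.

```shell
#!/bin/sh
# Sketch of a fallback lookup: check for .svn under TOPDIR/masses first and
# then under TOPDIR itself. Prints the directory that holds .svn.
find_svn_dir() {
  topdir=$1
  for d in "$topdir/masses" "$topdir"; do
    if [ -d "$d/.svn" ]; then
      printf '%s\n' "$d"
      return 0
    fi
  done
  return 1
}

# Print the working-copy revision for TOPDIR, or "unknown" if no .svn
# directory is found or svn info yields no "Revision:" line.
current_svn_revision() {
  dir=$(find_svn_dir "$1") || { echo unknown; return 1; }
  rev=$(svn info "$dir" 2>/dev/null | sed -n 's/^Revision: //p')
  echo "${rev:-unknown}"
}
```

A revision found this way still has to match the one the other masscheckers
report, so this only shows where the lookup goes wrong.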
