Posted to sysadmins@spamassassin.apache.org by Merijn van den Kroonenberg <me...@web2all.nl> on 2017/11/10 15:01:41 UTC

ruleqa user llanga

>
>>> Day 2 doesn't have that table with "mcviewing".  The next question is
>>> what is causing this problem.  Is it related to new commits that throw
>>> off the masscheck processing?
>>
>> The "2 days ago" page doesn't highlight a current masscheck, but it
>> still shows a result at the bottom, so it's showing *something*. I
>> think it's likely the masscheck present in the daterev input field:
>> 20171108-r1814560-n
>> But that one isn't in any daterev listing, not even in the full listing.
>>
>> So I think something in ruleqa.cgi, which builds the daterev list, is
>> broken and leaves out some masschecks.
>> If I get the cache file and the directory listings I can debug where
>> things go pear-shaped.
>>
>
> I have found one dubious piece of code where the masschecks are indexed
> by their SVN revision number. But that is not a unique value, as the
> same revision can be masschecked multiple times (by different
> submitter/date).

I think this is in fact the case.
There is something weird with masscheck user llanga.
Either something is off with the timing of masscheck result submission or
that user submits the masscheck result twice (once more the next day for
the same revision).
I think that's what triggers the bug in the ruleqa page.
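
To make the suspected collision concrete, here is a minimal Python sketch. It is not the actual ruleqa.cgi code; the record layout and function names are invented for illustration, and only the dates, revision and submitter come from the listings in this thread.

```python
from collections import defaultdict

# Hypothetical records (date, svn_revision, submitter), modelled on the
# two llanga uploads discussed in this thread.
masschecks = [
    ("20171108", 1814560, "llanga"),
    ("20171109", 1814560, "llanga"),
]

def index_by_revision(records):
    """Keyed on revision alone: a later entry overwrites an earlier one."""
    index = {}
    for date, rev, submitter in records:
        index[rev] = (date, submitter)
    return index

def index_by_daterev(records):
    """Keyed on (date, revision): each masscheck keeps its own slot."""
    index = defaultdict(list)
    for date, rev, submitter in records:
        index[(date, rev)].append(submitter)
    return dict(index)

by_rev = index_by_revision(masschecks)
by_daterev = index_by_daterev(masschecks)
print(len(by_rev), len(by_daterev))  # prints: 1 2
```

With the revision-only key, the 20171108 entry disappears from the index, which would match a daterev that is viewable but missing from every listing.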

ls -l html/20171108/r1814560-n/LOGS.all-*-llanga*
5356811 Nov 10 01:05 html/20171108/r1814560-n/LOGS.all-ham-llanga.log.gz
521798 Nov 10 01:06 html/20171108/r1814560-n/LOGS.all-spam-llanga.log.gz

ls -l html/20171109/r1814560-n/LOGS.all-*-llanga*
5356811 Nov 10 08:12 html/20171109/r1814560-n/LOGS.all-ham-llanga.log.gz
521798 Nov 10 08:12 html/20171109/r1814560-n/LOGS.all-spam-llanga.log.gz

b14039f7b3ef3329d6bbd80e8a2eb5e04eb62129  html/20171108/r1814560-n/LOGS.all-ham-llanga.log.gz
b14039f7b3ef3329d6bbd80e8a2eb5e04eb62129  html/20171109/r1814560-n/LOGS.all-ham-llanga.log.gz

Same checksum, so the files are identical.
The question is whether the user is doing something wrong or some
scripting is messed up (maybe related to bad timing or timezone issues).
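
The same check could be automated across daterev directories. The sketch below is only an illustration: it assumes the 40-character checksums above are SHA-1 digests, and the function names are hypothetical.

```python
import hashlib
from collections import defaultdict

def sha1_of(path):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicate_logs(paths):
    """Group paths by digest and keep only digests seen more than once."""
    by_digest = defaultdict(list)
    for path in paths:
        by_digest[sha1_of(path)].append(str(path))
    return {d: ps for d, ps in by_digest.items() if len(ps) > 1}
```

Passing it the LOGS.all-ham-llanga.log.gz paths from both daterev directories would report them under a single digest if they are byte-identical.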

>
> Please see attached patch for masses/rulequa/ruleqa.cgi

I think I failed to attach the patch correctly, but I sent it directly to Dave.

>
> If this is not it, then I suspect the code around line 453, which trims
> some revisions away. But that code is very hard to read.



Re: ruleqa user llanga

Posted by Dave Jones <da...@apache.org>.
On 11/10/2017 09:01 AM, Merijn van den Kroonenberg wrote:
>>
>>>> Day 2 doesn't have that table with "mcviewing".  The next question is
>>>> what is causing this problem.  Is it related to new commits that throw
>>>> off the masscheck processing?
>>>
>>> The "2 days ago" page doesn't highlight a current masscheck, but it
>>> still shows a result at the bottom, so it's showing *something*. I
>>> think it's likely the masscheck present in the daterev input field:
>>> 20171108-r1814560-n
>>> But that one isn't in any daterev listing, not even in the full listing.
>>>
>>> So I think something in ruleqa.cgi, which builds the daterev list, is
>>> broken and leaves out some masschecks.
>>> If I get the cache file and the directory listings I can debug where
>>> things go pear-shaped.
>>>
>>
>> I have found one dubious piece of code where the masschecks are indexed
>> by their SVN revision number. But that is not a unique value, as the
>> same revision can be masschecked multiple times (by different
>> submitter/date).
> 
> I think this is in fact the case.
> There is something weird with masscheck user llanga.
> Either something is off with the timing of masscheck result submission or
> that user submits the masscheck result twice (once more the next day for
> the same revision).
> I think that's what triggers the bug in the ruleqa page.
> 
> ls -l html/20171108/r1814560-n/LOGS.all-*-llanga*
> 5356811 Nov 10 01:05 html/20171108/r1814560-n/LOGS.all-ham-llanga.log.gz
> 521798 Nov 10 01:06 html/20171108/r1814560-n/LOGS.all-spam-llanga.log.gz
> 
> ls -l html/20171109/r1814560-n/LOGS.all-*-llanga*
> 5356811 Nov 10 08:12 html/20171109/r1814560-n/LOGS.all-ham-llanga.log.gz
> 521798 Nov 10 08:12 html/20171109/r1814560-n/LOGS.all-spam-llanga.log.gz
> 
> b14039f7b3ef3329d6bbd80e8a2eb5e04eb62129  html/20171108/r1814560-n/LOGS.all-ham-llanga.log.gz
> b14039f7b3ef3329d6bbd80e8a2eb5e04eb62129  html/20171109/r1814560-n/LOGS.all-ham-llanga.log.gz
> 
> Same checksum, so the files are identical.
> The question is whether the user is doing something wrong or some
> scripting is messed up (maybe related to bad timing or timezone issues).
> 
>>
>> Please see attached patch for masses/rulequa/ruleqa.cgi
> 
> I think I failed to attach the patch correctly, but I sent it directly to Dave.
> 
>>
>> If this is not it, then I suspect the code around line 453, which trims
>> some revisions away. But that code is very hard to read.
> 
> 

I think I figured out what was causing problems with the masscheck SVN 
revision getting thrown off by commits and llanga.  I was determining 
the $REVISION in masses/rule-update-score-gen/generate-new-scores.sh 
around line 123 by finding the newest SVN revision.  I thought the 
staging of the rsync dir and the SVN tagged versions would keep that SVN 
revision locked in for a 24 hour period.  Now I have updated the logic 
to find the SVN revision with the most occurrences in all of the corpus 
for that particular scoreset type.
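
As a rough illustration of that change (not the actual shell logic in generate-new-scores.sh), picking the most frequent revision rather than the newest could look like the Python sketch below. The function name, the tie-break rule and the revision 1814700 in the usage note are assumptions made up for the example.

```python
from collections import Counter

def most_common_revision(revisions):
    """Return the revision that occurs most often, or None if empty.

    Ties are broken in favour of the higher revision number; that rule
    is an assumption of this sketch, not something stated in the thread.
    """
    counts = Counter(revisions)
    if not counts:
        return None
    return max(counts, key=lambda rev: (counts[rev], rev))
```

Given submissions at revisions 1814560, 1814560 and 1814700, "newest" would select 1814700 while this returns 1814560, so a single submission at a later commit no longer shifts the revision used for scoring.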

It might work best if the run_nightly script that stages the masscheck 
area dropped a tag file containing the SVN revision, so that 
generate-new-scores.sh could be matched to that exact revision.  If an 
SVN command could be used to find the latest sa-update tagged version, 
that could be used instead of a tag file.
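
A tag file of that kind could be as simple as the following Python sketch. The file name REVISION.tag and both function names are hypothetical; the real run_nightly and generate-new-scores.sh scripts are shell and would do the equivalent there.

```python
from pathlib import Path

def write_revision_tag(staging_dir, revision, name="REVISION.tag"):
    """Staging side: record the SVN revision being staged."""
    path = Path(staging_dir) / name
    path.write_text(f"{revision}\n")
    return path

def read_revision_tag(staging_dir, name="REVISION.tag"):
    """Score-generation side: read the revision back, or None if absent."""
    path = Path(staging_dir) / name
    if not path.exists():
        return None
    return int(path.read_text().strip())
```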

--
Dave

Re: ruleqa user llanga

Posted by Dave Jones <da...@apache.org>.
On 11/10/2017 09:01 AM, Merijn van den Kroonenberg wrote:
>>
>>>> Day 2 doesn't have that table with "mcviewing".  The next question is
>>>> what is causing this problem.  Is it related to new commits that throw
>>>> off the masscheck processing?
>>>
>>> The "2 days ago" page doesn't highlight a current masscheck, but it
>>> still shows a result at the bottom, so it's showing *something*. I
>>> think it's likely the masscheck present in the daterev input field:
>>> 20171108-r1814560-n
>>> But that one isn't in any daterev listing, not even in the full listing.
>>>
>>> So I think something in ruleqa.cgi, which builds the daterev list, is
>>> broken and leaves out some masschecks.
>>> If I get the cache file and the directory listings I can debug where
>>> things go pear-shaped.
>>>
>>
>> I have found one dubious piece of code where the masschecks are indexed
>> by their SVN revision number. But that is not a unique value, as the
>> same revision can be masschecked multiple times (by different
>> submitter/date).
> 
> I think this is in fact the case.
> There is something weird with masscheck user llanga.
> Either something is off with the timing of masscheck result submission or
> that user submits the masscheck result twice (once more the next day for
> the same revision).
> I think that's what triggers the bug in the ruleqa page.
> 
> ls -l html/20171108/r1814560-n/LOGS.all-*-llanga*
> 5356811 Nov 10 01:05 html/20171108/r1814560-n/LOGS.all-ham-llanga.log.gz
> 521798 Nov 10 01:06 html/20171108/r1814560-n/LOGS.all-spam-llanga.log.gz
> 
> ls -l html/20171109/r1814560-n/LOGS.all-*-llanga*
> 5356811 Nov 10 08:12 html/20171109/r1814560-n/LOGS.all-ham-llanga.log.gz
> 521798 Nov 10 08:12 html/20171109/r1814560-n/LOGS.all-spam-llanga.log.gz
> 
> b14039f7b3ef3329d6bbd80e8a2eb5e04eb62129  html/20171108/r1814560-n/LOGS.all-ham-llanga.log.gz
> b14039f7b3ef3329d6bbd80e8a2eb5e04eb62129  html/20171109/r1814560-n/LOGS.all-ham-llanga.log.gz
> 
> Same checksum, so the files are identical.
> The question is whether the user is doing something wrong or some
> scripting is messed up (maybe related to bad timing or timezone issues).
> 

I will look at this more closely this evening, in 4 or 5 hours.  I do 
see that the masschecker llanga stands out on the 
http://ruleqa.spamassassin.org pages:

http://ruleqa.spamassassin.org/1-days-ago?xml=1
http://ruleqa.spamassassin.org/2-days-ago?xml=1
http://ruleqa.spamassassin.org/3-days-ago?xml=1

The masscheck processing is supposed to filter out masscheck submissions 
that don't match the SVN tagged revision, plus some other minimum 
requirements, but it may not be handling this situation properly.
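
For illustration only, a revision filter of that kind might look like the Python sketch below. It assumes log paths shaped like the ones quoted above (html/DATE/rREV-n/LOGS.all-ham-USER.log.gz); the function names and the regular expression are hypothetical and are not taken from the actual masscheck scripts.

```python
import re

# Matches paths such as html/20171108/r1814560-n/LOGS.all-ham-llanga.log.gz
# and captures the SVN revision from the directory name.
LOG_PATH = re.compile(r"/r(\d+)-[^/]+/LOGS\.all-(?:ham|spam)-[^/]+\.log\.gz$")

def matches_tagged_revision(path, tagged_rev):
    """True if the path's revision directory equals the tagged revision."""
    match = LOG_PATH.search(path)
    return match is not None and int(match.group(1)) == tagged_rev

def filter_submissions(paths, tagged_rev):
    """Keep only submissions made against the tagged revision."""
    return [p for p in paths if matches_tagged_revision(p, tagged_rev)]
```

Note that a check on the revision alone would accept both llanga uploads, since each sits under r1814560-n; catching a re-upload under the next day's directory would need an additional date or checksum comparison.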

>>
>> Please see attached patch for masses/rulequa/ruleqa.cgi
> 
> I think I failed to attach the patch correctly, but I sent it directly to Dave.

I have committed and applied your patch to the working area.

> 
>>
>> If this is not it, then I suspect the code around line 453, which trims
>> some revisions away. But that code is very hard to read.
> 
> 

--
Dave