Posted to ruleqa@spamassassin.apache.org by John Hardin <jh...@impsec.org> on 2018/05/17 21:09:10 UTC

giovanni and llanga masschecks?

I notice from the RuleQA website that the masschecks from giovanni and 
llanga are consistently reported separately from everybody else's.

I wonder whether this is affecting the quality of masscheck - is this 
perhaps causing it to bounce back and forth between scores (or do 
something else that's suboptimal) based on what appears to it to be two 
separate and different masscheck corpora?

Is this because of how those masschecks are being run or submitted, or is 
the masscheck infrastructure too strict (filename matching, submission 
cutoffs, etc.)?

It *feels* like those result sets are not coming in by the cutoff.

Unfortunately the ruleQA website doesn't expose a tool to report when the 
results were submitted, just which DateRev the masscheck was based on.

I'd say "perhaps we need to extend the cutoff a bit", but I have no idea 
ATM when those result sets are coming in so I have no idea how the cutoff 
would be adjusted.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The Tea Party wants to remove the Crony from Crony Capitalism.
   OWS wants to remove Capitalism from Crony Capitalism.
                                                     -- Astaghfirullah
-----------------------------------------------------------------------
  413 days since the first commercial re-flight of an orbital booster (SpaceX)

Re: giovanni and llanga masschecks?

Posted by Giovanni Bechis <gi...@paclan.it>.
On Wed, Jul 11, 2018 at 03:45:17PM -0700, John Hardin wrote:
> On Fri, 6 Jul 2018, Tom Hendrikx wrote:
> 
> >
> >
> > On 06-07-18 16:13, Giovanni Bechis wrote:
> >> On 07/03/18 21:38, John Hardin wrote:
> >>> On Tue, 22 May 2018, John Hardin wrote:
> >>>
> >>>> On Tue, 22 May 2018, Giovanni Bechis wrote:
> >>>>
> >>>>> On 05/18/18 00:53, John Hardin wrote:
> >>>>> [...]
> >>>>>> Yes, but that's not clear as to whether they are pulling the wrong rev or are just late.
> >>>>>>
> >>>>>>> SVN tagged rev in nightly_mass_check:  1831759
> >>>>>>>
> >>>>>>> New masscheck submission listings in the past day:
> >>>>>>>   1831759 (Yes) - spam-axb-ham-misc.log (May 17 11:35)
> >>>>>>>   1831684  (No) - ham-giovanni.log (May 17 08:38)
> >>>>>>>   1831684  (No) - spam-llanga.log (May 17 08:45)
> >>>>>>
> >>>>> It takes half an hour to process my data, and judging from those lines it seems that I am sending my data 3 hours before axb@, am I wrong?
> >>>>
> >>>> Note the source commit number in the first column. It *looks* like you're submitting your results about 20 hours *after* everyone else. The results from the *prior* masscheck are coming in about 3 hours before everybody else's result for the *current* masscheck.
> >>>
> >>> This appears to still be happening. Can you review when your masschecks are starting/finishing?
> >>>
> >>>
> >> It starts at 8:15 UTC and finishes before 8:30 UTC.
> >> Yesterday's run seems to have worked.
> >>  Giovanni
> >>
> >
> > That is the wrong schedule, as new data is pushed out at 9:00 UTC each
> > day. At 8:30 UTC you're processing 23h old data, which is exactly the
> > problem in this thread. Could you change the schedule to 9:30 UTC?
> 
> Ok, Giovanni's submissions for the past three masschecks look good, I 
> expect that was the missing bit (if you made that change).
> 
I postponed the scheduled masscheck; it now starts around 9:30 UTC.

 Cheers
   Giovanni

> llanga still looks odd. There appear to be *two* submissions, one 
> off-schedule and one on. As I compose this, the "on-schedule" submission 
> for the current masscheck isn't present, but the "on-schedule" one for the 
> previous masscheck is present.
> 
> llanga, could you check your masscheck schedule(s) and runtime(s)?
> 
> 
> 
> -- 
>   John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
>   jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
>   key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
> -----------------------------------------------------------------------
>    Back in 1969 the technology to fake a Moon landing didn't exist,
>    but the technology to actually land there did.
>    Today, it is the opposite.                               -- unknown
> -----------------------------------------------------------------------
>   9 days until the 49th anniversary of Apollo 11 landing on the Moon


Re: giovanni and llanga masschecks?

Posted by John Hardin <jh...@impsec.org>.
On Fri, 6 Jul 2018, Tom Hendrikx wrote:

>
>
> On 06-07-18 16:13, Giovanni Bechis wrote:
>> On 07/03/18 21:38, John Hardin wrote:
>>> On Tue, 22 May 2018, John Hardin wrote:
>>>
>>>> On Tue, 22 May 2018, Giovanni Bechis wrote:
>>>>
>>>>> On 05/18/18 00:53, John Hardin wrote:
>>>>> [...]
>>>>>> Yes, but that's not clear as to whether they are pulling the wrong rev or are just late.
>>>>>>
>>>>>>> SVN tagged rev in nightly_mass_check:  1831759
>>>>>>>
>>>>>>> New masscheck submission listings in the past day:
>>>>>>>   1831759 (Yes) - spam-axb-ham-misc.log (May 17 11:35)
>>>>>>>   1831684  (No) - ham-giovanni.log (May 17 08:38)
>>>>>>>   1831684  (No) - spam-llanga.log (May 17 08:45)
>>>>>>
>>>>> It takes half an hour to process my data, and judging from those lines it seems that I am sending my data 3 hours before axb@, am I wrong?
>>>>
>>>> Note the source commit number in the first column. It *looks* like you're submitting your results about 20 hours *after* everyone else. The results from the *prior* masscheck are coming in about 3 hours before everybody else's result for the *current* masscheck.
>>>
>>> This appears to still be happening. Can you review when your masschecks are starting/finishing?
>>>
>>>
>> It starts at 8:15 UTC and finishes before 8:30 UTC.
>> Yesterday's run seems to have worked.
>>  Giovanni
>>
>
> That is the wrong schedule, as new data is pushed out at 9:00 UTC each
> day. At 8:30 UTC you're processing 23h old data, which is exactly the
> problem in this thread. Could you change the schedule to 9:30 UTC?

Ok, Giovanni's submissions for the past three masschecks look good, I 
expect that was the missing bit (if you made that change).

llanga still looks odd. There appear to be *two* submissions, one 
off-schedule and one on. As I compose this, the "on-schedule" submission 
for the current masscheck isn't present, but the "on-schedule" one for the 
previous masscheck is present.

llanga, could you check your masscheck schedule(s) and runtime(s)?



-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Back in 1969 the technology to fake a Moon landing didn't exist,
   but the technology to actually land there did.
   Today, it is the opposite.                               -- unknown
-----------------------------------------------------------------------
  9 days until the 49th anniversary of Apollo 11 landing on the Moon

Re: giovanni and llanga masschecks?

Posted by Tom Hendrikx <to...@whyscream.net>.

On 06-07-18 16:13, Giovanni Bechis wrote:
> On 07/03/18 21:38, John Hardin wrote:
>> On Tue, 22 May 2018, John Hardin wrote:
>>
>>> On Tue, 22 May 2018, Giovanni Bechis wrote:
>>>
>>>> On 05/18/18 00:53, John Hardin wrote:
>>>> [...]
>>>>> Yes, but that's not clear as to whether they are pulling the wrong rev or are just late.
>>>>>
>>>>>> SVN tagged rev in nightly_mass_check:  1831759
>>>>>>
>>>>>> New masscheck submission listings in the past day:
>>>>>>   1831759 (Yes) - spam-axb-ham-misc.log (May 17 11:35)
>>>>>>   1831684  (No) - ham-giovanni.log (May 17 08:38)
>>>>>>   1831684  (No) - spam-llanga.log (May 17 08:45)
>>>>>
>>>> It takes half an hour to process my data, and judging from those lines it seems that I am sending my data 3 hours before axb@, am I wrong?
>>>
>>> Note the source commit number in the first column. It *looks* like you're submitting your results about 20 hours *after* everyone else. The results from the *prior* masscheck are coming in about 3 hours before everybody else's result for the *current* masscheck.
>>
>> This appears to still be happening. Can you review when your masschecks are starting/finishing?
>>
>>
> It starts at 8:15 UTC and finishes before 8:30 UTC.
> Yesterday's run seems to have worked.
>  Giovanni
> 

That is the wrong schedule, as new data is pushed out at 9:00 UTC each
day. At 8:30 UTC you're processing 23h old data, which is exactly the
problem in this thread. Could you change the schedule to 9:30 UTC?
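
The effect of the start time can be sketched in a few lines of Python. This is only an illustration: the 9:00 UTC push time comes from this thread, while `PUBLISH_UTC` and `data_age_at_start` are hypothetical names, not part of the masscheck scripts.

```python
from datetime import datetime, time, timedelta, timezone

# From the thread: new data is pushed out at 9:00 UTC each day.
PUBLISH_UTC = time(9, 0)

def data_age_at_start(start_utc: time) -> timedelta:
    """Age of the newest published data when a run starts at start_utc."""
    # Any reference day works; only the time of day matters here.
    day = datetime(2018, 7, 6, tzinfo=timezone.utc).date()
    start = datetime.combine(day, start_utc, tzinfo=timezone.utc)
    publish = datetime.combine(day, PUBLISH_UTC, tzinfo=timezone.utc)
    if start < publish:
        # Today's push has not happened yet, so the run sees yesterday's data.
        publish -= timedelta(days=1)
    return start - publish

print(data_age_at_start(time(8, 15)))  # run that starts before the push
print(data_age_at_start(time(9, 30)))  # run that starts after the push
```

A run that starts at 8:15 UTC works on data pushed more than 23 hours earlier, while a run that starts at 9:30 UTC works on data that is half an hour old.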

Kind regards,

	Tom


Re: giovanni and llanga masschecks?

Posted by Giovanni Bechis <gi...@paclan.it>.
On 07/03/18 21:38, John Hardin wrote:
> On Tue, 22 May 2018, John Hardin wrote:
> 
>> On Tue, 22 May 2018, Giovanni Bechis wrote:
>>
>>> On 05/18/18 00:53, John Hardin wrote:
>>> [...]
>>>> Yes, but that's not clear as to whether they are pulling the wrong rev or are just late.
>>>>
>>>>> SVN tagged rev in nightly_mass_check:  1831759
>>>>>
>>>>> New masscheck submission listings in the past day:
>>>>>   1831759 (Yes) - spam-axb-ham-misc.log (May 17 11:35)
>>>>>   1831684  (No) - ham-giovanni.log (May 17 08:38)
>>>>>   1831684  (No) - spam-llanga.log (May 17 08:45)
>>>>
>>> It takes half an hour to process my data, and judging from those lines it seems that I am sending my data 3 hours before axb@, am I wrong?
>>
>> Note the source commit number in the first column. It *looks* like you're submitting your results about 20 hours *after* everyone else. The results from the *prior* masscheck are coming in about 3 hours before everybody else's result for the *current* masscheck.
> 
> This appears to still be happening. Can you review when your masschecks are starting/finishing?
> 
> 
It starts at 8:15 UTC and finishes before 8:30 UTC.
Yesterday's run seems to have worked.
 Giovanni

Re: giovanni and llanga masschecks?

Posted by John Hardin <jh...@impsec.org>.
On Tue, 22 May 2018, John Hardin wrote:

> On Tue, 22 May 2018, Giovanni Bechis wrote:
>
>> On 05/18/18 00:53, John Hardin wrote:
>> [...]
>>> Yes, but that's not clear as to whether they are pulling the wrong rev or 
>>> are just late.
>>> 
>>>> SVN tagged rev in nightly_mass_check:  1831759
>>>> 
>>>> New masscheck submission listings in the past day:
>>>>   1831759 (Yes) - spam-axb-ham-misc.log (May 17 11:35)
>>>>   1831684  (No) - ham-giovanni.log (May 17 08:38)
>>>>   1831684  (No) - spam-llanga.log (May 17 08:45)
>>> 
>> It takes half an hour to process my data, and judging from those lines it 
>> seems that I am sending my data 3 hours before axb@, am I wrong?
>
> Note the source commit number in the first column. It *looks* like you're 
> submitting your results about 20 hours *after* everyone else. The results 
> from the *prior* masscheck are coming in about 3 hours before everybody 
> else's result for the *current* masscheck.

This appears to still be happening. Can you review when your masschecks 
are starting/finishing?


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   ...much of our country's counterterrorism security spending is not
   designed to protect us from the terrorists, but instead to protect
   our public officials from criticism when another attack occurs.
                                                     -- Bruce Schneier
-----------------------------------------------------------------------
  Tomorrow: the 242nd anniversary of the Declaration of Independence

Re: giovanni and llanga masschecks?

Posted by John Hardin <jh...@impsec.org>.
On Tue, 22 May 2018, Giovanni Bechis wrote:

> On 05/18/18 00:53, John Hardin wrote:
> [...]
>> Yes, but that's not clear as to whether they are pulling the wrong rev or are just late.
>>
>>> SVN tagged rev in nightly_mass_check:  1831759
>>>
>>> New masscheck submission listings in the past day:
>>>   1831759 (Yes) - spam-axb-ham-misc.log (May 17 11:35)
>>>   1831684  (No) - ham-giovanni.log (May 17 08:38)
>>>   1831684  (No) - spam-llanga.log (May 17 08:45)
>>
> It takes half an hour to process my data, and judging from those lines it seems that I am sending my data 3 hours before axb@, am I wrong?

Note the source commit number in the first column. It *looks* like you're 
submitting your results about 20 hours *after* everyone else. The results 
from the *prior* masscheck are coming in about 3 hours before everybody 
else's result for the *current* masscheck.
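
The timing can be checked with a little arithmetic. Here is a minimal Python sketch that uses only timestamps quoted in this thread (the tag times from the RuleQA daterev list and the submission times from the listing). It assumes they all share one time zone, which the thread does not confirm. The names `rev_tagged`, `submitted` and `lag_after_tag` are illustrative and not part of any masscheck tooling.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# When each SVN rev was tagged, per the RuleQA daterev list (seconds dropped).
rev_tagged = {
    1831684: datetime.strptime("2018-05-16 08:34", FMT),
    1831759: datetime.strptime("2018-05-17 08:34", FMT),
}

# (rev, submission time) per submitter, per the submission listing.
submitted = {
    "axb-ham-misc": (1831759, datetime.strptime("2018-05-17 11:35", FMT)),
    "giovanni": (1831684, datetime.strptime("2018-05-17 08:38", FMT)),
    "llanga": (1831684, datetime.strptime("2018-05-17 08:45", FMT)),
}

def lag_after_tag(name):
    """Time between the tagging of a rev and a submission against it."""
    rev, when = submitted[name]
    return when - rev_tagged[rev]

for name in submitted:
    print(name, lag_after_tag(name))
```

On these numbers the giovanni and llanga results arrive about a day after their rev was tagged, while the axb result arrives about three hours after its rev was tagged.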


>
>> Based on the RuleQA daterev list (at the top of the page), 1831684 *does* appear to be a valid masscheck daterev (apologies for the textual "screenshot"):
>>
>>
>> 1831684: 2018-05-16 08:34:16
>> spamassassin_role: promotions validated
>>
>> 20180516-r1831684-n
>> axb-coi-bulk axb-generic axb-ham-misc axb-ninja darxus ena-week0 ena-week1 ena-week2 ena-week3 ena-week4 giovanni jarif jbrooks llanga mmiroslaw-mails-ham mmiroslaw-mails-spam sihde
>>
>> 1831684: 2018-05-16 08:34:16
>> spamassassin_role: promotions validated
>>
>> 20180517-r1831684-n
>> giovanni llanga
>>
>> 1831759: 2018-05-17 08:34:10
>> spamassassin_role: promotions validated
>>
>> 20180517-r1831759-n (Viewing)
>> axb-coi-bulk axb-generic axb-ham-misc axb-ninja darxus ena-week0 ena-week1 ena-week3 ena-week4 grenier jarif jbrooks mmiroslaw-mails-ham mmiroslaw-mails-spam sihde thendrikx
>>
>>
>> You don't see that normally because the default "last two" in the UI is usually the current submissions from everybody else, preceded by the apparently-late submission for the *prior* daterev from giovanni and llanga. You have to hit "all daterevs within 2 days" to see more history.
>>
>>
>> It looks to me like their submissions are for the correct (prior) daterev (SVN commit) but are coming in ~20H late... I don't think we can tweak the cutoff *that* much. :)
>>
>> I would be surprised if their masschecks were taking that long to complete. Is it possible they have something like a TZ error causing that much of a discrepancy?
>>
>>
>

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   You know things are bad when Pravda says we [the USA] have gone
   too far to the left.                                 -- Joe Huffman
-----------------------------------------------------------------------
  418 days since the first commercial re-flight of an orbital booster (SpaceX)

Re: giovanni and llanga masschecks?

Posted by Dave Jones <da...@apache.org>.
On 05/22/2018 04:17 AM, Giovanni Bechis wrote:
> On 05/18/18 00:53, John Hardin wrote:
> [...]
>> Yes, but that's not clear as to whether they are pulling the wrong rev or are just late.
>>
>>> SVN tagged rev in nightly_mass_check:  1831759
>>>
>>> New masscheck submission listings in the past day:
>>>    1831759 (Yes) - spam-axb-ham-misc.log (May 17 11:35)
>>>    1831684  (No) - ham-giovanni.log (May 17 08:38)
>>>    1831684  (No) - spam-llanga.log (May 17 08:45)
>>
> It takes half an hour to process my data, and judging from those lines it seems that I am sending my data 3 hours before axb@, am I wrong?
> 

The first column is the SVN tagged revision that your masscheck is 
running against.  The "No" means it's not matching the currently staged 
rsync directory that the masscheck script pulls from.

I think you are running your masscheck too early against the previous 
SVN tagged revision.  I am running mine at 9:05 AM UTC.  Those times 
above look like yours is running an hour too early.

Dave

Re: giovanni and llanga masschecks?

Posted by Giovanni Bechis <gi...@paclan.it>.
On 05/18/18 00:53, John Hardin wrote:
[...]
> Yes, but that's not clear as to whether they are pulling the wrong rev or are just late.
> 
>> SVN tagged rev in nightly_mass_check:  1831759
>>
>> New masscheck submission listings in the past day:
>>   1831759 (Yes) - spam-axb-ham-misc.log (May 17 11:35)
>>   1831684  (No) - ham-giovanni.log (May 17 08:38)
>>   1831684  (No) - spam-llanga.log (May 17 08:45)
> 
It takes half an hour to process my data, and judging from those lines it seems that I am sending my data 3 hours before axb@, am I wrong?
 

> Based on the RuleQA daterev list (at the top of the page), 1831684 *does* appear to be a valid masscheck daterev (apologies for the textual "screenshot"):
> 
> 
> 1831684: 2018-05-16 08:34:16
> spamassassin_role: promotions validated
> 
> 20180516-r1831684-n
> axb-coi-bulk axb-generic axb-ham-misc axb-ninja darxus ena-week0 ena-week1 ena-week2 ena-week3 ena-week4 giovanni jarif jbrooks llanga mmiroslaw-mails-ham mmiroslaw-mails-spam sihde
> 
> 1831684: 2018-05-16 08:34:16
> spamassassin_role: promotions validated
> 
> 20180517-r1831684-n
> giovanni llanga
> 
> 1831759: 2018-05-17 08:34:10
> spamassassin_role: promotions validated
> 
> 20180517-r1831759-n (Viewing)
> axb-coi-bulk axb-generic axb-ham-misc axb-ninja darxus ena-week0 ena-week1 ena-week3 ena-week4 grenier jarif jbrooks mmiroslaw-mails-ham mmiroslaw-mails-spam sihde thendrikx
> 
> 
> You don't see that normally because the default "last two" in the UI is usually the current submissions from everybody else, preceded by the apparently-late submission for the *prior* daterev from giovanni and llanga. You have to hit "all daterevs within 2 days" to see more history.
> 
> 
> It looks to me like their submissions are for the correct (prior) daterev (SVN commit) but are coming in ~20H late... I don't think we can tweak the cutoff *that* much. :)
> 
> I would be surprised if their masschecks were taking that long to complete. Is it possible they have something like a TZ error causing that much of a discrepancy?
> 
> 


Re: giovanni and llanga masschecks?

Posted by John Hardin <jh...@impsec.org>.
On Thu, 17 May 2018, Dave Jones wrote:

> On 05/17/2018 04:09 PM, John Hardin wrote:
>> 
>> I notice from the RuleQA website that the masschecks from giovanni and 
>> llanga are consistently reported separately from everybody else's.
>> 
>> I wonder whether this is affecting the quality of masscheck - is this 
>> perhaps causing it to bounce back and forth between scores (or do something 
>> else that's suboptimal) based on what appears to it to be two separate and 
>> different masscheck corpora?
>
> It looks like they are always behind on the SVN revision pulled down so they 
> are actually not being counted/included in the masscheck processing.
>
> The ena-week* corpora are the majority of the masscheck data.
>
>> Is this because of how those masschecks are being run or submitted, or is 
>> the masscheck infrastructure too strict (filename matching, submission 
>> cutoffs, etc.)?
>
> Maybe they are running an old version of the automasscheck script that has some 
> sort of delay in it between the downloading of the SVN staging area and the 
> masscheck local processing.  I understand that some may want to delay 
> processing until electricity costs are lower.  The downloading of the staging 
> area needs to happen per the documentation to get the correct SVN version of 
> rules; otherwise it's a waste of resources.
>
>> It *feels* like those result sets are not coming in by the cutoff.
>
> It's the SVN revision that is making them show up behind in their own 
> section.

The RuleQA UI reports the same DateRev (SVN revision) as everybody else. 
See below.

>> Unfortunately the ruleQA website doesn't expose a tool to report when the 
>> results were submitted, just which DateRev the masscheck was based on.
>> 
>> I'd say "perhaps we need to extend the cutoff a bit", but I have no idea 
>> ATM when those result sets are coming in so I have no idea how the cutoff 
>> would be adjusted.
>
> I have a script that runs on sa-vm1 to track submissions so the sysadmins 
> list gets notifications when we don't have enough for masscheck to run. 
> Here's the output right now -- note the No's that don't match the SVN tagged 
> rev:

Yes, but that's not clear as to whether they are pulling the wrong rev or 
are just late.

> SVN tagged rev in nightly_mass_check:  1831759
>
> New masscheck submission listings in the past day:
>   1831759 (Yes) - spam-axb-ham-misc.log (May 17 11:35)
>   1831684  (No) - ham-giovanni.log (May 17 08:38)
>   1831684  (No) - spam-llanga.log (May 17 08:45)

Based on the RuleQA daterev list (at the top of the page), 1831684 *does* 
appear to be a valid masscheck daterev (apologies for the textual 
"screenshot"):


1831684: 2018-05-16 08:34:16
spamassassin_role: promotions validated

20180516-r1831684-n
axb-coi-bulk axb-generic axb-ham-misc axb-ninja darxus ena-week0 ena-week1 
ena-week2 ena-week3 ena-week4 giovanni jarif jbrooks llanga 
mmiroslaw-mails-ham mmiroslaw-mails-spam sihde

1831684: 2018-05-16 08:34:16
spamassassin_role: promotions validated

20180517-r1831684-n
giovanni llanga

1831759: 2018-05-17 08:34:10
spamassassin_role: promotions validated

20180517-r1831759-n (Viewing)
axb-coi-bulk axb-generic axb-ham-misc axb-ninja darxus ena-week0 ena-week1 
ena-week3 ena-week4 grenier jarif jbrooks mmiroslaw-mails-ham 
mmiroslaw-mails-spam sihde thendrikx


You don't see that normally because the default "last two" in the UI is 
usually the current submissions from everybody else, preceded by the 
apparently-late submission for the *prior* daterev from giovanni and 
llanga. You have to hit "all daterevs within 2 days" to see more history.


It looks to me like their submissions are for the correct (prior) daterev 
(SVN commit) but are coming in ~20H late... I don't think we can tweak the 
cutoff *that* much. :)

I would be surprised if their masschecks were taking that long to 
complete. Is it possible they have something like a TZ error causing that 
much of a discrepancy?


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   It's easy to be noble with other people's money.
                                    -- John McKay, _The Welfare State:
                                       No Mercy for the Middle Class_
-----------------------------------------------------------------------
  413 days since the first commercial re-flight of an orbital booster (SpaceX)

Re: giovanni and llanga masschecks?

Posted by Dave Jones <da...@apache.org>.
On 05/17/2018 04:09 PM, John Hardin wrote:
> 
> I notice from the RuleQA website that the masschecks from giovanni and 
> llanga are consistently reported separately from everybody else's.
> 
> I wonder whether this is affecting the quality of masscheck - is this 
> perhaps causing it to bounce back and forth between scores (or do 
> something else that's suboptimal) based on what appears to it to be two 
> separate and different masscheck corpora?
> 

It looks like they are always behind on the SVN revision pulled down so 
they are actually not being counted/included in the masscheck processing.

The ena-week* corpora are the majority of the masscheck data.

> Is this because of how those masschecks are being run or submitted, or 
> is the masscheck infrastructure too strict (filename matching, 
> submission cutoffs, etc.)?
> 

Maybe they are running an old version of the automasscheck script that has 
some sort of delay in it between the downloading of the SVN staging area 
and the masscheck local processing.  I understand that some may want to 
delay processing until electricity costs are lower.  The downloading of the 
staging area needs to happen per the documentation to get the correct 
SVN version of rules; otherwise it's a waste of resources.


> It *feels* like those result sets are not coming in by the cutoff.
> 

It's the SVN revision that is making them show up behind in their own 
section.

> Unfortunately the ruleQA website doesn't expose a tool to report when 
> the results were submitted, just which DateRev the masscheck was based on.
> 
> I'd say "perhaps we need to extend the cutoff a bit", but I have no idea 
> ATM when those result sets are coming in so I have no idea how the 
> cutoff would be adjusted.
> 

I have a script that runs on sa-vm1 to track submissions so the 
sysadmins list gets notifications when we don't have enough for 
masscheck to run.  Here's the output right now -- note the No's that 
don't match the SVN tagged rev:


Corpus total: 162, Old: 86, Recent: 38, New: 38

SVN tagged rev in nightly_mass_check:  1831759

New masscheck submission listings in the past day:
    SVN rev (Match) File Name (Date)
    1831759 (Yes) - spam-darxus.log (May 17 09:09)
    1831759 (Yes) - ham-darxus.log (May 17 09:08)
    1831759 (Yes) - ham-grenier.log (May 17 09:03)
    1831759 (Yes) - ham-ena-week0.log (May 17 10:42)
    1831759 (Yes) - spam-ena-week0.log (May 17 10:42)
    1831759 (Yes) - spam-ena-week3.log (May 17 11:14)
    1831759 (Yes) - spam-ena-week1.log (May 17 11:10)
    1831759 (Yes) - spam-jbrooks.log (May 17 10:06)
    1831759 (Yes) - spam-axb-generic.log (May 17 11:35)
    1831759 (Yes) - ham-mmiroslaw-mails-ham.log (May 17 11:11)
    1831759 (Yes) - ham-ena-week2.log (May 17 12:13)
    1831759 (Yes) - spam-axb-ham-misc.log (May 17 11:35)
    1831684  (No) - ham-giovanni.log (May 17 08:38)
    1831684  (No) - spam-llanga.log (May 17 08:45)
    1831759 (Yes) - spam-grenier.log (May 17 09:03)
    1831759 (Yes) - ham-axb-ham-misc.log (May 17 11:34)
    1831759 (Yes) - ham-mmiroslaw-mails-spam.log (May 17 11:11)
    1831759 (Yes) - ham-axb-generic.log (May 17 11:34)
    1831759 (Yes) - ham-axb-ninja.log (May 17 11:34)
    1831759 (Yes) - spam-ena-week4.log (May 17 12:01)
    1831759 (Yes) - spam-mmiroslaw-mails-spam.log (May 17 11:11)
    1831759 (Yes) - spam-axb-ninja.log (May 17 11:35)
    1831759 (Yes) - ham-jbrooks.log (May 17 10:06)
    1831759 (Yes) - spam-jarif.log (May 17 09:28)
    1831759 (Yes) - ham-thendrikx.log (May 17 09:03)
    1831759 (Yes) - ham-jarif.log (May 17 09:28)
    1831759 (Yes) - spam-mmiroslaw-mails-ham.log (May 17 11:11)
    1831684  (No) - ham-llanga.log (May 17 08:45)
    1831759 (Yes) - spam-axb-coi-bulk.log (May 17 11:34)
    1831759 (Yes) - ham-ena-week3.log (May 17 11:14)
    1831759 (Yes) - ham-ena-week1.log (May 17 11:10)
    1831759 (Yes) - spam-thendrikx.log (May 17 09:03)
    1831759 (Yes) - ham-sihde.log (May 17 10:11)
    1831759 (Yes) - spam-sihde.log (May 17 10:11)
    1831759 (Yes) - ham-ena-week4.log (May 17 12:01)
    1831684  (No) - spam-giovanni.log (May 17 08:38)
    1831759 (Yes) - ham-axb-coi-bulk.log (May 17 11:34)
    1831759 (Yes) - spam-ena-week2.log (May 17 12:13)

34/38 matches (17 ham, 17 spam)

Recent masscheck submission listings in the past week:
    SVN rev (Match) File Name (Date)
    1831456 (Yes) - spam-net-ena-week0.log (May 12 16:22)
    1831456 (Yes) - spam-net-axb-ninja.log (May 12 12:10)
    1831456 (Yes) - spam-net-grenier.log (May 12 09:05)
    1831456 (Yes) - spam-net-jbrooks.log (May 12 10:44)
    1831456 (Yes) - ham-net-jarif.log (May 12 09:29)
    1831456 (Yes) - spam-net-thendrikx.log (May 12 09:29)
    1831456 (Yes) - ham-net-ena-week2.log (May 12 15:34)
    1831456 (Yes) - ham-net-ena-week1.log (May 12 12:42)
    1831456 (Yes) - spam-net-ena-week4.log (May 12 14:06)
    1831456 (Yes) - spam-net-sihde.log (May 12 11:13)
    1831456 (Yes) - ham-net-axb-ham-misc.log (May 12 12:10)
    1831456 (Yes) - spam-net-ena-week3.log (May 12 15:52)
    1831456 (Yes) - ham-net-axb-coi-bulk.log (May 12 12:10)
    1830957  (No) - spam-net-llanga.log (May 12 19:55)
    1831456 (Yes) - spam-net-axb-generic.log (May 12 12:10)
    1831456 (Yes) - ham-net-axb-ninja.log (May 12 12:10)
    1831456 (Yes) - ham-net-ena-week4.log (May 12 14:05)
    1830957  (No) - spam-net-giovanni.log (May 12 08:44)
    1831456 (Yes) - ham-net-grenier.log (May 12 09:05)
    1831456 (Yes) - ham-net-jbrooks.log (May 12 10:44)
    1831456 (Yes) - spam-net-mmiroslaw-mails-ham.log (May 12 11:42)
    1831456 (Yes) - ham-net-ena-week3.log (May 12 15:52)
    1831456 (Yes) - ham-net-mmiroslaw-mails-spam.log (May 12 11:42)
    1831456 (Yes) - ham-net-sihde.log (May 12 11:13)
    1830957  (No) - ham-net-llanga.log (May 12 19:55)
    1831456 (Yes) - ham-net-mmiroslaw-mails-ham.log (May 12 11:42)
    1830957  (No) - ham-net-giovanni.log (May 12 08:44)
    1831456 (Yes) - spam-net-ena-week2.log (May 12 15:35)
    1831456 (Yes) - ham-net-ena-week0.log (May 12 16:22)
    1831456 (Yes) - spam-net-darxus.log (May 12 10:32)
    1831456 (Yes) - spam-net-axb-ham-misc.log (May 12 12:10)
    1831456 (Yes) - ham-net-darxus.log (May 12 10:32)
    1831456 (Yes) - ham-net-thendrikx.log (May 12 09:29)
    1831456 (Yes) - spam-net-jarif.log (May 12 09:29)
    1831456 (Yes) - spam-net-axb-coi-bulk.log (May 12 12:10)
    1831456 (Yes) - spam-net-ena-week1.log (May 12 12:43)
    1831456 (Yes) - spam-net-mmiroslaw-mails-spam.log (May 12 11:42)
    1831456 (Yes) - ham-net-axb-generic.log (May 12 12:10)

34/38 matches (17 ham, 17 spam)
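
For anyone who wants to pick the stale submitters out of a listing like the ones above, here is a minimal Python sketch. It is not the script that runs on sa-vm1; the row format is inferred only from the output shown in this message, and `ROW` and `stale_submitters` are hypothetical names.

```python
import re

# One listing row, e.g. "1831684  (No) - ham-giovanni.log (May 17 08:38)"
ROW = re.compile(r"^\s*(\d+)\s+\((Yes|No)\)\s+-\s+(\S+)\.log\s+\((.+)\)\s*$")

def stale_submitters(listing, tagged_rev):
    """Return {submitter: rev} for rows whose rev differs from tagged_rev."""
    stale = {}
    for line in listing.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header row or blank line
        rev, _flag, name, _when = m.groups()
        # Drop the ham-/spam- prefix and the optional net- marker.
        submitter = re.sub(r"^(ham|spam)-(net-)?", "", name)
        if int(rev) != tagged_rev:
            stale[submitter] = int(rev)
    return stale

SAMPLE = """\
    SVN rev (Match) File Name (Date)
    1831759 (Yes) - spam-axb-ham-misc.log (May 17 11:35)
    1831684  (No) - ham-giovanni.log (May 17 08:38)
    1831684  (No) - spam-llanga.log (May 17 08:45)
"""

print(stale_submitters(SAMPLE, 1831759))
```

The comparison is done on the rev number itself rather than on the Yes/No column, so the sketch works even if that column is missing or out of date.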

Re: giovanni and llanga masschecks?

Posted by Giovanni Bechis <gi...@paclan.it>.
On 05/17/18 23:09, John Hardin wrote:
> 
> I notice from the RuleQA website that the masschecks from giovanni and llanga are consistently reported separately from everybody else's.
> 
> I wonder whether this is affecting the quality of masscheck - is this perhaps causing it to bounce back and forth between scores (or do something else that's suboptimal) based on what appears to it to be two separate and different masscheck corpora?
> 
> Is this because of how those masschecks are being run or submitted, or is the masscheck infrastructure too strict (filename matching, submission cutoffs, etc.)?
> 
> It *feels* like those result sets are not coming in by the cutoff.
> 
I moved my masscheck crontab entry a bit earlier.

> Unfortunately the ruleQA website doesn't expose a tool to report when the results were submitted, just which DateRev the masscheck was based on.
> 
> I'd say "perhaps we need to extend the cutoff a bit", but I have no idea ATM when those result sets are coming in so I have no idea how the cutoff would be adjusted.
> 

Cheers
  Giovanni