Posted to sysadmins@spamassassin.apache.org by "Kevin A. McGrail" <ke...@mcgrail.com> on 2017/11/02 12:27:02 UTC

Re: Eureka: truncation of 72_active.cf

+sysadmins@s.a.o so we don't lose these conversations.

Consider asking gstein@apache.org for his help on SVN.  I simply don't 
play with enough tags, branches, revisions, etc.

Either there is a bug and we are injecting the wrong version, or you are 
just looking at things incorrectly.

However, I think perhaps you are just doing something wrong?

This command, for example, appears to pull an SA rule set.  Isn't that 
the expected behavior?

svn co -r 1813595 http://svn.apache.org/repos/asf/spamassassin/trunk trunk-new-rules-set1 | more
A    trunk-new-rules-set1/rulesrc
A    trunk-new-rules-set1/rulesrc/10_force_active.cf
A    trunk-new-rules-set1/rulesrc/sandbox
A    trunk-new-rules-set1/rulesrc/sandbox/jhardin
A    trunk-new-rules-set1/rulesrc/sandbox/jhardin/20_MIME_no_text.cf
A    trunk-new-rules-set1/rulesrc/sandbox/jhardin/20_misc_testing.cf
A    trunk-new-rules-set1/rulesrc/sandbox/jhardin/20_lotsa_money.cf
A    trunk-new-rules-set1/rulesrc/sandbox/jhardin/20_MIME_in_body.cf

What do you get?

Regards,
KAM

On 11/2/2017 5:29 AM, Merijn van den Kroonenberg wrote:
> I am a bit confused about the corpus revision.
>
> Take for example this:
>
> Revision: 1813664
> Author: spamassassin_role
> Date: Sunday, 29 October 2017 3:47:00
> Message:
> updated scores for revision 1813595 active rules added since last 
> mass-check
>
> And in the sysadmin mail from last night (rescore example)
>
> svn co -r 1813595 http://svn.apache.org/repos/asf/spamassassin/trunk 
> trunk-new-rules-set1
>
> So the commit mentions revision 1813595; I assume it's also mentioned 
> in the corpus logs and it's actually checked out.
>
> BUT 1813595 is not a valid revision for the spamassassin project??
>
> It's actually a revision in another Apache project.
>
> Revision: 1813595
> Author: deepak
> Date: Saturday, 28 October 2017 10:33:50
> Message:
> Improved: Add rat exclude files to excludes those files that does not need
> license header
> (OFBIZ-9856)
>
> Updated rat-excludes.txt file
> ----
> Modified : /ofbiz/ofbiz-framework/trunk/rat-excludes.txt
>
> So is something going wrong somewhere with detecting the correct revision?
>
>
> -----Original Message----- From: Kevin A. McGrail
> Sent: Wednesday, November 01, 2017 10:49 PM
> To: David Jones ; Merijn van den Kroonenberg
> Subject: Re: Eureka: truncation of 72_active.cf
>
> On 11/1/2017 5:31 PM, David Jones wrote:
>>
>> I found another bug: the DKIM_VALID_EF rule description needed to be 
>> wrapped in a version check, and without one the ruleset validation 
>> was failing with return code 4.  Just committed another fix that 
>> should take care of this.
>>
>>
>> Given the timing of the rule promotions and the masscheck validation, 
>> this could take another ~40 hours to work itself out.  I really want 
>> to improve the way things work to speed up this cycle time.
>>
> Agreed.  I am very excited to finally have this done though.



Re: Eureka: truncation of 72_active.cf

Posted by Dave Jones <da...@apache.org>.
Well crap.  I found another odd dependency: masscheck processing gets 
thrown off when commits are made outside the few-hour window:

https://wiki.apache.org/spamassassin/InfraNotes2017#nitemc

automc@sa-vm1:~$ ~/svn/trunk/build/mkupdates/run_nightly
+ promote_active_rules
+ pwd
/usr/local/spamassassin/automc/svn/trunk
+ /usr/bin/perl build/mkupdates/listpromotable
HTTP get: http://ruleqa.spamassassin.org/1-days-ago?xml=1
no 'mcviewing', 'mcsubmitters' microformats on day 1
URL: http://ruleqa.spamassassin.org/1-days-ago?xml=1
+ exit 25

This is a critical cron job that sets up the new rule promotions daily. 
I will dig into this one deeper but it seems that if the masscheck SA 
revision gets out of sync with new commits that may not even be 
ruleset-related, then days have to pass with no commits before the stars 
will align properly again.  Geez.  What a mess with this workflow!
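
The failure above could be caught before the promotion step runs. What 
follows is only a rough sketch of such a pre-flight check, not what 
run_nightly or listpromotable actually do. It assumes the ruleqa response 
contains the literal strings mcviewing and mcsubmitters whenever masscheck 
data is present, which is inferred from the error message alone. The 
function name has_masscheck_data is made up for illustration.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the page body read from
# stdin mentions both markers named in the error message above.
# Assumption: the markers appear as plain text when masscheck data exists.
has_masscheck_data() {
    body=$(cat)
    printf '%s\n' "$body" | grep -q 'mcviewing' &&
        printf '%s\n' "$body" | grep -q 'mcsubmitters'
}

# Example (needs network access and curl):
#   curl -fsS 'http://ruleqa.spamassassin.org/1-days-ago?xml=1' \
#       | has_masscheck_data \
#       || echo "no masscheck data for day 1 yet" >&2
```

Reading the page from stdin keeps the check separate from how the page is 
fetched, so it can be tried against a saved copy of the response.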

I think we need to carefully document the current SVN workflow and 
redesign it to handle this better.  The masscheck processing of rulesets 
only needs to be tied to the ruleset revision staged in the rsync area 
for the current 24-hour period.  Maybe we need a new masscheck-specific 
tag, separate from the rule promotion tags we use today?

Rule promotions can happen at any time, multiple times a day as long as 
they pass a lint check.
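
As a rough illustration of gating a promotion on lint, here is a minimal 
sketch. It relies on spamassassin --lint exiting non-zero when it finds 
errors. The function names and the idea of passing a directory of candidate 
rules are assumptions for the example; the real promotion scripts are not 
shown in this thread.

```shell
#!/bin/sh
# SPAMASSASSIN can be overridden, e.g. to point at a specific install.
SPAMASSASSIN="${SPAMASSASSIN:-spamassassin}"

# Lint the rules found in directory $1 (hypothetical layout).
lint_rules() {
    "$SPAMASSASSIN" --lint --configpath="$1"
}

# Report whether the rules in $1 may be promoted; returns 1 if lint fails.
promote_if_clean() {
    if lint_rules "$1"; then
        echo "lint passed: $1 can be promoted"
    else
        echo "lint failed: $1 held back" >&2
        return 1
    fi
}

# Example: promote_if_clean /path/to/candidate/rules
```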

Masscheck validation is currently done only once a day.

My initial thought is that parts of the scripts are using the latest SVN 
revision and parts are using the latest tagged revision from rule 
promotions.  When these get out of sync, we don't get enough 
masschecking of the proper revision to keep moving everything forward.

Dave



Re: Eureka: truncation of 72_active.cf

Posted by "Kevin A. McGrail" <ke...@mcgrail.com>.
On 11/2/2017 8:40 AM, Merijn van den Kroonenberg wrote:
> The checkout works as you would expect. But it's just very confusing 
> that a revision is used which is not inside the spamassassin project.
> It might also cause side effects in other parts of the process, as 
> David mentioned. But the checkout part is not actually broken.
>
> I do have considerable experience with Subversion... the problem 
> is more what the intention of the code should be ;)

My recommendation is: do not try to unravel the thought process behind 
the code.  Stay focused on the goal, which is to produce rules and 
distribute them.  Anything you do towards that goal is good.  If we 
break some eggs to make some omelets, great.

Ideally, I would like to publish more daily rulesets, focus on 
optimization, etc.

Regards,

KAM


Re: Eureka: truncation of 72_active.cf

Posted by Merijn van den Kroonenberg <me...@web2all.nl>.
The checkout works as you would expect. But it's just very confusing 
that a revision is used which is not inside the spamassassin project.
It might also cause side effects in other parts of the process, as 
David mentioned. But the checkout part is not actually broken.

I do have considerable experience with Subversion... the problem is 
more what the intention of the code should be ;)
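
For background on why the checkout still works: svn.apache.org/repos/asf 
is a single repository shared by many ASF projects, so revision numbers are 
repository-wide. Checking out spamassassin/trunk at -r 1813595 gives trunk 
as it stood at that repository revision, even though that particular commit 
touched OFBiz. A minimal sketch for finding the last revision that actually 
changed a given path as of some repository revision, assuming Subversion 
1.9 or newer for --show-item (the function name is made up for this 
example):

```shell
#!/bin/sh
# SVN can be overridden, e.g. to point at a specific client binary.
SVN="${SVN:-svn}"

# Print the last revision that changed the path at URL $2, as seen at
# repository-wide revision $1.
last_changed_rev() {
    "$SVN" info --show-item last-changed-revision -r "$1" "$2"
}

# Example (needs network access):
#   last_changed_rev 1813595 http://svn.apache.org/repos/asf/spamassassin/trunk
```

If the number printed is lower than the revision passed in, the requested 
revision was a commit elsewhere in the shared repository, and the path 
itself was last changed at the printed revision.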
