Posted to sysadmins@spamassassin.apache.org by Sidney Markowitz <si...@sidney.com> on 2022/04/19 03:48:34 UTC

bug 7676 problems with reusing file names in the two daily mkupdate runs

Hey everyone,

I was going over open bugs in order of priority/severity setting and 
noticed bug 7676
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7676

I think I understand how we do things now, but not why we do it.

First of all, can someone explain why there are two separate runs? If 
the two versions have different uses, how does anyone choose which one 
they will use? If one is "better" than the other, why not publish just 
that one?

Also, is there a reason not to use a suffix on the svn rev number to 
distinguish the two daily runs, updating the DNS txt records twice a day 
instead of once? That way there would be no caching problems. Would it 
require anything else to be changed that depends on the number being the 
exact svn revision?

Thanks,

  Sidney
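
As an illustration of the suffix idea above, here is a minimal sketch, not the project's code. It assumes the published update number has to remain a plain integer that clients compare numerically, and that there are fewer than ten runs per day; the function names and the "multiply by ten" encoding are hypothetical choices made only for this example.

```python
def suffixed_update_version(svn_rev: int, run_index: int) -> int:
    """Append a one-digit run index to an svn revision (hypothetical scheme).

    run_index 0 is the first run of the day, 1 the second, and so on.
    """
    if svn_rev < 0:
        raise ValueError("svn_rev must be non-negative")
    if not 0 <= run_index <= 9:
        raise ValueError("run_index must be a single decimal digit")
    # Shifting the revision left by one decimal digit keeps ordering intact:
    # every run of revision N sorts before every run of revision N + 1.
    return svn_rev * 10 + run_index


def svn_rev_from_version(version: int) -> int:
    """Recover the exact svn revision by dropping the one-digit suffix."""
    return version // 10
```

Under this scheme each run gets its own number, so a cached file from the first run can never be mistaken for the second. The cost is the one raised in the question: anything that treats the published number as the exact svn revision would first have to strip the suffix, as svn_rev_from_version does.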

Re: bug 7676 problems with reusing file names in the two daily mkupdate runs

Posted by Henrik K <he...@hege.li>.
On Tue, Apr 19, 2022 at 09:36:44AM +0300, Henrik K wrote:
> On Tue, Apr 19, 2022 at 08:23:53AM +0300, Henrik K wrote:
> > On Tue, Apr 19, 2022 at 03:48:34PM +1200, Sidney Markowitz wrote:
> > > Hey everyone,
> > > 
> > > I was going over open bugs in order of priority/severity setting and noticed
> > > bug 7676
> > > https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7676
> > > 
> > > I think I understand how we do things now, but not why we do it.
> > > 
> > > First of all, can someone explain why there are two separate runs? If the
> > > two versions have different uses, how does anyone choose which one they will
> > > use? If one is "better" than the other, why not publish just that one?
> > > 
> > > Also, is there a reason not to use a suffix on the svn rev number to
> > > distinguish the two daily runs, updating the DNS txt records twice a day
> > > instead of once?? That way there would be no caching problems. Would it
> > > require anything else to be changed that depends on the number being the
> > > exact svn revision?
> > 
> > Pretty bizarre that mkupdate-with-scores and run_nightly do the same things
> > creating tarballs, but test with different versions etc.  Seems pretty clear
> > to me that only one of them should do the tarball.  A timeline of script
> > actions needs to be created to analyze this.
> > 
> > Who's even on this list?  Probably should continue on the bug for wider
> > committer audience.
> 
> Well, I closed this bug since it was not an issue as is.
> 
> But there should be some code cleanups.  It seems run_nightly wastes time
> creating tarballs for no purpose at all.

Committed some fixes.

- Disable possible run_nightly tarball creation; mkupdate-with-scores already does it more reliably
- Update tarball lint test must succeed for ALL versions (3.4.1-3.4.6 currently tested)

Disabling run_nightly make_tarball_for_version seems to have only one effect
from what I see, "sa-update_3.4.4_20220418083116" style SVN tags are no
longer created.  There seems to be no real purpose for those anyway.
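
The "must succeed for ALL versions" rule in the second fix above can be sketched roughly as follows. This is only an illustration of the gating logic, not the committed change: the `lint` callable, the function names, and the way versions are listed are all assumptions made for the example.

```python
from typing import Callable, Iterable, List

# Versions named in the message above as currently tested (3.4.1-3.4.6).
TESTED_VERSIONS = ["3.4.1", "3.4.2", "3.4.3", "3.4.4", "3.4.5", "3.4.6"]


def failed_lint_versions(
    tarball: str,
    versions: Iterable[str],
    lint: Callable[[str, str], bool],
) -> List[str]:
    """Return the versions for which linting the tarball fails.

    `lint(tarball, version)` is a hypothetical hook that returns True
    when the rules in `tarball` lint cleanly against `version`.
    """
    return [v for v in versions if not lint(tarball, v)]


def may_publish(
    tarball: str,
    versions: Iterable[str],
    lint: Callable[[str, str], bool],
) -> bool:
    """Allow publishing only if no tested version fails the lint check."""
    return not failed_lint_versions(tarball, versions, lint)
```

Collecting the failing versions instead of stopping at the first failure makes it easier to report which releases a rule change broke.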


Re: bug 7676 problems with reusing file names in the two daily mkupdate runs

Posted by Henrik K <he...@hege.li>.
On Tue, Apr 19, 2022 at 08:23:53AM +0300, Henrik K wrote:
> On Tue, Apr 19, 2022 at 03:48:34PM +1200, Sidney Markowitz wrote:
> > Hey everyone,
> > 
> > I was going over open bugs in order of priority/severity setting and noticed
> > bug 7676
> > https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7676
> > 
> > I think I understand how we do things now, but not why we do it.
> > 
> > First of all, can someone explain why there are two separate runs? If the
> > two versions have different uses, how does anyone choose which one they will
> > use? If one is "better" than the other, why not publish just that one?
> > 
> > Also, is there a reason not to use a suffix on the svn rev number to
> > distinguish the two daily runs, updating the DNS txt records twice a day
> > instead of once?? That way there would be no caching problems. Would it
> > require anything else to be changed that depends on the number being the
> > exact svn revision?
> 
> Pretty bizarre that mkupdate-with-scores and run_nightly do the same things
> creating tarballs, but test with different versions etc.  Seems pretty clear
> to me that only one of them should do the tarball.  A timeline of script
> actions needs to be created to analyze this.
> 
> Who's even on this list?  Probably should continue on the bug for wider
> committer audience.

Well, I closed this bug since it was not an issue as is.

But there should be some code cleanups.  It seems run_nightly wastes time
creating tarballs for no purpose at all.


Re: bug 7676 problems with reusing file names in the two daily mkupdate runs

Posted by Henrik K <he...@hege.li>.
On Tue, Apr 19, 2022 at 02:25:32PM -0400, Bill Cole wrote:
> 
> When I first encountered this behavior, I concluded that both runs needed to
> happen, as the first run worked out which rules were good enough to include
> and the second run used that set to generate the new scores. I'd obviously
> like to know if that understanding is incorrect, as I've passed that lore
> along to others noting the double-generation.

As noted in Bug 7676, the second run's update.tar.gz is always ignored and
nothing is done with it.

The only thing run_nightly really does is promote_active_rules() and commit
new rules/active.list.  I'm not sure I understand why this is done
separately in run_nightly hours after the main generation.


Re: bug 7676 problems with reusing file names in the two daily mkupdate runs

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 2022-04-19 at 01:23:53 UTC-0400 (Tue, 19 Apr 2022 08:23:53 +0300)
Henrik K <sy...@spamassassin.apache.org>
is rumored to have said:

> On Tue, Apr 19, 2022 at 03:48:34PM +1200, Sidney Markowitz wrote:
>> Hey everyone,
>>
>> I was going over open bugs in order of priority/severity setting and 
>> noticed
>> bug 7676
>> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7676
>>
>> I think I understand how we do things now, but not why we do it.
>>
>> First of all, can someone explain why there are two separate runs? If 
>> the
>> two versions have different uses, how does anyone choose which one 
>> they will
>> use? If one is "better" than the other, why not publish just that 
>> one?
>>
>> Also, is there a reason not to use a suffix on the svn rev number to
>> distinguish the two daily runs, updating the DNS txt records twice a 
>> day
>> instead of once?? That way there would be no caching problems. Would 
>> it
>> require anything else to be changed that depends on the number being 
>> the
>> exact svn revision?
>
> Pretty bizarre that mkupdate-with-scores and run_nightly do the same 
> things
> creating tarballs, but test with different versions etc.  Seems pretty 
> clear
> to me that only one of them should do the tarball.  A timeline of 
> script
> actions needs to be created to analyze this.

When I first encountered this behavior, I concluded that both runs 
needed to happen, as the first run worked out which rules were good 
enough to include and the second run used that set to generate the new 
scores. I'd obviously like to know if that understanding is incorrect, 
as I've passed that lore along to others noting the double-generation.

> Who's even on this list?  Probably should continue on the bug for 
> wider
> committer audience.

My understanding is that this list includes every committer who has 
tried to work on the services related to rule QA and updates.

-- 
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire

Re: bug 7676 problems with reusing file names in the two daily mkupdate runs

Posted by Henrik K <he...@hege.li>.
On Tue, Apr 19, 2022 at 03:48:34PM +1200, Sidney Markowitz wrote:
> Hey everyone,
> 
> I was going over open bugs in order of priority/severity setting and noticed
> bug 7676
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7676
> 
> I think I understand how we do things now, but not why we do it.
> 
> First of all, can someone explain why there are two separate runs? If the
> two versions have different uses, how does anyone choose which one they will
> use? If one is "better" than the other, why not publish just that one?
> 
> Also, is there a reason not to use a suffix on the svn rev number to
> distinguish the two daily runs, updating the DNS txt records twice a day
> instead of once?? That way there would be no caching problems. Would it
> require anything else to be changed that depends on the number being the
> exact svn revision?

Pretty bizarre that mkupdate-with-scores and run_nightly do the same things
creating tarballs, but test with different versions etc.  Seems pretty clear
to me that only one of them should do the tarball.  A timeline of script
actions needs to be created to analyze this.

Who's even on this list?  Probably should continue on the bug for wider
committer audience.