You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by Olivier Coutu <ol...@zerospam.ca> on 2019/01/08 15:31:04 UTC

Patch to reduce lint time

We use a lot of custom rules (2k+) and lint time was growing out of 
control, over a minute on some machines.

It appears like /$alreadydone/, the set of rules that have already been 
linted, is reset for every test. This makes lint time quadratic in the 
number of tests and does not seem to have any use. Moving it out of the 
tests loop reduces my lint time from 21 seconds to 3 seconds with no 
visible side effect.

This patch moves it out of the tests loop

--- /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm	2019-01-07 17:54:19.244624440 -0500
+++ /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm	2019-01-07 17:55:11.948708724 -0500
@@ -964,13 +964,13 @@
    my ($self) = @_;
    my $conf = $self->{conf};
    $conf->{meta_dependencies} = { };
+  my $alreadydone = { };
  
    foreach my $name (keys %{$conf->{tests}}) {
      next unless ($conf->{test_types}->{$name}
                      == $Mail::SpamAssassin::Conf::TYPE_META_TESTS);
  
      my $deps = [ ];
-    my $alreadydone = { };
      $self->_meta_deps_recurse($conf, $name, $name, $deps, $alreadydone);
      $conf->{meta_dependencies}->{$name} = join (' ', @{$deps});
    }

The result on ok rulesets and on obviously broken rulesets does not 
change from pre-patch, and I can't see any negative side effects of 
moving it out of the loop.

I'm not used to posting on dev lists, so please tell me if more 
information is needed or if things should be presented differently.

-Olivier

Re: Patch to reduce lint time

Posted by "Kevin A. McGrail" <km...@apache.org>.
Hi Olivier, I'm focusing my SA time on getting 3.4.3 done but wanted to
mentioned that this patch is getting into non-trivial territory.  Could you
please confirm or file an ICLA?  It's awesome you are looking at
improvements and this helps us consider them freely:
https://www.apache.org/licenses/icla.pdf

Regards,
KAM
--
Kevin A. McGrail
Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171


On Fri, Feb 8, 2019 at 1:34 PM Olivier Coutu <ol...@zerospam.ca>
wrote:

> You were quite right Henrick, I had insufficiently tested it and while it
> computed dependencies only once, it did not properly retrieve previously
> computed dependencies.
>
> I updated the patch:
>
> --- /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-02-07 15:01:16.124532948 -0500
> +++ /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-02-07 15:03:41.127692566 -0500
> @@ -968,24 +968,29 @@
>    foreach my $name (keys %{$conf->{tests}}) {
>      next unless ($conf->{test_types}->{$name}
>                      == $Mail::SpamAssassin::Conf::TYPE_META_TESTS);
> -
> -    my $deps = [ ];
> -    my $alreadydone = { };
> -    $self->_meta_deps_recurse($conf, $name, $name, $deps, $alreadydone);
> -    $conf->{meta_dependencies}->{$name} = join (' ', @{$deps});
> +    my $alreadydone = {};
> +    $self->_meta_deps_recurse($conf, $name, $name, $alreadydone);
>    }
>  }
>
>  sub _meta_deps_recurse {
> -  my ($self, $conf, $toprule, $name, $deps, $alreadydone) = @_;
> +  my ($self, $conf, $toprule, $name, $alreadydone) = @_;
>
> -  # Only do each rule once per top-level meta; avoid infinite recursion
> -  return if $alreadydone->{$name};
> -  $alreadydone->{$name} = 1;
> +  # Avoid recomputing the dependencies of a rule
> +  return split(' ', $conf->{meta_dependencies}->{$name}) if defined $conf->{meta_dependencies}->{$name};
>
>    # Obviously, don't trace empty or nonexistent rules
>    my $rule = $conf->{tests}->{$name};
> -  return unless $rule;
> +  unless ($rule) {
> +      $conf->{meta_dependencies}->{$name} = '';
> +      return ( );
> +  }
> +
> +  # Avoid infinite recursion
> +  return ( ) if exists $alreadydone->{$name};
> +  $alreadydone->{$name} = ( );
> +
> +  my %deps;
>
>    # Lex the rule into tokens using a rather simple RE method ...
>    my $lexer = ARITH_EXPRESSION_LEXER;
> @@ -1001,9 +1006,12 @@
>      next unless exists $conf_tests->{$token};
>
>      # add and recurse
> -    push(@{$deps}, untaint_var($token));
> -    $self->_meta_deps_recurse($conf, $toprule, $token, $deps, $alreadydone);
> +    $deps{untaint_var($token)} = ( );
> +    my @subdeps = $self->_meta_deps_recurse($conf, $toprule, $token, $alreadydone);
> +    @deps{@subdeps} = ( );
>    }
> +  $conf->{meta_dependencies}->{$name} = join (' ', keys %deps);
> +  return keys %deps;
>  }
>
>  sub fix_priorities {
>
>
> *_meta_deps_recurse* now returns a hash that acts as a set of
> dependencies.
>
> These dependencies are stored in *$conf->{meta_dependencies}->{$name}* as
> they are calculated.
>
> The result is not exactly the same as it was before, as I had some cases
> where duplicates in *$conf->{meta_dependencies}->{$name}* would arise.
> This is no longer the case. The sets of dependencies stay identical.
>
> I also considered changing *$conf->{meta_dependencies}->{$name}* to a
> list instead of a string, but that was not necessary to achieve a 4-fold
> performance improvement and would require additional refactoring.
>
> What do you think? I am not used to writing Perl code so feel free to
> correct the style.
> On 19-01-08 11 h 24, Kevin A. McGrail wrote:
>
> Thanks Henrik!
>
> On 1/8/2019 11:21 AM, Henrik K wrote:
>
> The code is saving dependencies of each meta rule in this variable:
>   $conf->{meta_dependencies}
>
> That variable is used in Check.pm meta tests.
>
> Simply adding debug print after this line:
>   $conf->{meta_dependencies}->{$name} = join (' ', @{$deps});
>   print STDERR "DEPS FOR $name: $conf->{meta_dependencies}->{$name}\n";
>
> And running both versions with spamassassin --lint 2>/tmp/log1 and log2
>
> Shows for example that original version:
>
> DEPS FOR __YOU_WON: __GIVE_MONEY __HAS_WON_01 __MOVE_MONEY __YOU_WON_01 __YOU_WON_02 __YOU_WON_03 __YOU_WON_04 __YOU_WON_05
>
> New "faster" version:
>
> DEPS FOR __YOU_WON:
>
> So it appears __YOU_WON has no dependencies, thus the patch does not work as
> intended, Check.pm fails to check for dependencies, -1 from me.
>
> Contributions are very much appreciated, but I urge everyone to use very
> simple debug methods like this to verify even "trivial" things. :-)
>
> Cheers,
> Henrik
>
> On Tue, Jan 08, 2019 at 10:37:12AM -0500, Kevin A. McGrail wrote:
>
> It's an interesting issue and a trivial patch.  I'm sure it was done as a
> safety valve figuring the cycles weren't a big deal.  I can't think of any
> reason to keep checking all these meta dependencies over and over either.
>
> I'm +1 on this.
> --
> Kevin A. McGrail
> VP Fundraising, Apache Software Foundation
> Chair Emeritus Apache SpamAssassin Project
> [1]https://www.linkedin.com/in/kmcgrail - 703.798.0171
>
>
> On Tue, Jan 8, 2019 at 10:31 AM Olivier Coutu <[3...@zerospam.ca> <[3...@zerospam.ca>
> wrote:
>
>
>     We use a lot of custom rules (2k+) and lint time was growing out of
>     control, over a minute on some machines.
>
>     It appears like $alreadydone, the set of rules that have already been
>     linted, is reset for every test. This makes lint time quadratic in the
>     number of tests and does not seem to have any use. Moving it out of the
>     tests loop reduces my lint time from 21 seconds to 3 seconds with no
>     visible side effect.
>
>     This patch moves it out of the tests loop
>
>     --- /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-01-07 17:54:19.244624440 -0500
>     +++ /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-01-07 17:55:11.948708724 -0500
>     @@ -964,13 +964,13 @@
>        my ($self) = @_;
>        my $conf = $self->{conf};
>        $conf->{meta_dependencies} = { };
>     +  my $alreadydone = { };
>
>        foreach my $name (keys %{$conf->{tests}}) {
>          next unless ($conf->{test_types}->{$name}
>                          == $Mail::SpamAssassin::Conf::TYPE_META_TESTS);
>
>          my $deps = [ ];
>     -    my $alreadydone = { };
>          $self->_meta_deps_recurse($conf, $name, $name, $deps, $alreadydone);
>          $conf->{meta_dependencies}->{$name} = join (' ', @{$deps});
>        }
>
>
>     The result on ok rulesets and on obviously broken rulesets does not change
>     from pre-patch, and I can't see any negative side effects of moving it out
>     of the loop.
>
>     I'm not used to posting on dev lists, so please tell me if more information
>     is needed or if things should be presented differently.
>
>     -Olivier
>
>
> References:
>
> [1] https://www.linkedin.com/in/kmcgrail
> [3] mailto:olivier.coutu@zerospam.ca <ol...@zerospam.ca>
>
>

Re: Patch to reduce lint time

Posted by Olivier Coutu <ol...@zerospam.ca>.
You were quite right Henrick, I had insufficiently tested it and while 
it computed dependencies only once, it did not properly retrieve 
previously computed dependencies.

I updated the patch:

--- /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-02-07 15:01:16.124532948 -0500
+++ /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-02-07 15:03:41.127692566 -0500
@@ -968,24 +968,29 @@
    foreach my $name (keys %{$conf->{tests}}) {
      next unless ($conf->{test_types}->{$name}
                      == $Mail::SpamAssassin::Conf::TYPE_META_TESTS);
-
-    my $deps = [ ];
-    my $alreadydone = { };
-    $self->_meta_deps_recurse($conf, $name, $name, $deps, $alreadydone);
-    $conf->{meta_dependencies}->{$name} = join (' ', @{$deps});
+    my $alreadydone = {};
+    $self->_meta_deps_recurse($conf, $name, $name, $alreadydone);
    }
  }
  
  sub _meta_deps_recurse {
-  my ($self, $conf, $toprule, $name, $deps, $alreadydone) = @_;
+  my ($self, $conf, $toprule, $name, $alreadydone) = @_;
  
-  # Only do each rule once per top-level meta; avoid infinite recursion
-  return if $alreadydone->{$name};
-  $alreadydone->{$name} = 1;
+  # Avoid recomputing the dependencies of a rule
+  return split(' ', $conf->{meta_dependencies}->{$name}) if defined $conf->{meta_dependencies}->{$name};
  
    # Obviously, don't trace empty or nonexistent rules
    my $rule = $conf->{tests}->{$name};
-  return unless $rule;
+  unless ($rule) {
+      $conf->{meta_dependencies}->{$name} = '';
+      return ( );
+  }
+
+  # Avoid infinite recursion
+  return ( ) if exists $alreadydone->{$name};
+  $alreadydone->{$name} = ( );
+
+  my %deps;
  
    # Lex the rule into tokens using a rather simple RE method ...
    my $lexer = ARITH_EXPRESSION_LEXER;
@@ -1001,9 +1006,12 @@
      next unless exists $conf_tests->{$token};
  
      # add and recurse
-    push(@{$deps}, untaint_var($token));
-    $self->_meta_deps_recurse($conf, $toprule, $token, $deps, $alreadydone);
+    $deps{untaint_var($token)} = ( );
+    my @subdeps = $self->_meta_deps_recurse($conf, $toprule, $token, $alreadydone);
+    @deps{@subdeps} = ( );
    }
+  $conf->{meta_dependencies}->{$name} = join (' ', keys %deps);
+  return keys %deps;
  }
  
  sub fix_priorities {


/_meta_deps_recurse/ now returns a hash that acts as a set of dependencies.

These dependencies are stored in /$conf->{meta_dependencies}->{$name}/ 
as they are calculated.

The result is not exactly the same as it was before, as I had some cases 
where duplicates in /$conf->{meta_dependencies}->{$name}/ would arise. 
This is no longer the case. The sets of dependencies stay identical.

I also considered changing /$conf->{meta_dependencies}->{$name}/ to a 
list instead of a string, but that was not necessary to achieve a 4-fold 
performance improvement and would require additional refactoring.

What do you think? I am not used to writing Perl code so feel free to 
correct the style.

On 19-01-08 11 h 24, Kevin A. McGrail wrote:
> Thanks Henrik!
>
> On 1/8/2019 11:21 AM, Henrik K wrote:
>> The code is saving dependencies of each meta rule in this variable:
>>    $conf->{meta_dependencies}
>>
>> That variable is used in Check.pm meta tests.
>>
>> Simply adding debug print after this line:
>>    $conf->{meta_dependencies}->{$name} = join (' ', @{$deps});
>>    print STDERR "DEPS FOR $name: $conf->{meta_dependencies}->{$name}\n";
>>
>> And running both versions with spamassassin --lint 2>/tmp/log1 and log2
>>
>> Shows for example that original version:
>>
>> DEPS FOR __YOU_WON: __GIVE_MONEY __HAS_WON_01 __MOVE_MONEY __YOU_WON_01 __YOU_WON_02 __YOU_WON_03 __YOU_WON_04 __YOU_WON_05
>>
>> New "faster" version:
>>
>> DEPS FOR __YOU_WON:
>>
>> So it appears __YOU_WON has no dependencies, thus the patch does not work as
>> intended, Check.pm fails to check for dependencies, -1 from me.
>>
>> Contributions are very much appreciated, but I urge everyone to use very
>> simple debug methods like this to verify even "trivial" things. :-)
>>
>> Cheers,
>> Henrik
>>
>> On Tue, Jan 08, 2019 at 10:37:12AM -0500, Kevin A. McGrail wrote:
>>> It's an interesting issue and a trivial patch.  I'm sure it was done as a
>>> safety valve figuring the cycles weren't a big deal.  I can't think of any
>>> reason to keep checking all these meta dependencies over and over either.
>>>
>>> I'm +1 on this.
>>> --
>>> Kevin A. McGrail
>>> VP Fundraising, Apache Software Foundation
>>> Chair Emeritus Apache SpamAssassin Project
>>> [1]https://www.linkedin.com/in/kmcgrail - 703.798.0171
>>>
>>>
>>> On Tue, Jan 8, 2019 at 10:31 AM Olivier Coutu <[3...@zerospam.ca>
>>> wrote:
>>>
>>>
>>>      We use a lot of custom rules (2k+) and lint time was growing out of
>>>      control, over a minute on some machines.
>>>
>>>      It appears like $alreadydone, the set of rules that have already been
>>>      linted, is reset for every test. This makes lint time quadratic in the
>>>      number of tests and does not seem to have any use. Moving it out of the
>>>      tests loop reduces my lint time from 21 seconds to 3 seconds with no
>>>      visible side effect.
>>>
>>>      This patch moves it out of the tests loop
>>>
>>>      --- /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-01-07 17:54:19.244624440 -0500
>>>      +++ /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-01-07 17:55:11.948708724 -0500
>>>      @@ -964,13 +964,13 @@
>>>         my ($self) = @_;
>>>         my $conf = $self->{conf};
>>>         $conf->{meta_dependencies} = { };
>>>      +  my $alreadydone = { };
>>>
>>>         foreach my $name (keys %{$conf->{tests}}) {
>>>           next unless ($conf->{test_types}->{$name}
>>>                           == $Mail::SpamAssassin::Conf::TYPE_META_TESTS);
>>>
>>>           my $deps = [ ];
>>>      -    my $alreadydone = { };
>>>           $self->_meta_deps_recurse($conf, $name, $name, $deps, $alreadydone);
>>>           $conf->{meta_dependencies}->{$name} = join (' ', @{$deps});
>>>         }
>>>
>>>
>>>      The result on ok rulesets and on obviously broken rulesets does not change
>>>      from pre-patch, and I can't see any negative side effects of moving it out
>>>      of the loop.
>>>
>>>      I'm not used to posting on dev lists, so please tell me if more information
>>>      is needed or if things should be presented differently.
>>>
>>>      -Olivier
>>>
>>>
>>> References:
>>>
>>> [1] https://www.linkedin.com/in/kmcgrail
>>> [3] mailto:olivier.coutu@zerospam.ca
>

Re: Patch to reduce lint time

Posted by "Kevin A. McGrail" <km...@apache.org>.
Thanks Henrik!

On 1/8/2019 11:21 AM, Henrik K wrote:
> The code is saving dependencies of each meta rule in this variable:
>   $conf->{meta_dependencies}
>
> That variable is used in Check.pm meta tests.
>
> Simply adding debug print after this line:
>   $conf->{meta_dependencies}->{$name} = join (' ', @{$deps});
>   print STDERR "DEPS FOR $name: $conf->{meta_dependencies}->{$name}\n";
>
> And running both versions with spamassassin --lint 2>/tmp/log1 and log2
>
> Shows for example that original version:
>
> DEPS FOR __YOU_WON: __GIVE_MONEY __HAS_WON_01 __MOVE_MONEY __YOU_WON_01 __YOU_WON_02 __YOU_WON_03 __YOU_WON_04 __YOU_WON_05
>
> New "faster" version:
>
> DEPS FOR __YOU_WON: 
>
> So it appears __YOU_WON has no dependencies, thus the patch does not work as
> intended, Check.pm fails to check for dependencies, -1 from me.
>
> Contributions are very much appreciated, but I urge everyone to use very
> simple debug methods like this to verify even "trivial" things. :-)
>
> Cheers,
> Henrik
>
> On Tue, Jan 08, 2019 at 10:37:12AM -0500, Kevin A. McGrail wrote:
>> It's an interesting issue and a trivial patch.  I'm sure it was done as a
>> safety valve figuring the cycles weren't a big deal.  I can't think of any
>> reason to keep checking all these meta dependencies over and over either.
>>
>> I'm +1 on this.
>> --
>> Kevin A. McGrail
>> VP Fundraising, Apache Software Foundation
>> Chair Emeritus Apache SpamAssassin Project
>> [1]https://www.linkedin.com/in/kmcgrail - 703.798.0171
>>
>>
>> On Tue, Jan 8, 2019 at 10:31 AM Olivier Coutu <[3...@zerospam.ca>
>> wrote:
>>
>>
>>     We use a lot of custom rules (2k+) and lint time was growing out of
>>     control, over a minute on some machines.
>>
>>     It appears like $alreadydone, the set of rules that have already been
>>     linted, is reset for every test. This makes lint time quadratic in the
>>     number of tests and does not seem to have any use. Moving it out of the
>>     tests loop reduces my lint time from 21 seconds to 3 seconds with no
>>     visible side effect.
>>
>>     This patch moves it out of the tests loop
>>
>>     --- /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-01-07 17:54:19.244624440 -0500
>>     +++ /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-01-07 17:55:11.948708724 -0500
>>     @@ -964,13 +964,13 @@
>>        my ($self) = @_;
>>        my $conf = $self->{conf};
>>        $conf->{meta_dependencies} = { };
>>     +  my $alreadydone = { };
>>
>>        foreach my $name (keys %{$conf->{tests}}) {
>>          next unless ($conf->{test_types}->{$name}
>>                          == $Mail::SpamAssassin::Conf::TYPE_META_TESTS);
>>
>>          my $deps = [ ];
>>     -    my $alreadydone = { };
>>          $self->_meta_deps_recurse($conf, $name, $name, $deps, $alreadydone);
>>          $conf->{meta_dependencies}->{$name} = join (' ', @{$deps});
>>        }
>>
>>
>>     The result on ok rulesets and on obviously broken rulesets does not change
>>     from pre-patch, and I can't see any negative side effects of moving it out
>>     of the loop.
>>
>>     I'm not used to posting on dev lists, so please tell me if more information
>>     is needed or if things should be presented differently.
>>
>>     -Olivier
>>
>>
>> References:
>>
>> [1] https://www.linkedin.com/in/kmcgrail
>> [3] mailto:olivier.coutu@zerospam.ca


-- 
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171


Re: Patch to reduce lint time

Posted by Henrik K <he...@hege.li>.
The code is saving dependencies of each meta rule in this variable:
  $conf->{meta_dependencies}

That variable is used in Check.pm meta tests.

Simply adding debug print after this line:
  $conf->{meta_dependencies}->{$name} = join (' ', @{$deps});
  print STDERR "DEPS FOR $name: $conf->{meta_dependencies}->{$name}\n";

And running both versions with spamassassin --lint 2>/tmp/log1 and log2

Shows for example that original version:

DEPS FOR __YOU_WON: __GIVE_MONEY __HAS_WON_01 __MOVE_MONEY __YOU_WON_01 __YOU_WON_02 __YOU_WON_03 __YOU_WON_04 __YOU_WON_05

New "faster" version:

DEPS FOR __YOU_WON: 

So it appears __YOU_WON has no dependencies, thus the patch does not work as
intended, Check.pm fails to check for dependencies, -1 from me.

Contributions are very much appreciated, but I urge everyone to use very
simple debug methods like this to verify even "trivial" things. :-)

Cheers,
Henrik

On Tue, Jan 08, 2019 at 10:37:12AM -0500, Kevin A. McGrail wrote:
> It's an interesting issue and a trivial patch.  I'm sure it was done as a
> safety valve figuring the cycles weren't a big deal.  I can't think of any
> reason to keep checking all these meta dependencies over and over either.
> 
> I'm +1 on this.
> --
> Kevin A. McGrail
> VP Fundraising, Apache Software Foundation
> Chair Emeritus Apache SpamAssassin Project
> [1]https://www.linkedin.com/in/kmcgrail - 703.798.0171
> 
> 
> On Tue, Jan 8, 2019 at 10:31 AM Olivier Coutu <[3...@zerospam.ca>
> wrote:
> 
> 
>     We use a lot of custom rules (2k+) and lint time was growing out of
>     control, over a minute on some machines.
> 
>     It appears like $alreadydone, the set of rules that have already been
>     linted, is reset for every test. This makes lint time quadratic in the
>     number of tests and does not seem to have any use. Moving it out of the
>     tests loop reduces my lint time from 21 seconds to 3 seconds with no
>     visible side effect.
> 
>     This patch moves it out of the tests loop
> 
>     --- /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-01-07 17:54:19.244624440 -0500
>     +++ /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-01-07 17:55:11.948708724 -0500
>     @@ -964,13 +964,13 @@
>        my ($self) = @_;
>        my $conf = $self->{conf};
>        $conf->{meta_dependencies} = { };
>     +  my $alreadydone = { };
> 
>        foreach my $name (keys %{$conf->{tests}}) {
>          next unless ($conf->{test_types}->{$name}
>                          == $Mail::SpamAssassin::Conf::TYPE_META_TESTS);
> 
>          my $deps = [ ];
>     -    my $alreadydone = { };
>          $self->_meta_deps_recurse($conf, $name, $name, $deps, $alreadydone);
>          $conf->{meta_dependencies}->{$name} = join (' ', @{$deps});
>        }
> 
> 
>     The result on ok rulesets and on obviously broken rulesets does not change
>     from pre-patch, and I can't see any negative side effects of moving it out
>     of the loop.
> 
>     I'm not used to posting on dev lists, so please tell me if more information
>     is needed or if things should be presented differently.
> 
>     -Olivier
> 
> 
> References:
> 
> [1] https://www.linkedin.com/in/kmcgrail
> [3] mailto:olivier.coutu@zerospam.ca

Re: Patch to reduce lint time

Posted by "Kevin A. McGrail" <km...@apache.org>.
It's an interesting issue and a trivial patch.  I'm sure it was done as a
safety valve figuring the cycles weren't a big deal.  I can't think of any
reason to keep checking all these meta dependencies over and over either.

I'm +1 on this.
--
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171


On Tue, Jan 8, 2019 at 10:31 AM Olivier Coutu <ol...@zerospam.ca>
wrote:

> We use a lot of custom rules (2k+) and lint time was growing out of
> control, over a minute on some machines.
>
> It appears like *$alreadydone*, the set of rules that have already been
> linted, is reset for every test. This makes lint time quadratic in the
> number of tests and does not seem to have any use. Moving it out of the
> tests loop reduces my lint time from 21 seconds to 3 seconds with no
> visible side effect.
>
> This patch moves it out of the tests loop
>
> --- /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm	2019-01-07 17:54:19.244624440 -0500
> +++ /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm	2019-01-07 17:55:11.948708724 -0500
> @@ -964,13 +964,13 @@
>    my ($self) = @_;
>    my $conf = $self->{conf};
>    $conf->{meta_dependencies} = { };
> +  my $alreadydone = { };
>
>    foreach my $name (keys %{$conf->{tests}}) {
>      next unless ($conf->{test_types}->{$name}
>                      == $Mail::SpamAssassin::Conf::TYPE_META_TESTS);
>
>      my $deps = [ ];
> -    my $alreadydone = { };
>      $self->_meta_deps_recurse($conf, $name, $name, $deps, $alreadydone);
>      $conf->{meta_dependencies}->{$name} = join (' ', @{$deps});
>    }
>
>
> The result on ok rulesets and on obviously broken rulesets does not change
> from pre-patch, and I can't see any negative side effects of moving it out
> of the loop.
>
> I'm not used to posting on dev lists, so please tell me if more
> information is needed or if things should be presented differently.
> -Olivier
>