You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2008/06/23 18:13:54 UTC

[Bug 5931] New: add_cidr chokes on large amount of trusted_networks

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5931

           Summary: add_cidr chokes on large amount of trusted_networks
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P4
         Component: Libraries
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: hege@hege.li


I'm currently converting dnswl-data to trusted_networks entries (~10000), which
works fine and provides benefits from less queries and ALL_TRUSTED
shortcircuits.

It was necessary to remove this from add_cidr():

#next if ($self->is_net_declared($ip4, $ip6, $exclude, 0));

The continuous looping iterates into millions of checks, eventually halting to
a stop.

I don't really care if there are duplicates, since it doesn't affect anything.
Probably only if you use exclusions. In that case the code should use two
separate arrays for includes and excludes, instead of one unsorted like now.

Would you maybe accept patch that added option to disable that duplicate check?
Or do you have any ideas how to speed things up?


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5931] add_cidr chokes on large amount of trusted_networks

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5931


Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[review] add_cidr chokes on |add_cidr chokes on large
                   |large amount of             |amount of trusted_networks
                   |trusted_networks            |
  Status Whiteboard|needs 1 votes               |




--- Comment #7 from Mark Martinec <Ma...@ijs.si>  2008-10-17 18:45:17 PST ---
> I would much prefer that the existing
> check be performed until the amount of ranges inputted makes it impractical
> to do so (say when 200 ranges are added it skips the O(N^2) bit).

That's a sensible compromise.

Btw, I wonder if cases of several thousand entries are more common
than anticipated. If they are, it might make sense to provide an
alternative mechanism, much faster but less flexible, like a
hash table of classful network addresses. For an IPv4 address,
four hash lookups would suffice for a check, regardless of the
size of a hash table (practically a constant access time).
Both methods would not combine well (in presence of overlapping
ranges), but chosing one or another would be a good option.
(btw, these two lookup types are offered by amavisd-new).


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5931] [review] add_cidr chokes on large amount of trusted_networks

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5931





--- Comment #5 from Henrik Krohns <he...@hege.li>  2008-10-16 23:02:15 PST ---
(In reply to comment #4)
> I would much prefer that the existing
> check be performed until the amount of ranges inputted makes it impractical to
> do so (say when 200 ranges are added it skips the O(N^2) bit).

+1 from me even if it doesn't count. ;)


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5931] add_cidr chokes on large amount of trusted_networks

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5931





--- Comment #9 from Justin Mason <jm...@jmason.org>  2008-12-29 06:56:24 PST ---
(In reply to comment #7)
> > I would much prefer that the existing
> > check be performed until the amount of ranges inputted makes it impractical
> > to do so (say when 200 ranges are added it skips the O(N^2) bit).
> 
> That's a sensible compromise.

I've now implemented that on trunk:

: 260...; svn commit -m "bug 5931: trusted_networks bogs down due to O(n^2)
loop with millions of entries; revisit -- we still want to keep the check if
possible, so just skip the O(n^2) linting code if we have over 200 networks in
the list" lib/Mail/SpamAssassin/NetSet.pm
Sending        lib/Mail/SpamAssassin/NetSet.pm
Transmitting file data .
Committed revision 729906 ( https://svn.apache.org/viewcvs.cgi?view=rev&rev=729906 ).


> Btw, I wonder if cases of several thousand entries are more common
> than anticipated. If they are, it might make sense to provide an
> alternative mechanism, much faster but less flexible, like a
> hash table of classful network addresses. For an IPv4 address,
> four hash lookups would suffice for a check, regardless of the
> size of a hash table (practically a constant access time).
> Both methods would not combine well (in presence of overlapping
> ranges), but chosing one or another would be a good option.
> (btw, these two lookup types are offered by amavisd-new).

I'd be happy to see that as a plugin ;)


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5931] [review] add_cidr chokes on large amount of trusted_networks

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5931


Daryl C. W. O'Shea <sp...@dostech.ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |spamassassin@dostech.ca




--- Comment #4 from Daryl C. W. O'Shea <sp...@dostech.ca>  2008-10-16 15:56:00 PST ---
(In reply to comment #1)
> agreed, this is a waste of time.  fixed in trunk:
> 
> : jm 237...; svn commit -m "bug 5931: remove needless O(N^2) de-duplication
> step from trusted_networks" lib/Mail/SpamAssassin/NetSet.pm

FWIW, there are two reasons why I wrote it like this (fully aware of the O(N^2)
step, but not expecting someone to load 10,000 ranges):

- The major one: It provides debugging for whether or not an excluded net is
really going to be excluded... without this an apparently valid config will not
produce the expected results (ie, "trusted_networks 0/0 !131.107/16" will lint
fine *and* trust mail from Microsoft").  Knowing how confused people are with
the whole Apache ACL mess I wanted to make this config option perfectly clear
as to whether or not you were getting what you expected.

- A minor one: I'd rather save time per message (albeit probably just a little)
than on start-up.  Again I never expected 10,000 ranges to be used.

I guess I didn't do a very good job at the regression tests for this... it
still passes... it only warns you that it's expecting an error (that now never
happens), it doesn't fail when it doesn't see the error.

Also... the same 0(N^2) slowdown exists as soon as you start loading both a
trusted_networks and internal_networks config (internal networks are checked to
make sure they're in trusted_networks).  This is another lint thing that I feel
is important since so many people are confused about the whole trusted/internal
networks thing.  internal_networks must be in trusted_networks.

Anyway, as such, I'm -0.9 on the change.  I would much prefer that the existing
check be performed until the amount of ranges inputted makes it impractical to
do so (say when 200 ranges are added it skips the O(N^2) bit).  Note that I am
not for running this only with --lint enabled... it's useful when a message is
run through with -D.  I can't emphasize how important I think the lint checks
are for this part of the config.

(In reply to comment #3)
> Henrik Krohns wrote:
> Just in case this pops up some time in the future: having separate tables
> for includes and excludes is a really bad idea. We do not want to repeat
> an apache http access-list mistake (a need for Order directive, inability to
> express subranges within subranges)

This is specifically the reason I wrote it like this.  The Apache ACL is a mess
that, although really is straight-forward (just not very flexible), is
confusing as hell to your typical user.

> The patch is fine, an occasional duplicate does no harm.

Obviously I disagree. :)  I believe that the resulting degradation of the lint
capability is detrimental.


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5931] add_cidr chokes on large amount of trusted_networks

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5931


Justin Mason <jm...@jmason.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #4347 is|0                           |1
           obsolete|                            |
         AssignedTo|dev@spamassassin.apache.org |jm@jmason.org




--- Comment #10 from Justin Mason <jm...@jmason.org>  2008-12-29 06:59:02 PST ---
Created an attachment (id=4413)
 --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4413)
revised fix for 3.2.x, with 200-net limit

and as a patch for 3.2.x.


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5931] [review] add_cidr chokes on large amount of trusted_networks

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5931


Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Status Whiteboard|needs 2 votes               |needs 1 votes




--- Comment #3 from Mark Martinec <Ma...@ijs.si>  2008-10-16 08:13:49 PST ---
Henrik Krohns wrote:
> I don't really care if there are duplicates, since it doesn't affect anything.
> Probably only if you use exclusions. In that case the code should use two
> separate arrays for includes and excludes, instead of one unsorted like now.

Just in case this pops up some time in the future: having separate tables
for includes and excludes is a really bad idea. We do not want to repeat
an apache http access-list mistake (a need for Order directive, inability to
express subranges within subranges), or a mistake from early implementation
of tcp_wrappers having two files, later fixed by adding hosts_options).
The lookup list must be only one, followed in a user-specified order, and
each entry must hold a boolean property (e.g. pass/block, allow/deny).

The patch is fine, an occasional duplicate does no harm.
+1


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5931] [review] add_cidr chokes on large amount of trusted_networks

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5931





--- Comment #6 from Daryl C. W. O'Shea <sp...@dostech.ca>  2008-10-17 15:38:07 PST ---
(In reply to comment #5)
> +1 from me even if it doesn't count. ;)

User input carries a very non-zero amount of weight. :)


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5931] add_cidr chokes on large amount of trusted_networks

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5931





--- Comment #8 from Henrik Krohns <he...@hege.li>  2008-10-18 01:32:46 PST ---

I don't think anyone except dnswl users like me would need that many entries.
Unless SA adopts such measure (which would be cool), I don't think you need to
think about other methods.


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5931] [review] add_cidr chokes on large amount of trusted_networks

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5931





--- Comment #2 from Justin Mason <jm...@jmason.org>  2008-07-11 01:56:51 PST ---
Created an attachment (id=4347)
 --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4347)
fix for 3.2.x


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5931] [review] add_cidr chokes on large amount of trusted_networks

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5931


Justin Mason <jm...@jmason.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|add_cidr chokes on large    |[review] add_cidr chokes on
                   |amount of trusted_networks  |large amount of
                   |                            |trusted_networks
  Status Whiteboard|                            |needs 2 votes
   Target Milestone|3.3.0                       |3.2.6




-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5931] add_cidr chokes on large amount of trusted_networks

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5931


Justin Mason <jm...@jmason.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
   Target Milestone|Undefined                   |3.3.0




--- Comment #1 from Justin Mason <jm...@jmason.org>  2008-07-11 01:55:57 PST ---
agreed, this is a waste of time.  fixed in trunk:

: jm 237...; svn commit -m "bug 5931: remove needless O(N^2) de-duplication
step from trusted_networks" lib/Mail/SpamAssassin/NetSet.pm
Sending        lib/Mail/SpamAssassin/NetSet.pm
Transmitting file data .
Committed revision 675869.

patch coming for 3.2.x (in case we do another one).


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.