Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2016/11/24 04:59:52 UTC

[Bug 7374] New: Some e-mails create "Complex regular subexpression recursion limit (32766) exceeded" warning

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7374

            Bug ID: 7374
           Summary: Some e-mails create "Complex regular subexpression
                    recursion limit (32766) exceeded" warning
           Product: Spamassassin
           Version: 3.4.1
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: spamassassin
          Assignee: dev@spamassassin.apache.org
          Reporter: richw@richw.org
  Target Milestone: Undefined

Created attachment 5421
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5421&action=edit
Sample spam e-mail (gzipped) which illustrates the problem

I regularly use sa-learn on my inbox and my spam folder.  Some e-mails generate
the following message as they are scanned:

Complex regular subexpression recursion limit (32766) exceeded at
/usr/share/perl5/Mail/SpamAssassin/HTML.pm line 745.

I am attaching one sample e-mail illustrating this problem.  The same warning
is generated if I run this message through spamassassin instead of sa-learn.

John Hardin <jh...@impsec.org> confirmed he could reproduce this warning on
trunk, and he said the problem was in the handling of a long block of
quoted-printable blanks (=20) at the end of the message.
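
To illustrate how such a block can grow past the 32766 limit once it is
decoded, here is a minimal sketch using the core MIME::QuotedPrint module. The
body below is made up for illustration (1400 lines of 25 encoded blanks, each
ending in a soft line break); it is not taken from the attached sample.

```perl
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);

# Made-up body (an assumption, not the attached e-mail): 1400 lines of
# 25 encoded blanks, each line ending in a soft line break ("=").
my $encoded = join '', map { ('=20' x 25) . "=\n" } 1 .. 1400;

# Decoding turns every "=20" into one blank and drops the soft line breaks.
my $decoded = decode_qp($encoded);

printf "encoded: %d bytes, decoded: %d blanks\n",
    length($encoded), length($decoded);
```

After decoding, the text handed to later processing is one unbroken run of
blanks, which is the kind of input the regex discussed in this bug has to
walk through.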

I'm using SpamAssassin 3.4.1, running on Perl 5.22.1, on an Ubuntu 16.04 LTS
server.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7374] Some e-mails create "Complex regular subexpression recursion limit (32766) exceeded" warning

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7374

Kevin A. McGrail <km...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kmcgrail@apache.org

--- Comment #4 from Kevin A. McGrail <km...@apache.org> ---
+1 from me

[Bug 7374] Some e-mails create "Complex regular subexpression recursion limit (32766) exceeded" warning

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7374

Joe Quinn <jq...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jquinn+SAbug@pccc.com

--- Comment #2 from Joe Quinn <jq...@pccc.com> ---
It looks like the regex in question is this:

  if ($text !~ /^(?:[ \t\n\r\f\x0b]|\xc2\xa0)*\z/s) {
    $invisible_for_bayes = $self->html_font_invisible($text);
  }

The condition is true when $text contains anything other than a certain set of
whitespace characters (the listed ASCII whitespace or a UTF-8 encoded NBSP).

From perldiag:
    Complex regular subexpression recursion limit (%d) exceeded

    (W regexp) The regular expression engine uses recursion in complex
    situations where back-tracking is required. Recursion depth is limited to
    32766, or perhaps less in architectures where the stack cannot grow
    arbitrarily. ("Simple" and "medium" situations are handled without
    recursion and are not subject to a limit.) Try shortening the string under
    examination; looping in Perl code (e.g. with while) rather than in the
    regular expression engine; or rewriting the regular expression so that it
    is simpler or backtracks less. (See perlfaq2 for information on Mastering
    Regular Expressions.)
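
As a sketch of the perldiag suggestion to loop in Perl code rather than in the
regular expression engine, the all-whitespace test could consume one token per
iteration with \G and /gc. The helper name only_whitespace is hypothetical, and
this is only an illustration of the technique, not code from HTML.pm.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML.pm): returns true when $text consists
# only of ASCII whitespace bytes and UTF-8 encoded NBSP sequences (\xc2\xa0).
sub only_whitespace {
    my ($text) = @_;
    pos($text) = 0;    # start scanning at the beginning of our copy

    # Each iteration matches exactly one token at the current position.
    # /g advances pos(); /c keeps pos() in place when the match fails.
    while ($text =~ /\G(?:[ \t\n\r\f\x0b]|\xc2\xa0)/gc) { }

    # If the scan stopped at the end, every byte belonged to a token.
    return pos($text) == length($text);
}
```

Because the repetition lives in the while loop, the pattern itself contains no
quantifier, so the engine never has to recurse over a long run of matches.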

This regex must be complex enough that the engine evaluates it with recursion,
and on a long enough input it hits the limit of 32766. I think this is because
the quantifier wraps an alternation between single-byte whitespace and the
two-byte NBSP, which makes backtracking more complicated. I believe we can
eliminate backtracking entirely here, because this regex can never succeed on
anything less than a fully greedy match.

  if ($text !~ /^(?>[ \t\n\r\f\x0b]|\xc2\xa0)*\z/s) {
    $invisible_for_bayes = $self->html_font_invisible($text);
  }

see also:
http://stackoverflow.com/questions/26226630/latest-perl-wont-match-certain-regexes-more-than-32768-characters-long
http://perldoc.perl.org/perlre.html#(%3f%3epattern)

[Bug 7374] Some e-mails create "Complex regular subexpression recursion limit (32766) exceeded" warning

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7374

Rich Wales <ri...@richw.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |richw@richw.org

[Bug 7374] Some e-mails create "Complex regular subexpression recursion limit (32766) exceeded" warning

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7374

--- Comment #1 from Rich Wales <ri...@richw.org> ---
Created attachment 5422
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5422&action=edit
Output (gzipped) of "spamassassin --lint -D" on my mail server

Responding to a list request, I'm attaching the output of "spamassassin --lint
-D".

[Bug 7374] Some e-mails create "Complex regular subexpression recursion limit (32766) exceeded" warning

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7374

Henrik Krohns <he...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hege@hege.li

--- Comment #3 from Henrik Krohns <he...@hege.li> ---
Using (?> does not work; as discussed in the links mentioned earlier, it also
hits the internal limit.

I implemented the check by removing the matched whitespace characters from a
copy of the string and then checking whether anything is left.
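
A minimal sketch of that idea, assuming the same whitespace set as the original
regex. The helper name has_non_whitespace is hypothetical and this is not the
committed code from HTML.pm; it only illustrates stripping whitespace from a
copy and checking what remains.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not the committed change): returns true
# when $text holds anything besides ASCII whitespace or a UTF-8 encoded NBSP.
sub has_non_whitespace {
    my ($text) = @_;
    my $copy = $text;                # work on a copy, leave the caller's string alone
    $copy =~ s/\xc2\xa0//g;          # drop complete two-byte NBSP sequences first
    $copy =~ tr/ \t\n\r\f\x0b//d;    # then delete the single-byte whitespace
    return length($copy) > 0;        # anything left is non-whitespace content
}

# Usage: a long run of blanks is handled without any quantified group.
my $text = " " x 100_000;
print "has visible content\n" if has_non_whitespace($text . "x");
```

The NBSP sequences are removed before the single-byte whitespace so that a
stray \xc2 and \xa0 separated by a blank are not joined into a false NBSP.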

Verified by running masschecks on a few thousand messages; no differences in
the logs.

Feel free to vote on whether this should be committed to 3.4. I think it's
trivial, so +1.

Sending        trunk/lib/Mail/SpamAssassin/HTML.pm
Transmitting file data .done
Committing transaction...
Committed revision 1861257.

[Bug 7374] Some e-mails create "Complex regular subexpression recursion limit (32766) exceeded" warning

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7374

Henrik Krohns <he...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Henrik Krohns <he...@hege.li> ---
Sending        spamassassin-3.4/lib/Mail/SpamAssassin/HTML.pm
Transmitting file data .done
Committing transaction...
Committed revision 1861265.
