Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2016/11/24 04:59:52 UTC
[Bug 7374] New: Some e-mails create "Complex regular subexpression
recursion limit (32766) exceeded" warning
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7374
Bug ID: 7374
Summary: Some e-mails create "Complex regular subexpression
recursion limit (32766) exceeded" warning
Product: Spamassassin
Version: 3.4.1
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: spamassassin
Assignee: dev@spamassassin.apache.org
Reporter: richw@richw.org
Target Milestone: Undefined
Created attachment 5421
--> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5421&action=edit
Sample spam e-mail (gzipped) which illustrates the problem
I regularly use sa-learn on my inbox and my spam folder. Some e-mails generate
the following message as they are scanned:
Complex regular subexpression recursion limit (32766) exceeded at
/usr/share/perl5/Mail/SpamAssassin/HTML.pm line 745.
I am attaching one sample e-mail illustrating this problem. The same warning
is generated if I run this message through spamassassin instead of sa-learn.
John Hardin <jh...@impsec.org> confirmed he could reproduce this warning on
trunk, and he said the problem was in the handling of a long block of
quoted-printable blanks (=20) at the end of the message.
I'm using SpamAssassin 3.4.1, running on Perl 5.22.1, on an Ubuntu 16.04 LTS
server.
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7374] Some e-mails create "Complex regular subexpression
recursion limit (32766) exceeded" warning
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7374
Kevin A. McGrail <km...@apache.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kmcgrail@apache.org
--- Comment #4 from Kevin A. McGrail <km...@apache.org> ---
+1 from me
[Bug 7374] Some e-mails create "Complex regular subexpression
recursion limit (32766) exceeded" warning
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7374
Joe Quinn <jq...@pccc.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jquinn+SAbug@pccc.com
--- Comment #2 from Joe Quinn <jq...@pccc.com> ---
It looks like the regex in question is this:
  if ($text !~ /^(?:[ \t\n\r\f\x0b]|\xc2\xa0)*\z/s) {
    $invisible_for_bayes = $self->html_font_invisible($text);
  }
The condition is true when $text contains anything other than a fixed set of
whitespace characters (ASCII whitespace or a UTF-8 encoded NBSP).
From perldiag:
Complex regular subexpression recursion limit (%d) exceeded
(W regexp) The regular expression engine uses recursion in complex
situations where back-tracking is required. Recursion depth is limited to
32766, or perhaps less in architectures where the stack cannot grow
arbitrarily. ("Simple" and "medium" situations are handled without recursion
and are not subject to a limit.) Try shortening the string under examination;
looping in Perl code (e.g. with while) rather than in the regular expression
engine; or rewriting the regular expression so that it is simpler or backtracks
less. (See perlfaq2 for information on Mastering Regular Expressions.)
This regex must be complex enough that the engine handles it with recursion,
and on a long enough input it hits the limit of 32766. I think the cause is the
quantified group containing an alternation between single-byte whitespace and
the two-byte NBSP, which makes backtracking more complicated. I believe we can
eliminate backtracking entirely here, because this regex can never succeed on
anything less than a fully greedy match. For example, with an atomic group:
  if ($text !~ /^(?>[ \t\n\r\f\x0b]|\xc2\xa0)*\z/s) {
    $invisible_for_bayes = $self->html_font_invisible($text);
  }
see also:
http://stackoverflow.com/questions/26226630/latest-perl-wont-match-certain-regexes-more-than-32768-characters-long
http://perldoc.perl.org/perlre.html#(%3f%3epattern)
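To check whether a given Perl build shows the warning, the original pattern can be run against a long run of blanks. This is only a rough sketch: the input is synthetic (not the attached spam sample), the variable names are made up for illustration, and whether the warning appears is likely to depend on the Perl version.

```perl
use strict;
use warnings;

# The pattern quoted above from HTML.pm, compiled once.
my $pattern = qr/^(?:[ \t\n\r\f\x0b]|\xc2\xa0)*\z/s;

# Collect warnings instead of printing them to STDERR right away.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Assumption: a synthetic run of blanks well past the 32766 limit
# stands in for the decoded "=20" block in the sample message.
my $text = " " x 100_000;
my $matched = ($text =~ $pattern) ? 1 : 0;

print "matched: $matched\n";
print "warnings seen: ", scalar(@warnings), "\n";
print $warnings[0] if @warnings;
```

If the warning is reported and the match fails, the !~ test in HTML.pm treats the all-blank text as if it contained visible characters.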
[Bug 7374] Some e-mails create "Complex regular subexpression
recursion limit (32766) exceeded" warning
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7374
Rich Wales <ri...@richw.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |richw@richw.org
[Bug 7374] Some e-mails create "Complex regular subexpression
recursion limit (32766) exceeded" warning
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7374
--- Comment #1 from Rich Wales <ri...@richw.org> ---
Created attachment 5422
--> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5422&action=edit
Output (gzipped) of "spamassassin --lint -D" on my mail server
Responding to a list request, I'm attaching the output of "spamassassin --lint
-D".
[Bug 7374] Some e-mails create "Complex regular subexpression
recursion limit (32766) exceeded" warning
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7374
Henrik Krohns <he...@hege.li> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hege@hege.li
--- Comment #3 from Henrik Krohns <he...@hege.li> ---
Using (?> does not work; as discussed in the linked pages, it also hits the
internal limit.
I implemented the check by removing those whitespace characters from a copy of
the string and checking whether anything is left.
I verified this by running masschecks on a few thousand messages; there were no
differences in the logs.
Feel free to vote on whether this should be committed to 3.4. I think it's
trivial, so +1.
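The "remove and check what is left" idea can be sketched roughly as follows. This is not the change committed in revision 1861257; the helper name has_visible_text is hypothetical and the code only illustrates the approach under that assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text holds anything besides ASCII
# whitespace or a UTF-8 encoded NBSP (bytes C2 A0).
sub has_visible_text {
    my ($text) = @_;
    my $copy = $text;              # work on a copy, leave the caller's string alone

    # Remove NBSP pairs first, so deleting whitespace cannot join a
    # stray \xc2 with a stray \xa0 into a false NBSP.
    $copy =~ s/\xc2\xa0//g;

    # Delete single-byte whitespace; tr/// does no backtracking.
    $copy =~ tr/ \t\n\r\f\x0b//d;

    return length($copy) > 0;
}

# Usage: a long run of blanks, as in the tail of the sample message.
my $blank = " " x 100_000;
print has_visible_text($blank) ? "visible\n" : "blank\n";
```

Neither step uses a quantified group, so the length of the input should not matter to the regex engine here.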
Sending trunk/lib/Mail/SpamAssassin/HTML.pm
Transmitting file data .done
Committing transaction...
Committed revision 1861257.
[Bug 7374] Some e-mails create "Complex regular subexpression
recursion limit (32766) exceeded" warning
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7374
Henrik Krohns <he...@hege.li> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |FIXED
--- Comment #5 from Henrik Krohns <he...@hege.li> ---
Sending spamassassin-3.4/lib/Mail/SpamAssassin/HTML.pm
Transmitting file data .done
Committing transaction...
Committed revision 1861265.