Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2012/06/04 19:26:37 UTC
[Bug 6802] New: /ss/i interpreted as /(?:ss|ß)/, Variable length lookbehind not implemented in regex
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6802
Priority: P2
Bug ID: 6802
Assignee: dev@spamassassin.apache.org
Summary: /ss/i interpreted as /(?:ss|ß)/, Variable length
lookbehind not implemented in regex
Severity: normal
Classification: Unclassified
OS: All
Reporter: Mark.Martinec@ijs.si
Hardware: All
Status: NEW
Version: 3.4 SVN branch
Component: Rules
Product: Spamassassin
This is what I reported on a ML on 2012-05-28 - but see below,
the problem is likely to be more involved:
There are three cases of:
"Variable length lookbehind not implemented in regex"
perl error reported by perl 5.16.0 in J_CHICKENPOX_* rules,
file rulesrc/sandbox/khopesh/20_chickenpox.cf.
Perldiag man page explains:
Variable length lookbehind not implemented in m/%s/
(F) Lookbehind is allowed only for subexpressions whose length
is fixed and known at compile time. See perlre.
perlre man page:
"(?<=pattern)" "\K"
A zero-width positive look-behind assertion. For example,
"/(?<=\t)\w+/" matches a word that follows a tab, without
including the tab in $&. Works only for fixed-width look-
behind.
"(?<!pattern)"
A zero-width negative look-behind assertion. For example
"/(?<!bar)foo/" matches any occurrence of "foo" that does not
follow "bar". Works only for fixed-width look-behind.
lint: config: invalid regexp for rule J_CHICKENPOX_13: m/\s(?![acdgjlmnosx]
[`'"])[a-zA-Z]{1}[.,\;:?%!&+^~`'\$*=#|013467\(\)\[\]\{\}<>"][a-zA-Z]{3}(?<!\.
(?:(?-i:[A-Z][a-z]{2})|a(?:sc|sp|ux)|b(?:ak|at|iz|in|ks|mk|mp)|c(?:fg|gi|nf|
om|pp|ss)|d(?:at|ll|mg|oc)|e(?:du|nt|xe|xt)|g(?:if|ov)|htm|i(?:co|di|mg|nc|nf|
ni)|jpg|l(?:ib|og)|m(?:ap|bs|il|p[eg])|net|org|p(?:df|fb|hp|i[df]|ng|pt|sd)|
raw|s(?:cr|ql|ty|ys)|t(?:ar|ex|ld|mp|tf|xt)|usr|w(?:av|pi)|x(?:ls|ml|sl)|zip)|
[`'"]all)(?:[,'\?!]|\.?\s)/i: Variable length lookbehind not implemented in
regex m/(?i)\s(?![acdgjlmnosx][`'"])[a-zA-Z]{1}[.,\;:?%!&+^~`'\$*=#|
013467\(\)\[\]\{\}<>"][a-zA-Z]{3}(?<!\.(?:(?-i:[A-Z][a-z].../
lint: config: invalid regexp for rule J_CHICKENPOX_23: m/\s(?!(?:fn|re):|
(?:cc|to)=|(?:qu|un)[`'"]|(?:dr|m[rst]|li|st|td)\.)[a-zA-Z]{2}[.,
\;:?%!&+^~`'\$*=#|013467\(\)\[\]\{\}<>"][a-zA-Z]{3}(?<!\.(?:(?-i:[A-Z][a-z]
{2})|a(?:sc|sp|ux)|b(?:ak|at|iz|in|ks|mk|mp)|c(?:fg|gi|nf|om|pp|ss)|d(?:at|ll|
mg|oc)|e(?:du|nt|xe|xt)|g(?:if|ov)|htm|i(?:co|di|mg|nc|nf|ni)|jpg|l(?:ib|og)|
m(?:ap|bs|il|p[eg])|net|org|p(?:df|fb|hp|i[df]|ng|pt|sd)|raw|s(?:cr|ql|ty|ys)|
t(?:ar|ex|ld|mp|tf|xt)|usr|w(?:av|pi)|x(?:ls|ml|sl)|zip)|['`"]tje)(?:[,'\?!]|
\.?\s)/i: Variable length lookbehind not implemented in regex m/(?i)\s(?!
(?:fn|re):|(?:cc|to)=|(?:qu|un)[`'"]|(?:dr|m[rst]|li|st|td)\.)[a-zA-Z]{2}[.,
\;:?%!&+^~`'\$*=#|013467\(\)\[\].../
lint: config: invalid regexp for rule J_CHICKENPOX_33: m/\s(?!(?:alt|biz|mrs|
rev|s(?:ci|en|oc))\.|(?:end|fwd|org|reg):|pop3|cos')[a-zA-Z]{3}[.,
\;:?%!&+^~`'\$*=#|013467\(\)\[\]\{\}<>"][a-zA-Z]{3}(?<!\.(?:(?-i:[A-Z][a-z]
{2})|a(?:sc|sp|ux)|b(?:ak|at|iz|in|ks|mk|mp)|c(?:fg|gi|nf|om|pp|ss)|d(?:at|ll|
mg|oc)|e(?:du|nt|xe|xt)|g(?:if|ov)|htm|i(?:co|di|mg|nc|nf|ni)|jpg|l(?:ib|og)|
m(?:ap|bs|il|p[eg])|net|org|p(?:df|fb|hp|i[df]|ng|pt|sd)|raw|s(?:cr|ql|ty|ys)|
t(?:ar|ex|ld|mp|tf|xt)|usr|w(?:av|pi)|x(?:ls|ml|sl)|zip)|['`"]tje)(?:[,'\?!]|
\.?\s)/i: Variable length lookbehind not implemented in regex m/(?i)\s(?!
(?:alt|biz|mrs|rev|s(?:ci|en|oc))\.|(?:end|fwd|org|reg):|pop3|cos')[a-zA-Z]{3}
[.,\;:?%!&+^~`'\$*=#|013467\(\.../
ERROR: LINT FAILED, suppressing output: rules/70_sandbox.cf
It seemed like a perl bug, but it turns out it may not be.
I reported the case as:
https://rt.perl.org:443/rt3/Ticket/Display.html?id=113496
Came across a small handful of complex regular expressions
(in SpamAssassin rules), which seem to unwarrantedly hit an
error: "Variable length lookbehind not implemented in regex"
under perl 5.16.0 (on FreeBSD).
Here is a distilled-down sample case:
good:
$ perl -e 'm/(?<!abc|cde)/i'
good:
$ perl -e 'm/(?<!abc|css)/'
incorrect diagnostics:
$ perl -e 'm/(?<!abc|css)/i'
Variable length lookbehind not implemented in regex m/(?<!abc|css)/
at -e line 1.
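The one-liners above can be wrapped in a small script that reports, for each variant, whether the running perl accepts the pattern. This is only a sketch: the helper name compiles_ok is made up for illustration, and the result for the third case depends on the perl version (5.16.0 rejects it as reported above; other releases may behave differently).

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the running perl can compile the
# pattern with the given inline flags, 0 otherwise.
sub compiles_ok {
    my ($pattern, $flags) = @_;
    no warnings;    # keep version-specific regexp warnings out of the output
    # The pattern is interpolated, so it is compiled at run time and a
    # compile failure can be trapped by eval.
    my $re = eval { qr/(?$flags:$pattern)/ };
    return defined $re ? 1 : 0;
}

# The three cases from the distilled report: [alternatives, flags].
for my $case (['abc|cde', 'i'], ['abc|css', ''], ['abc|css', 'i']) {
    my ($alt, $flags) = @$case;
    my $ok = compiles_ok("(?<!$alt)", $flags);
    printf "(?<!%s) with flags '%s': %s\n",
        $alt, $flags, $ok ? 'compiles' : 'rejected';
}
```

Running the same script under each installed perl gives a quick way to see which releases treat "ss" under /i as variable length.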
doy replied:
This is likely because in 5.16, /ss/i compiles to something along the
lines of /(?:ss|ß)/. You could possibly use \K instead of lookbehind
(depending on the pattern), or use the /a regex modifier to enforce
ASCII semantics.
Father Chrysostomos adds:
The /aa modifier is what you need here.
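To illustrate what doy and Father Chrysostomos describe, the following sketch compares Unicode and /aa case folding of "ss" against U+00DF (ß). It assumes perl 5.14 or later, where the /u and /aa modifiers exist; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $sharp_s = "\x{DF}";    # LATIN SMALL LETTER SHARP S

# Under Unicode rules the sharp s case-folds to "ss", so /ss/i can
# match a single character.
my $unicode_match = ($sharp_s =~ /ss/iu) ? 1 : 0;

# Under /aa, ASCII characters never match non-ASCII ones when folding case.
my $ascii_match = ($sharp_s =~ /ss/iaa) ? 1 : 0;

# With /aa each alternative in the lookbehind is exactly three characters
# wide. The pattern is interpolated so that it is compiled at run time
# and a failure can be trapped by eval.
my $source     = '(?<!abc|css)';
my $lookbehind = eval { qr/$source/iaa };

print "ss matches sharp s under /iu:  $unicode_match\n";
print "ss matches sharp s under /iaa: $ascii_match\n";
print "lookbehind with /iaa: ",
    (defined $lookbehind ? "compiles" : "rejected"), "\n";
```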
So, are just the three J_CHICKENPOX_* rules in need of fixing,
or are we in deeper trouble?
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 6802] /ss/i interpreted as /(?:ss|ß)/, Variable length lookbehind not implemented in regex
Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6802
--- Comment #2 from Mark Martinec <Ma...@ijs.si> ---
> Does it only rear its head with compiled rules?
It showed up in the --lint phase of the install, with non-compiled rules.
(Compiled rules are possibly affected too, but these may
already be broken due to changes in perl debug output
across versions, Bug 6649.)
As a quick and dirty hack I just replaced a 'ss' with 's[s]'
in these three rules, so that installation/lint does not barf:
Bug 6802: a hack on three J_CHICKENPOX_* rules,
replacing ss with s[s] avoids interpreting "ss" as "sharp s"
Sending rulesrc/sandbox/khopesh/20_chickenpox.cf
Committed revision 1346064.
Don't know where else we may encounter effects of these changes
in perl. These three rules were just 'lucky' in using a construct
which involves an additional internal check on string lengths.
So far so good with 5.16.0, things appear to be working normally.
[Bug 6802] /ss/i interpreted as /(?:ss|ß)/, Variable length lookbehind not implemented in regex
Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6802
--- Comment #3 from Mark Martinec <Ma...@ijs.si> ---
> Don't know where else we may encounter effects of these changes
Should we be adding an:
use re "/aa";
in code sections which interpret regexps in rules
to avoid surprises with Unicode semantics, or deal with
specific problems as we come across them?
Adding an aa modifier directly in rules would break
these regexps for versions of perl older than 5.14 (I think).
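A rough sketch of the pragma approach follows. It is not SpamAssassin's actual code: the sub names compile_rule_ascii and compile_rule_unicode are hypothetical stand-ins for whatever routine turns a rule's regexp text into a compiled pattern, and it assumes a perl where use re '/flags' is available (5.14 or later). The pragma is lexically scoped, so it only affects regexps compiled inside the enclosing block.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the code that compiles a rule's regexp text.
sub compile_rule_ascii {
    my ($source) = @_;
    use re '/aa';    # applies to regexps compiled in this sub only
    return eval { qr/$source/i };
}

# For contrast only: the same rule text compiled under Unicode rules.
sub compile_rule_unicode {
    my ($source) = @_;
    use re '/u';
    return eval { qr/$source/i };
}

my $sharp_s = "\x{DF}";    # LATIN SMALL LETTER SHARP S
my $ascii   = compile_rule_ascii('ss');
my $unicode = compile_rule_unicode('ss');

print "ASCII-restricted rule matches sharp s: ",
    ($sharp_s =~ $ascii ? "yes" : "no"), "\n";
print "Unicode rule matches sharp s:          ",
    ($sharp_s =~ $unicode ? "yes" : "no"), "\n";
```

Because the rule text itself is left untouched, the same rule files would still load on older perls, provided the code only enables the pragma when the running perl supports it.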
[Bug 6802] /ss/i interpreted as /(?:ss|ß)/, Variable length lookbehind not implemented in regex
Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6802
Kevin A. McGrail <km...@pccc.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |kmcgrail@pccc.com
--- Comment #1 from Kevin A. McGrail <km...@pccc.com> ---
I had been wondering when that email you wrote about chickenpox was going to
rear its head. Unfortunately, this issue is WAY over my regexp head. Does it
only rear its head with compiled rules?