Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2012/06/04 19:26:37 UTC

[Bug 6802] New: /ss/i interpreted as /(?:ss|ß)/, Variable length lookbehind not implemented in regex

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6802

          Priority: P2
            Bug ID: 6802
          Assignee: dev@spamassassin.apache.org
           Summary: /ss/i interpreted as /(?:ss|ß)/, Variable length
                    lookbehind not implemented in regex
          Severity: normal
    Classification: Unclassified
                OS: All
          Reporter: Mark.Martinec@ijs.si
          Hardware: All
            Status: NEW
           Version: 3.4 SVN branch
         Component: Rules
           Product: Spamassassin

This is what I reported on a ML on 2012-05-28 - but see below,
the problem is likely to be more involved:


There are three cases of:

  "Variable length lookbehind not implemented in regex"

perl error, reported by perl 5.16.0 in the J_CHICKENPOX_* rules,
file rulesrc/sandbox/khopesh/20_chickenpox.cf.

Perldiag man page explains:

  Variable length lookbehind not implemented in m/%s/
    (F) Lookbehind is allowed only for subexpressions whose length
        is fixed and known at compile time.  See perlre.

perlre man page:

  "(?<=pattern)" "\K"
    A zero-width positive look-behind assertion.  For example,
    "/(?<=\t)\w+/" matches a word that follows a tab, without
    including the tab in $&.  Works only for fixed-width look-
    behind.

  "(?<!pattern)"
    A zero-width negative look-behind assertion.  For example
    "/(?<!bar)foo/" matches any occurrence of "foo" that does not
    follow "bar".  Works only for fixed-width look-behind.


lint: config: invalid regexp for rule J_CHICKENPOX_13: m/\s(?![acdgjlmnosx]
[`'"])[a-zA-Z]{1}[.,\;:?%!&+^~`'\$*=#|013467\(\)\[\]\{\}<>"][a-zA-Z]{3}(?<!\.
(?:(?-i:[A-Z][a-z]{2})|a(?:sc|sp|ux)|b(?:ak|at|iz|in|ks|mk|mp)|c(?:fg|gi|nf|
om|pp|ss)|d(?:at|ll|mg|oc)|e(?:du|nt|xe|xt)|g(?:if|ov)|htm|i(?:co|di|mg|nc|nf|
ni)|jpg|l(?:ib|og)|m(?:ap|bs|il|p[eg])|net|org|p(?:df|fb|hp|i[df]|ng|pt|sd)|
raw|s(?:cr|ql|ty|ys)|t(?:ar|ex|ld|mp|tf|xt)|usr|w(?:av|pi)|x(?:ls|ml|sl)|zip)|
[`'"]all)(?:[,'\?!]|\.?\s)/i: Variable length lookbehind not implemented in 
regex m/(?i)\s(?![acdgjlmnosx][`'"])[a-zA-Z]{1}[.,\;:?%!&+^~`'\$*=#|
013467\(\)\[\]\{\}<>"][a-zA-Z]{3}(?<!\.(?:(?-i:[A-Z][a-z].../

lint: config: invalid regexp for rule J_CHICKENPOX_23: m/\s(?!(?:fn|re):|
(?:cc|to)=|(?:qu|un)[`'"]|(?:dr|m[rst]|li|st|td)\.)[a-zA-Z]{2}[.,
\;:?%!&+^~`'\$*=#|013467\(\)\[\]\{\}<>"][a-zA-Z]{3}(?<!\.(?:(?-i:[A-Z][a-z]
{2})|a(?:sc|sp|ux)|b(?:ak|at|iz|in|ks|mk|mp)|c(?:fg|gi|nf|om|pp|ss)|d(?:at|ll|
mg|oc)|e(?:du|nt|xe|xt)|g(?:if|ov)|htm|i(?:co|di|mg|nc|nf|ni)|jpg|l(?:ib|og)|
m(?:ap|bs|il|p[eg])|net|org|p(?:df|fb|hp|i[df]|ng|pt|sd)|raw|s(?:cr|ql|ty|ys)|
t(?:ar|ex|ld|mp|tf|xt)|usr|w(?:av|pi)|x(?:ls|ml|sl)|zip)|['`"]tje)(?:[,'\?!]|
\.?\s)/i: Variable length lookbehind not implemented in regex m/(?i)\s(?!
(?:fn|re):|(?:cc|to)=|(?:qu|un)[`'"]|(?:dr|m[rst]|li|st|td)\.)[a-zA-Z]{2}[.,
\;:?%!&+^~`'\$*=#|013467\(\)\[\].../

lint: config: invalid regexp for rule J_CHICKENPOX_33: m/\s(?!(?:alt|biz|mrs|
rev|s(?:ci|en|oc))\.|(?:end|fwd|org|reg):|pop3|cos')[a-zA-Z]{3}[.,
\;:?%!&+^~`'\$*=#|013467\(\)\[\]\{\}<>"][a-zA-Z]{3}(?<!\.(?:(?-i:[A-Z][a-z]
{2})|a(?:sc|sp|ux)|b(?:ak|at|iz|in|ks|mk|mp)|c(?:fg|gi|nf|om|pp|ss)|d(?:at|ll|
mg|oc)|e(?:du|nt|xe|xt)|g(?:if|ov)|htm|i(?:co|di|mg|nc|nf|ni)|jpg|l(?:ib|og)|
m(?:ap|bs|il|p[eg])|net|org|p(?:df|fb|hp|i[df]|ng|pt|sd)|raw|s(?:cr|ql|ty|ys)|
t(?:ar|ex|ld|mp|tf|xt)|usr|w(?:av|pi)|x(?:ls|ml|sl)|zip)|['`"]tje)(?:[,'\?!]|
\.?\s)/i: Variable length lookbehind not implemented in regex m/(?i)\s(?!
(?:alt|biz|mrs|rev|s(?:ci|en|oc))\.|(?:end|fwd|org|reg):|pop3|cos')[a-zA-Z]{3}
[.,\;:?%!&+^~`'\$*=#|013467\(\.../

ERROR: LINT FAILED, suppressing output: rules/70_sandbox.cf




It seemed like a perl bug, but it turns out it may not be.
I reported the case as:
  https://rt.perl.org:443/rt3/Ticket/Display.html?id=113496

  Came across a small handful of complex regular expressions
  (in SpamAssassin rules), which seem to unwarrantedly hit an
  error: "Variable length lookbehind not implemented in regex"
  under perl 5.16.0 (on FreeBSD).

  Here is a distilled-down sample case:

  good:
    $ perl -e 'm/(?<!abc|cde)/i'

  good:
    $ perl -e 'm/(?<!abc|css)/'

  incorrect diagnostics:
    $ perl -e 'm/(?<!abc|css)/i'
    Variable length lookbehind not implemented in regex m/(?<!abc|css)/
      at -e line 1.


doy replied:
  This is likely because in 5.16, /ss/i compiles to something along the
  lines of /(?:ss|ß)/. You could possibly use \K instead of lookbehind
  (depending on the pattern), or use the /a regex modifier to enforce
  ASCII semantics.

Father Chrysostomos adds:
  The /aa modifier is what you need here.
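
The two replies above can be tried out with a short script. This is only a
sketch: the helper name compile_error is made up for illustration, and the
outcome of the plain /i case depends on the perl version in use (the report
above is from 5.16.0; other releases may accept the pattern or only warn).

```perl
use strict;
use warnings;

# Hypothetical helper: compile $src with /i, plus /aa when $ascii is true.
# Interpolating $src defers compilation to run time, so eval can trap an error.
# Returns '' on success, or the error text.
# Note: the /aa modifier is a syntax error on perls older than 5.14.
sub compile_error {
    my ($src, $ascii) = @_;
    my $re = eval { $ascii ? qr/$src/aai : qr/$src/i };
    return defined $re ? '' : $@;
}

my $src = '(?<!abc|css)';    # the distilled pattern from the report
for my $ascii (0, 1) {
    my $err = compile_error($src, $ascii);
    print(($ascii ? '/aai: ' : '/i:   '), ($err eq '' ? "compiled\n" : "failed: $err"));
}

# U+00DF is LATIN SMALL LETTER SHARP S.
my $sharp_s = "\x{DF}";
print 'ss vs sharp s, /iu:  ', ($sharp_s =~ /ss/iu  ? 'match' : 'no match'), "\n";
print 'ss vs sharp s, /iaa: ', ($sharp_s =~ /ss/iaa ? 'match' : 'no match'), "\n";
```

The last two lines show why the lookbehind stops being fixed-width: under
Unicode case-folding rules "ss" can match a single character, while /aa
forbids ASCII/non-ASCII matches under /i.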



So, are just the three J_CHICKENPOX_* rules in need of fixing,
or are we in deeper trouble?

-- 
You are receiving this mail because:
You are the assignee for the bug.


--- Comment #2 from Mark Martinec <Ma...@ijs.si> ---
> Does it only rear its head with compiled rules?

It showed up in the --lint phase of the install, with non-compiled rules.

(compiled rules are possibly affected too, but these may
already be broken due to changes in perl's debug output
across versions, Bug 6649)

As a quick and dirty hack I just replaced a 'ss' with 's[s]'
in these three rules, so that installation/lint does not barf:

  Bug 6802: a hack on three J_CHICKENPOX_* rules,
  replacing ss with s[s] avoids interpreting "ss" as "sharp s"
    Sending rulesrc/sandbox/khopesh/20_chickenpox.cf
  Committed revision 1346064.
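
A minimal sketch of the idea behind the hack, using the distilled pattern
from the report rather than the real rules. The trailing "x", the %variant
table and the sample strings are assumptions added for illustration. Whether
each variant compiles depends on the perl version, so the script only reports
what it sees.

```perl
use strict;
use warnings;

# Both patterns are meant to reject an "x" that is preceded by "abc" or "css".
our %variant = (
    literal => '(?<!abc|css)x',
    hacked  => '(?<!abc|cs[s])x',    # s[s] keeps "ss" from being one literal run
);

for my $name (sort keys %variant) {
    # Interpolation defers compilation to run time, so eval can trap an error.
    my $re = eval { qr/$variant{$name}/i };
    if (!defined $re) {
        print "$name: failed to compile: $@";
        next;
    }
    for my $text ('cssx', 'CSSx', 'fooX') {
        printf "%s: %-4s %s\n", $name, $text, ($text =~ $re ? 'match' : 'no match');
    }
}
```

For plain ASCII input the two variants are intended to behave the same; the
difference is only in how perl computes the lookbehind length at compile time.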


Don't know where else we may encounter effects of these changes
in perl. These three rules were just 'lucky' in using a construct
which involves an additional internal check on string lengths.

So far so good with 5.16.0, things appear to be working normally.


--- Comment #3 from Mark Martinec <Ma...@ijs.si> ---
> Don't know where else we may encounter effects of these changes

Should we be adding a:
  use re "/aa";
in code sections which interpret regexps in rules
to avoid surprises with Unicode semantics, or deal with
specific problems as we come across them?

Adding an aa modifier directly in rules would break
these regexps for versions of perl older than 5.14 (I think).
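
A rough sketch of what the pragma approach could look like. The sub name
compile_rule is a hypothetical stand-in, not actual SpamAssassin code, and
the pragma form shown needs perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the code that turns rule text into a regexp.
sub compile_rule {
    my ($src) = @_;
    use re '/aa';    # lexically scoped default modifiers for this block
    my $re = eval { qr/$src/i };
    return $re;      # undef if the pattern failed to compile
}

my $re = compile_rule('(?<!abc|css)x');
print 'lookbehind rule: ', (defined $re ? 'compiled' : 'failed'), "\n";

# U+00DF is LATIN SMALL LETTER SHARP S.
my $ss = compile_rule('ss');
print 'sharp s matches ss: ', ("\x{DF}" =~ $ss ? 'yes' : 'no'), "\n";
```

Because the pragma is lexically scoped, rule text itself stays untouched and
only the regexps compiled inside that block pick up the /aa default.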


Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kmcgrail@pccc.com

--- Comment #1 from Kevin A. McGrail <km...@pccc.com> ---
I had been wondering when that email you wrote about chickenpox was going to
rear its head.  Unfortunately, this issue is WAY over my regexp head.  Does it
only rear its head with compiled rules?
