Posted to issues@lucene.apache.org by "Mark Harwood (Jira)" <ji...@apache.org> on 2020/05/15 10:57:00 UTC

[jira] [Resolved] (LUCENE-9370) RegExpQuery should error for inappropriate use of \ character in input

     [ https://issues.apache.org/jira/browse/LUCENE-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Harwood resolved LUCENE-9370.
----------------------------------
    Fix Version/s: master (9.0)
       Resolution: Fixed

Fixed in https://github.com/apache/lucene-solr/commit/819e668ce2fcfcf86b652a191cdbe0fad0a8ffce

> RegExpQuery should error for inappropriate use of \ character in input
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-9370
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9370
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: master (9.0)
>            Reporter: Mark Harwood
>            Priority: Minor
>             Fix For: master (9.0)
>
>
> The RegExp class is too lenient in parsing user input, which can confuse or mislead users and cause backwards-compatibility issues as we enhance regex support.
> In normal regular expression syntax the backslash is used to:
> *  escape a reserved character, e.g. \.
> *  use certain unreserved characters in a shorthand context, e.g. \d means digits [0-9]
>  
> The leniency bug in RegExp is that it adds an extra rule to this list - any backslashed characters that don't satisfy the above rules are taken literally. For example, there's no reason to put a backslash in front of the letter "p" but we accept \p as the letter p.
> Java's Pattern class will throw a parse exception given a meaningless backslash like \p.
> We should too.
> In [Lucene-9336|https://issues.apache.org/jira/browse/LUCENE-9336] we added support for commonly supported regular expression shorthands like `\d`. Sadly this is a breaking change, because the leniency had allowed \d to be accepted as the letter d without an exception. Users were likely silently missing results they were hoping for, and we created a backwards-compatibility (BWC) problem for ourselves in filling in the gaps.
> I propose we do as other regex parsers do and throw an error on inappropriate use of backslashes.
> This will be another breaking change, so it should target 9.0.
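
As a rough illustration of the java.util.regex.Pattern behaviour the description refers to, the sketch below checks whether Pattern accepts a few backslash sequences. The class name BackslashCheck and the method isValidJavaRegex are hypothetical names chosen for this example; they are not part of Lucene or of the fix commit, and the sketch exercises only the JDK, not Lucene's RegExp.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper for illustration only; not part of Lucene.
class BackslashCheck {

    // Returns true if java.util.regex.Pattern accepts the expression,
    // false if it rejects it with a PatternSyntaxException.
    static boolean isValidJavaRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "\\" in a Java string literal is a single backslash in the regex.
        System.out.println(isValidJavaRegex("\\.")); // escaped reserved character: true
        System.out.println(isValidJavaRegex("\\d")); // digit shorthand: true
        System.out.println(isValidJavaRegex("\\p")); // bare \p: false
    }
}
```

In Java, \p normally introduces a property class such as \p{Lower}, so a bare \p is rejected rather than read as the letter p. The issue proposes that Lucene's RegExp reject such input as well instead of silently accepting it.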


