Posted to issues@lucene.apache.org by "Mark Harwood (Jira)" <ji...@apache.org> on 2020/05/15 10:57:00 UTC
[jira] [Resolved] (LUCENE-9370) RegExpQuery should error for inappropriate use of \ character in input
[ https://issues.apache.org/jira/browse/LUCENE-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Harwood resolved LUCENE-9370.
----------------------------------
Fix Version/s: master (9.0)
Resolution: Fixed
Fixed in https://github.com/apache/lucene-solr/commit/819e668ce2fcfcf86b652a191cdbe0fad0a8ffce
> RegExpQuery should error for inappropriate use of \ character in input
> ----------------------------------------------------------------------
>
> Key: LUCENE-9370
> URL: https://issues.apache.org/jira/browse/LUCENE-9370
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: master (9.0)
> Reporter: Mark Harwood
> Priority: Minor
> Fix For: master (9.0)
>
>
> The RegExp class is too lenient in parsing user input, which can confuse or mislead users and cause backwards-compatibility issues as we enhance regex support.
> In normal regular expression syntax the backslash is used to:
> * escape a reserved character like \.
> * use certain unreserved characters in a shorthand context e.g. \d means digits [0-9]
>
> The leniency bug in RegExp is that it adds an extra rule to this list: any backslashed character that doesn't satisfy the above rules is taken literally. For example, there's no reason to put a backslash in front of the letter "p", but we accept \p as the letter p.
> Java's Pattern class will throw a parse exception given a meaningless backslash like \p.
> We should too.
> In [LUCENE-9336|https://issues.apache.org/jira/browse/LUCENE-9336] we added support for commonly supported regex shorthands like `\d`. Sadly this is a breaking change, because the leniency had allowed \d to be accepted as the letter d without an exception. Users were likely silently missing results they were hoping for, and we created a backwards-compatibility (BWC) problem for ourselves by filling in the gaps.
> I propose we do as other regex parsers do and throw an error on inappropriate use of backslashes.
> This will be another breaking change, so it should target 9.0.
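
To illustrate the behaviour described in the issue, here is a minimal sketch in plain Java. It does two things: it shows java.util.regex.Pattern refusing a backslash in front of a letter that has no escape meaning (the sketch uses \q), and it outlines what a strict escape check could look like. The class name, the method names, and the RESERVED and SHORTHAND character sets are assumptions made up for this example; they are not Lucene's RegExp implementation or its actual character lists, and the check ignores context such as quoted strings or character classes.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class BackslashStrictness {

    // Hypothetical character sets, for illustration only (not Lucene's real lists).
    private static final String RESERVED = ".?+*|{}[]()\"\\#@&<>~^$-";
    private static final String SHORTHAND = "dDsSwW";

    /** Returns true if java.util.regex.Pattern refuses to compile the regex. */
    public static boolean javaRejects(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    /**
     * Strict escape check: a backslash must be followed by either a reserved
     * character or a shorthand letter. Anything else is an error rather than
     * being silently treated as a literal.
     */
    public static void checkEscapes(String regex) {
        for (int i = 0; i < regex.length(); i++) {
            if (regex.charAt(i) != '\\') {
                continue;
            }
            if (i + 1 >= regex.length()) {
                throw new IllegalArgumentException("trailing backslash at position " + i);
            }
            char next = regex.charAt(i + 1);
            if (RESERVED.indexOf(next) < 0 && SHORTHAND.indexOf(next) < 0) {
                throw new IllegalArgumentException(
                        "invalid escape \\" + next + " at position " + i);
            }
            i++; // skip the escaped character so "\\\\" is read as one escape
        }
    }

    public static void main(String[] args) {
        // Java string literals need the backslash doubled: "\\q" is the regex \q.
        System.out.println("Pattern rejects \\q: " + javaRejects("\\q"));
        System.out.println("Pattern rejects \\d: " + javaRejects("\\d"));

        checkEscapes("a\\.b\\d"); // escaped reserved char and shorthand: accepted
        try {
            checkEscapes("\\p"); // meaningless backslash: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("strict check: " + e.getMessage());
        }
    }
}
```

Failing fast in this way keeps unused escapes free for later features: a letter that is an error today can be given a meaning in a future release without silently changing the results of existing queries.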