Posted to issues@lucene.apache.org by "Holger Rehn (Jira)" <ji...@apache.org> on 2022/02/21 20:44:00 UTC
[jira] [Updated] (LUCENE-10430) Literal double quotes cause exception in class RegExp
[ https://issues.apache.org/jira/browse/LUCENE-10430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Holger Rehn updated LUCENE-10430:
---------------------------------
Description:
The class org.apache.lucene.util.automaton.RegExp fails to parse valid regular expressions that contain literal double quotes. This also affects any RegexpQuery built from such an expression.
Example:
{code:java}
Query q = new RegexpQuery( new Term( "field", "a\"b" ) );
RegExp r = new RegExp( "a\"b" );{code}
Both fail with:
{code:java}
java.lang.IllegalArgumentException: expected '"' at position 3
at org.apache.lucene.util.automaton.RegExp.parseSimpleExp(RegExp.java:1299)
at org.apache.lucene.util.automaton.RegExp.parseCharClassExp(RegExp.java:1229)
at org.apache.lucene.util.automaton.RegExp.parseComplExp(RegExp.java:1218)
at org.apache.lucene.util.automaton.RegExp.parseRepeatExp(RegExp.java:1192)
at org.apache.lucene.util.automaton.RegExp.parseConcatExp(RegExp.java:1185)
at org.apache.lucene.util.automaton.RegExp.parseConcatExp(RegExp.java:1187)
at org.apache.lucene.util.automaton.RegExp.parseInterExp(RegExp.java:1179)
at org.apache.lucene.util.automaton.RegExp.parseUnionExp(RegExp.java:1173)
at org.apache.lucene.util.automaton.RegExp.<init>(RegExp.java:496)
...{code}
As a workaround, we simply replace every double quote with a dot (.), which matches any single character.
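A minimal sketch of that workaround, for illustration only. The class name QuoteWorkaround and the method replaceQuotesWithDot are hypothetical and not part of Lucene; the sketch uses plain string handling, so it runs without Lucene on the classpath.

```java
// Hypothetical helper illustrating the workaround; not part of Lucene.
final class QuoteWorkaround {

    private QuoteWorkaround() {}

    /**
     * Replaces every double quote in the given regular expression with '.',
     * so that the parser never sees a quote character.
     */
    static String replaceQuotesWithDot(String regex) {
        return regex.replace('"', '.');
    }

    public static void main(String[] args) {
        String userRegex = "a\"b";                        // the failing example: a"b
        String safeRegex = replaceQuotesWithDot(userRegex);
        System.out.println(safeRegex);                    // prints a.b
        // safeRegex would then be passed on, e.g. to
        // new RegexpQuery(new Term("field", safeRegex)).
    }
}
```

Note that the rewritten expression is broader than the original: a.b also matches terms such as axb, not only a"b. The replacement is also blind to context, so an already escaped quote (\") would turn into \. and then match a literal dot instead of a quote.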
was:
The class org.apache.lucene.util.automaton.RegExp fails to parse valid regular expressions that contain literal double quotes. This also affects any RegexpQuery built from such an expression.
Example:
{code:java}
Query q = new RegexpQuery( new Term( "field", "a\"b" ) );
RegExp r = new RegExp( "a\"b" );{code}
Both fail with:
{code:java}
java.lang.IllegalArgumentException: expected '"' at position 3
at org.apache.lucene.util.automaton.RegExp.parseSimpleExp(RegExp.java:1299)
at org.apache.lucene.util.automaton.RegExp.parseCharClassExp(RegExp.java:1229)
at org.apache.lucene.util.automaton.RegExp.parseComplExp(RegExp.java:1218)
at org.apache.lucene.util.automaton.RegExp.parseRepeatExp(RegExp.java:1192)
at org.apache.lucene.util.automaton.RegExp.parseConcatExp(RegExp.java:1185)
at org.apache.lucene.util.automaton.RegExp.parseConcatExp(RegExp.java:1187)
at org.apache.lucene.util.automaton.RegExp.parseInterExp(RegExp.java:1179)
at org.apache.lucene.util.automaton.RegExp.parseUnionExp(RegExp.java:1173)
at org.apache.lucene.util.automaton.RegExp.<init>(RegExp.java:496)
...{code}
Unfortunately, I don't see an easy workaround. Safely removing or replacing double quotes in the (user-typed) regular expression is not easy, as it would require us to fully understand the regex.
> Literal double quotes cause exception in class RegExp
> -----------------------------------------------------
>
> Key: LUCENE-10430
> URL: https://issues.apache.org/jira/browse/LUCENE-10430
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/other
> Affects Versions: 9.0
> Reporter: Holger Rehn
> Priority: Critical