Posted to issues@lucene.apache.org by "Brian Feldman (Jira)" <ji...@apache.org> on 2021/02/01 15:00:00 UTC

[jira] [Commented] (LUCENE-9718) REGEX Pattern Search, character classes with quantifiers do not work

    [ https://issues.apache.org/jira/browse/LUCENE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276387#comment-17276387 ] 

Brian Feldman commented on LUCENE-9718:
---------------------------------------

{code:java}
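import static org.junit.jupiter.api.Assertions.assertTrue;

import org.apache.lucene.util.automaton.CharacterRunAutomaton;
import org.apache.lucene.util.automaton.RegExp;
import org.junit.jupiter.api.Test;

class LuceneRegexTest {

   /**
    * Lucene/Automaton Regex Check
    *
    * @param regex      pattern in Lucene RegExp syntax
    * @param checkValue value to test; the whole string must match
    * @return true if matched
    */
   public boolean luceneRegexCheck(String regex, String checkValue) {
      // Equivalent check with the standalone dk.brics.automaton library:
      //import dk.brics.automaton.RegExp;
      //import dk.brics.automaton.RunAutomaton;
      //RegExp re = new RegExp(regex);
      //RunAutomaton ra = new RunAutomaton(re.toAutomaton());
      //return ra.run(checkValue);

      // Compile the pattern into an automaton and run it over the whole input.
      CharacterRunAutomaton automaton = new CharacterRunAutomaton(new RegExp(regex).toAutomaton());
      return automaton.run(checkValue);
   }

   @Test
   void REGEXTEST() {
      // Explicit bracket class with a quantifier.
      String regex = "[0-9]{2,3}";
      String regexMatches = "11";

      // Lucene Automaton Regex
      assertTrue(luceneRegexCheck(regex, regexMatches), "Lucene Regex Failed to Match");
   }

   @Test
   void REGEXTEST2() {
      // Shorthand class \d with the same quantifier: the case this issue reports as not matching.
      String regex = "\\d{2,3}";
      String regexMatches = "11";

      // Lucene Automaton Regex
      assertTrue(luceneRegexCheck(regex, regexMatches), "Lucene Regex Failed to Match");
   }
}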

{code}
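For comparison, the same checks can be run against the JDK's own engine, java.util.regex, where both [0-9]{2,3} and \d{2,3} match "11". This is a minimal sketch assuming only the JDK: Pattern.matches requires the entire input to match, mirroring CharacterRunAutomaton.run, which accepts only a complete string, so it can serve as a reference oracle for the results the Lucene tests above expect. The class JdkRegexBaseline and its method jdkRegexCheck are hypothetical names, not part of Lucene.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical JDK-only baseline (not part of Lucene): evaluates the same
 * patterns with java.util.regex so the expected outcomes can be compared
 * with Lucene's automaton-based RegExp.
 */
public class JdkRegexBaseline {

    /** True only if the entire input matches, like CharacterRunAutomaton.run. */
    public static boolean jdkRegexCheck(String regex, String checkValue) {
        return Pattern.matches(regex, checkValue);
    }

    public static void main(String[] args) {
        // Explicit class, shorthand class with quantifiers, and the written-out workaround.
        String[] patterns = {"[0-9]{2,3}", "\\d{2,3}", "\\d{2}", "\\d\\d"};
        String[] inputs = {"1", "11", "111", "1111"};
        for (String p : patterns) {
            for (String s : inputs) {
                System.out.println(p + " vs \"" + s + "\": " + jdkRegexCheck(p, s));
            }
        }
    }
}
```

Running each pattern from the report through both engines and diffing the results is a quick way to see which inputs Lucene's RegExp handles differently.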

> REGEX Pattern Search, character classes with quantifiers do not work
> --------------------------------------------------------------------
>
>                 Key: LUCENE-9718
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9718
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 7.7.3, 8.6.3
>            Reporter: Brian Feldman
>            Priority: Minor
>
> Shorthand character classes with a quantifier do not work: no error is given and no results are returned. For example, \d{2} or \d{2,3}, as commonly written in most languages that support regular expressions, simply and quietly fails to match. A user workaround is to write the repetition out in full, as \d\d or [0-9][0-9], or to use an explicit bracket class such as [0-9]{2,3}.
>  
> This inconsistency or limitation is not documented, so users waste time figuring it out themselves. I believe it should be clearly documented, and that fixing the inconsistency would improve pattern searching.
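The workaround described above can be automated with a small pre-processing step before the pattern reaches Lucene's RegExp. This is a minimal sketch, not part of Lucene: the class DigitShorthandRewriter and its method expandDigitShorthand are hypothetical names. It rewrites only \d, into [0-9] (or into 0-9 inside an existing bracket class), which is the explicit form the report says works with quantifiers; \w and \s are left untouched. It assumes the pattern contains no quoted literals and no ']' as the first character of a bracket class, and java.util.regex is used only to sanity-check the rewritten pattern.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not a Lucene API): replaces the shorthand class \d
 * with an explicit [0-9] range. Assumes no quoted literals and no ']' as
 * the first character of a bracket class.
 */
public class DigitShorthandRewriter {

    public static String expandDigitShorthand(String regex) {
        StringBuilder out = new StringBuilder(regex.length() + 8);
        boolean inClass = false; // true while scanning inside [...]
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                char next = regex.charAt(++i); // consume the escaped character
                if (next == 'd') {
                    // Inside a class, splice in the range; outside, wrap it in a class.
                    out.append(inClass ? "0-9" : "[0-9]");
                } else {
                    // Any other escape (including \\) is copied through unchanged.
                    out.append(c).append(next);
                }
            } else {
                if (c == '[' && !inClass) {
                    inClass = true;
                } else if (c == ']' && inClass) {
                    inClass = false;
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] patterns = {"\\d{2,3}", "\\d{2}", "[\\d_]+", "a\\.b"};
        for (String p : patterns) {
            String rewritten = expandDigitShorthand(p);
            // Sanity check with the JDK engine only; Lucene is not involved here.
            System.out.println(p + " -> " + rewritten
                    + " (JDK match on \"11\": " + Pattern.matches(rewritten, "11") + ")");
        }
    }
}
```

Rewriting the string keeps the user-facing syntax unchanged while handing the engine only the explicit bracket form; it is a stopgap, and a fix in the parser itself would make it unnecessary.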



--
This message was sent by Atlassian Jira
(v8.3.4#803005)