Posted to issues@lucene.apache.org by "Gus Heck (Jira)" <ji...@apache.org> on 2021/01/25 10:41:00 UTC

[jira] [Created] (LUCENE-9696) RegExp with group references

Gus Heck created LUCENE-9696:
--------------------------------

             Summary: RegExp with group references
                 Key: LUCENE-9696
                 URL: https://issues.apache.org/jira/browse/LUCENE-9696
             Project: Lucene - Core
          Issue Type: Wish
            Reporter: Gus Heck


PatternTypingFilter presently relies on java.util.regex, but LUCENE-7465 found performance benefits using our own RegExp class instead. Unfortunately, RegExp does not currently report matching subgroups, which is key to PatternTypingFilter's use (and probably useful in other endeavors as well). What's needed is reporting of subgroups such that:

new RegExp("foo(.+)") --> converted to a run automaton etc. --> match found for "foobar" --> somehow reports getGroup(1) as "bar"

And getGroup() can be called on some object reasonably accessible to the code using RegExp in the first place.
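
For reference, the behavior being asked for is what java.util.regex already provides today. The sketch below uses only java.util.regex (not Lucene's RegExp), and the class and method names are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: this is java.util.regex, not Lucene's RegExp.
// GroupReferenceDemo and groupOf are hypothetical names.
final class GroupReferenceDemo {

    // Returns the requested capture group if the whole input matches, else null.
    static String groupOf(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // Group 1 of "foo(.+)" against "foobar" is the text after "foo".
        System.out.println(groupOf("foo(.+)", "foobar", 1)); // prints "bar"
    }
}
```

A RegExp-based equivalent would need some object playing the role of Matcher here, which is the open design question.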

Clearly there's a lot to be worked out there since the normal usage pattern converts things to a DFA / run Automaton etc, and subgroups are not a natural concept for those classes. But if this could be achieved without losing the performance benefits, that would be interesting :).
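
To make the difficulty concrete, here is a rough sketch of a hand-written DFA for foo.+ (with "." taken to mean any character). It is a toy written for this ticket, not Lucene's automaton code, and all names in it are hypothetical. The point is that the only thing a run produces is accept/reject; nothing records where a group started or ended:

```java
// Hypothetical hand-written DFA for foo.+ ; not Lucene's automaton classes.
final class TinyDfa {

    // States: 0 start, 1 saw "f", 2 saw "fo", 3 saw "foo",
    // 4 accepting ("foo" plus at least one more character), -1 dead.
    static int step(int state, char c) {
        switch (state) {
            case 0: return c == 'f' ? 1 : -1;
            case 1: return c == 'o' ? 2 : -1;
            case 2: return c == 'o' ? 3 : -1;
            case 3: return 4; // any character moves to the accepting state
            case 4: return 4; // and any further character stays there
            default: return -1;
        }
    }

    // Runs the DFA over the input; the result is a bare boolean with no group offsets.
    static boolean run(String input) {
        int state = 0;
        for (int i = 0; i < input.length() && state != -1; i++) {
            state = step(state, input.charAt(i));
        }
        return state == 4;
    }
}
```

Any group-reporting scheme would have to carry extra information (for example, the offsets at which certain transitions were taken) alongside the state, and the cost of that bookkeeping is what would need measuring.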

Opening this Wish ticket as encouraged by [~mikemccand] in LUCENE-9575. I won't be able to work on it any time soon, so anyone else interested is encouraged to pick it up or to drop links or ideas in here.


