Posted to issues@lucene.apache.org by "Michael McCandless (Jira)" <ji...@apache.org> on 2021/01/26 14:42:00 UTC

[jira] [Commented] (LUCENE-9696) RegExp with group references

    [ https://issues.apache.org/jira/browse/LUCENE-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272129#comment-17272129 ] 

Michael McCandless commented on LUCENE-9696:
--------------------------------------------

Thanks [~gus]!  We could separately consider adding group support to FSTs and to Lucene's {{Automaton}} classes, which are two separate implementations of fun finite-state algorithms.

> RegExp with group references
> ----------------------------
>
>                 Key: LUCENE-9696
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9696
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Gus Heck
>            Priority: Minor
>
> PatternTypingFilter presently relies on java.util.regex, but LUCENE-7465 found performance benefits from using our own RegExp class instead. Unfortunately, RegExp does not currently report matching subgroups, which is key to PatternTypingFilter's use (and probably useful in other endeavors as well). What's needed is reporting of subgroups such that:
> new RegExp("foo(.+)") --> converted to a run automaton etc. --> match found for "foobar" --> somehow reports getGroup(1) as "bar"
> And getGroup() can be called on some object reasonably accessible to the code using RegExp in the first place.
> Clearly there's a lot to be worked out there, since the normal usage pattern converts things to a DFA / run automaton etc., and subgroups are not a natural concept for those classes. But if this could be achieved without losing the performance benefits, that would be interesting :).
> Opening this Wish ticket as encouraged by [~mikemccand] in LUCENE-9575. I won't be able to work on it any time soon, so I encourage anyone else interested to pick it up or to drop links or ideas in here.
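
For reference, the group reporting requested above is what java.util.regex already offers today. The sketch below uses only the JDK to show the target behavior for the "foobar" example. It is not Lucene code: the class name GroupDemo and the helper firstGroup are hypothetical, and the pattern is assumed to be "foo(.+)". A Lucene run automaton, by contrast, only answers whether the input is accepted, which is the gap this ticket describes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; illustrates java.util.regex behavior only,
// not any existing or proposed Lucene API.
final class GroupDemo {

    // Returns the text captured by group 1 when the whole input matches
    // the regex, or null when there is no match.
    public static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        // matches() requires the entire input to match, similar in spirit
        // to running a complete string through an automaton.
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed pattern from the issue description's example.
        System.out.println(firstGroup("foo(.+)", "foobar")); // prints "bar"
        System.out.println(firstGroup("foo(.+)", "foo"));    // prints "null"
    }
}
```

Any RegExp-based equivalent would need somewhere to carry this capture information, since a plain accept/reject result has no room for it.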



