Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2015/03/23 09:03:11 UTC

[jira] [Updated] (LUCENE-6367) Can PrefixQuery subclass AutomatonQuery?

     [ https://issues.apache.org/jira/browse/LUCENE-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-6367:
---------------------------------------
    Attachment: LUCENE-6367.patch

Patch, cutting over PrefixQuery to AutomatonQuery and removing
PrefixTermsEnum.

I explored having Byte/CharRunAutomaton.run short-circuit once it
reaches a sink state, but fixing all callers of .step to handle this
became quite difficult and invasive.  With LUCENE-5879 we also need
to know the sink state under the hood, but that's separate from
fixing .run to make use of it.
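
For readers unfamiliar with the sink-state idea, the following is a
minimal sketch, not Lucene code. PrefixRunAutomatonSketch and all of
its members are hypothetical names for a toy byte-level automaton
that accepts any term starting with a fixed prefix. It only
illustrates what "short-circuit once in a sink state" means; the real
ByteRunAutomaton is table-driven and differs in detail.

```java
import java.nio.charset.StandardCharsets;

/** Toy DFA for "term starts with prefix" (hypothetical, not Lucene's ByteRunAutomaton). */
final class PrefixRunAutomatonSketch {
    static final int REJECT = -1;

    private final byte[] prefix;
    private final int sinkState; // accepting state that loops to itself on every byte
    int steps;                   // demo-only counter of step() calls in the last run

    PrefixRunAutomatonSketch(byte[] prefix) {
        this.prefix = prefix.clone();
        this.sinkState = prefix.length;
    }

    /** One transition: states 0..prefix.length-1 consume the prefix; the sink loops. */
    int step(int state, byte b) {
        steps++;
        if (state == sinkState) {
            return sinkState;
        }
        return prefix[state] == b ? state + 1 : REJECT;
    }

    /** Plain run: steps through every byte of the term. */
    boolean run(byte[] term) {
        steps = 0;
        int state = 0;
        for (byte b : term) {
            state = step(state, b);
            if (state == REJECT) {
                return false;
            }
        }
        return state == sinkState;
    }

    /** Short-circuiting run: stops as soon as the accepting sink is reached. */
    boolean runShortCircuit(byte[] term) {
        steps = 0;
        int state = 0;
        for (int i = 0; i < term.length && state != sinkState; i++) {
            state = step(state, term[i]);
            if (state == REJECT) {
                return false;
            }
        }
        return state == sinkState;
    }

    public static void main(String[] args) {
        PrefixRunAutomatonSketch a =
            new PrefixRunAutomatonSketch("luc".getBytes(StandardCharsets.UTF_8));
        byte[] term = "lucene".getBytes(StandardCharsets.UTF_8);
        System.out.println(a.run(term) + " after " + a.steps + " steps");
        System.out.println(a.runShortCircuit(term) + " after " + a.steps + " steps");
    }
}
```

In this toy, only the run loop knows about the sink. Code that drives
step directly would need its own sink check, which is the kind of
caller-side change the paragraph above describes as invasive.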

So I backed out that optimization and tried just doing the
PrefixQuery cutover without optimizing for sink states.  I'm running
a perf test with luceneutil and it looks like the impact is trivial
(well within noise).  Net/net I think it's fine to "just cut over"
without the invasive optimization?

I also changed PrefixQuery's semantics to apply to terms in the full
binary space, not just the UTF-8 space.  While this is technically a
change in behavior, it won't impact users who index only Unicode
terms.  It's also necessary for LUCENE-5879, because if prefixing is
done only in Unicode space (like today), then the resulting binary
space automaton will not have a sink state and auto-prefix can't
apply.
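
The difference between the two semantics can be sketched as follows.
This is an illustration under stated assumptions, not the patch:
PrefixSpaceSketch and its methods are hypothetical names, and
"Unicode space" is approximated here by requiring the bytes after the
prefix to pass the JDK's strict UTF-8 decoder. It also assumes the
prefix itself ends on a code point boundary.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper contrasting binary-space and Unicode-space prefix matching. */
final class PrefixSpaceSketch {

    /** Binary space: once the prefix bytes match, any remaining bytes are accepted. */
    static boolean matchesBinaryPrefix(byte[] term, byte[] prefix) {
        if (term.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (term[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Unicode space (approximation): the suffix must also be well-formed UTF-8,
     * so every remaining byte still has to be inspected after the prefix matches.
     */
    static boolean matchesUnicodePrefix(byte[] term, byte[] prefix) {
        return matchesBinaryPrefix(term, prefix)
            && isValidUtf8(term, prefix.length, term.length - prefix.length);
    }

    /** Strict UTF-8 validation of bytes[offset .. offset+length) using the JDK decoder. */
    static boolean isValidUtf8(byte[] bytes, int offset, int length) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes, offset, length));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] prefix = {'a', 'b'};
        byte[] binaryTerm = {'a', 'b', (byte) 0xFF}; // 0xFF never occurs in valid UTF-8
        System.out.println(matchesBinaryPrefix(binaryTerm, prefix));
        System.out.println(matchesUnicodePrefix(binaryTerm, prefix));
    }
}
```

Because the Unicode-space check must keep validating bytes after the
prefix, no single state can accept everything that follows, which is
why such an automaton has no sink state. The binary-space check has
nothing left to verify once the prefix has matched.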

If this part is somehow controversial I can revert and try to do it
only with LUCENE-5879 instead... if it's OK, I'll add some tests
showing that PrefixQuery on binary terms works.


> Can PrefixQuery subclass AutomatonQuery?
> ----------------------------------------
>
>                 Key: LUCENE-6367
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6367
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: Trunk, 5.1
>
>         Attachments: LUCENE-6367.patch
>
>
> Spinoff/blocker for LUCENE-5879.
> It seems like PrefixQuery should "simply" be an AutomatonQuery rather than specializing its own TermsEnum ... with maybe some performance improvements to ByteRunAutomaton.run to short-circuit once it's in a "sink state", AutomatonTermsEnum could be just as fast as PrefixTermsEnum.
> If we can do this it will make LUCENE-5879 simpler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
