Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2015/05/08 16:48:01 UTC

[jira] [Commented] (LUCENE-6365) Optimized iteration of finite strings

    [ https://issues.apache.org/jira/browse/LUCENE-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534633#comment-14534633 ] 

Michael McCandless commented on LUCENE-6365:
--------------------------------------------

Thanks [~markus_heiden], new patch looks great.

Can we remove the limit argument from FiniteStringsIterator.init? It seems like this ("abort iteration after N items") should be the caller's job?
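A minimal sketch of what leaving the limit to the caller could look like. FiniteStringSource, CallerLimit and take() are hypothetical names and are not Lucene API. The only assumption carried over is an iterator whose next() returns null once it is exhausted.

```java
import java.util.ArrayList;
import java.util.List;

final class CallerLimit {
  // Pulls at most `limit` strings from the source. The size check runs before
  // next(), so no string beyond the limit is ever produced.
  static List<int[]> take(FiniteStringSource source, int limit) {
    List<int[]> out = new ArrayList<>();
    int[] s;
    while (out.size() < limit && (s = source.next()) != null) {
      out.add(s);
    }
    return out;
  }

  public static void main(String[] args) {
    // Toy source yielding three fixed strings, then null.
    int[][] data = {{'a'}, {'a', 'b'}, {'b'}};
    int[] pos = {0};
    FiniteStringSource src = () -> pos[0] < data.length ? data[pos[0]++] : null;
    System.out.println(take(src, 2).size()); // prints 2
  }
}

// Hypothetical stand-in for a finite-strings iterator: next() returns the next
// string as an array of labels, or null when iteration is finished.
interface FiniteStringSource {
  int[] next();
}
```

With this design the iterator needs no limit state of its own. A caller that wants every string simply loops until next() returns null.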

Can we just pass the automaton to FSI's ctor?  I don't think we need a reuse API here...
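For illustration, here is a rough sketch of an iterator that receives the automaton in its constructor and exposes no reuse API. ToyAutomaton, ToyFiniteStringsIterator and the int[][][] transition encoding are hypothetical stand-ins, not Lucene's classes. The sketch assumes an acyclic automaton whose start state is 0, and it does no cycle detection. Strings are produced lazily from an explicit depth-first stack instead of being collected into a set up front.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

final class ToyFiniteStringsIterator {
  private final ToyAutomaton a;
  // Each frame is {state, index of the next outgoing transition to follow}.
  private final Deque<int[]> stack = new ArrayDeque<>();
  // Labels along the current path, reused across calls.
  private int[] path = new int[8];
  private int pathLen = 0;

  // The automaton goes straight to the constructor; there is no init()/reuse method.
  ToyFiniteStringsIterator(ToyAutomaton a) {
    this.a = a;
    stack.push(new int[] {0, 0}); // state 0 is the start state in this toy encoding
  }

  // Returns the next accepted string, or null once every string has been produced.
  // Children are explored first, and an accepting state is emitted when its frame
  // is popped: "recurse, then emit the current node if accepting".
  int[] next() {
    while (!stack.isEmpty()) {
      int[] frame = stack.peek();
      int state = frame[0];
      if (frame[1] < a.transitions[state].length) {
        int[] t = a.transitions[state][frame[1]++];
        if (pathLen == path.length) path = Arrays.copyOf(path, pathLen * 2);
        path[pathLen++] = t[0];
        stack.push(new int[] {t[1], 0});
      } else {
        stack.pop();
        int[] result = a.accept[state] ? Arrays.copyOf(path, pathLen) : null;
        if (pathLen > 0) pathLen--; // drop the label that led into this state
        if (result != null) return result;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Accepts "a", "ab" and "b": 0 -a-> 1 -b-> 2, and 0 -b-> 3.
    ToyAutomaton a = new ToyAutomaton(
        new int[][][] {{{'a', 1}, {'b', 3}}, {{'b', 2}}, {}, {}},
        new boolean[] {false, true, true, true});
    ToyFiniteStringsIterator it = new ToyFiniteStringsIterator(a);
    for (int[] s = it.next(); s != null; s = it.next()) {
      System.out.println(new String(s, 0, s.length)); // prints ab, a, b on separate lines
    }
  }
}

// Hypothetical toy acyclic automaton: transitions[s] holds {label, target} pairs
// and accept[s] marks accepting states.
final class ToyAutomaton {
  final int[][][] transitions;
  final boolean[] accept;

  ToyAutomaton(int[][][] transitions, boolean[] accept) {
    this.transitions = transitions;
    this.accept = accept;
  }
}
```

To iterate again, the caller just constructs a new iterator. The per-instance state is only the stack and the current path, so a separate reuse API buys little here.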

bq. I am not sure if the implementation change of CompletionTokenStream is OK, because I set the position attribute at the end of the iteration instead of at the start of the iteration. The tests run fine, but someone should review that.

It is weird that CompletionTokenStream hijacks PositionIncrementAttribute like that, and I can't see anywhere that reads from that (and indeed tests pass if I comment it out).  Maybe [~areek] knows?  I think we should just remove it?

> Optimized iteration of finite strings
> -------------------------------------
>
>                 Key: LUCENE-6365
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6365
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/other
>    Affects Versions: 5.0
>            Reporter: Markus Heiden
>            Priority: Minor
>              Labels: patch, performance
>         Attachments: FiniteStringsIterator.patch, FiniteStringsIterator2.patch
>
>
> Replaced Operations.getFiniteStrings() with an optimized FiniteStringsIterator.
> Benefits:
> Avoid huge hash set of finite strings.
> Avoid massive object/array creation during processing.
> "Downside":
> The iteration order changed, so when iterating with a limit, the result may differ slightly. Old: emit the current node if it is accepting, then recurse. New: recurse, then emit the current node if it is accepting.
> The old method Operations.getFiniteStrings() still exists because it simplifies the tests. It is now implemented using the new FiniteStringsIterator.
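A toy model of the order change stated in the description, using a hypothetical array-based automaton (not Lucene's Automaton). In this model, preOrder follows the old rule and postOrder the new one. With a limit of 1, the two orders return different strings, which is the "result may differ slightly" caveat.

```java
import java.util.ArrayList;
import java.util.List;

final class OrderDemo {
  // Hypothetical acyclic automaton: TRANSITIONS[s] lists {label, target} pairs and
  // state 0 is the start state. It accepts "a", "ab" and "b".
  static final int[][][] TRANSITIONS = {{{'a', 1}, {'b', 3}}, {{'b', 2}}, {}, {}};
  static final boolean[] ACCEPT = {false, true, true, true};

  // Old order: emit the current node if it is accepting, then recurse (pre-order).
  static void preOrder(int state, String prefix, List<String> out, int limit) {
    if (out.size() >= limit) return;
    if (ACCEPT[state]) out.add(prefix);
    for (int[] t : TRANSITIONS[state]) {
      preOrder(t[1], prefix + (char) t[0], out, limit);
    }
  }

  // New order: recurse first, then emit the current node if it is accepting (post-order).
  static void postOrder(int state, String prefix, List<String> out, int limit) {
    if (out.size() >= limit) return;
    for (int[] t : TRANSITIONS[state]) {
      postOrder(t[1], prefix + (char) t[0], out, limit);
    }
    if (ACCEPT[state] && out.size() < limit) out.add(prefix);
  }

  // Collects at most `limit` accepted strings in either order.
  static List<String> collect(boolean newOrder, int limit) {
    List<String> out = new ArrayList<>();
    if (newOrder) {
      postOrder(0, "", out, limit);
    } else {
      preOrder(0, "", out, limit);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(collect(false, 1)); // prints [a]
    System.out.println(collect(true, 1));  // prints [ab]
  }
}
```

Without a limit, both orders yield the same set of strings ("a", "ab", "b"), just in a different sequence. Only a truncated iteration exposes the difference.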



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
