Posted to dev@lucene.apache.org by "Markus Heiden (JIRA)" <ji...@apache.org> on 2015/07/02 23:16:06 UTC

[jira] [Commented] (LUCENE-6365) Optimized iteration of finite strings

    [ https://issues.apache.org/jira/browse/LUCENE-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612534#comment-14612534 ] 

Markus Heiden commented on LUCENE-6365:
---------------------------------------

I adapted my patch to the latest changes in trunk. 

I think reuse of the iterator is a core part of this patch. I reworked the iterator's API so that the reuse case and the no-reuse case are handled in a similar way. I hope you like it now (at least a bit). Lucene already does this kind of reuse elsewhere, e.g. see Transition.
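
To illustrate the reuse idea, here is a minimal sketch. It is not the code from the patch: ReuseSketch, Cursor and SequenceIterator are hypothetical names, and the iterator walks a fixed array instead of an automaton. It only shows how a caller-supplied mutable holder lets the reuse and no-reuse cases share one code path.

```java
/** Hypothetical sketch of a reuse-friendly iterator API; not the iterator from the patch. */
class ReuseSketch {

    /** Mutable holder that is filled in place, similar in spirit to Lucene's Transition. */
    static final class Cursor {
        int[] buffer = new int[4]; // grown on demand, never shrunk
        int length;                // number of valid entries in buffer

        void set(int[] src) {
            if (buffer.length < src.length) {
                buffer = new int[Math.max(src.length, buffer.length * 2)];
            }
            System.arraycopy(src, 0, buffer, 0, src.length);
            length = src.length;
        }
    }

    /** Iterates over fixed int sequences and writes each one into a cursor. */
    static final class SequenceIterator {
        private final int[][] data;
        private int pos;

        SequenceIterator(int[][] data) {
            this.data = data;
        }

        /** Reuse case: fills the caller's cursor; returns false when exhausted. */
        boolean next(Cursor reuse) {
            if (pos >= data.length) {
                return false;
            }
            reuse.set(data[pos++]);
            return true;
        }

        /** No-reuse case: same code path, but with a fresh cursor per call. */
        Cursor next() {
            Cursor fresh = new Cursor();
            return next(fresh) ? fresh : null;
        }
    }

    public static void main(String[] args) {
        int[][] data = {{1, 2}, {3}, {4, 5, 6, 7, 8}};
        Cursor shared = new Cursor(); // allocated once for the whole iteration
        SequenceIterator it = new SequenceIterator(data);
        int total = 0;
        while (it.next(shared)) {
            total += shared.length;
        }
        System.out.println(total); // prints 8
    }
}
```

With the shared cursor, the loop allocates a new buffer only when a sequence is longer than any seen before. The price is that the caller must copy the contents if they are needed after the next call.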

FuzzyCompletionQuery was added recently and relies on the old, large set of finite strings. I am not sure how to rework it. Currently it still uses the set. It might be better to use the iterator inside FuzzyCompletionWeight, but that would mean recomputing the finite strings over and over again. What do you think?

BTW, topoSortStates() is implemented identically in AnalyzingSuggester and CompletionTokenStream. Maybe it should be moved to one place, perhaps Operations?

> Optimized iteration of finite strings
> -------------------------------------
>
>                 Key: LUCENE-6365
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6365
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/other
>    Affects Versions: 5.0
>            Reporter: Markus Heiden
>            Priority: Minor
>              Labels: patch, performance
>         Attachments: FiniteStrings_reuse.patch
>
>
> Replaced Operations.getFiniteStrings() with an optimized FiniteStringIterator.
> Benefits:
> Avoids a huge hash set of finite strings.
> Avoids massive object/array creation during processing.
> "Downside":
> The iteration order changed, so when iterating with a limit, the result may differ slightly. Old: emit the current node if it accepts, then recurse. New: recurse, then emit the current node if it accepts.
> The old method Operations.getFiniteStrings() still exists because it simplifies the tests. It is now implemented on top of the new FiniteStringIterator.
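
The order change described in the issue can be sketched on a toy acyclic automaton. This is an illustration under assumptions, not the traversal code from the patch: OrderSketch, Node, emitFirst and recurseFirst are hypothetical names, and the recursion stands in for whatever the real iterator does internally. The sample automaton accepts "a", "ab" and "b".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch comparing the two emit orders; not the code from the patch. */
class OrderSketch {

    static final class Node {
        final boolean accept;
        final Map<Character, Node> next = new LinkedHashMap<>(); // keeps insertion order

        Node(boolean accept) {
            this.accept = accept;
        }
    }

    /** Builds an automaton accepting "a", "ab" and "b". */
    static Node sample() {
        Node root = new Node(false);
        Node a = new Node(true);
        a.next.put('b', new Node(true));
        root.next.put('a', a);
        root.next.put('b', new Node(true));
        return root;
    }

    /** Old order: emit the current node if it accepts, then recurse. */
    static void emitFirst(Node node, StringBuilder path, int limit, List<String> out) {
        if (out.size() >= limit) {
            return;
        }
        if (node.accept) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : node.next.entrySet()) {
            if (out.size() >= limit) {
                return;
            }
            path.append(e.getKey());
            emitFirst(e.getValue(), path, limit, out);
            path.setLength(path.length() - 1);
        }
    }

    /** New order: recurse, then emit the current node if it accepts. */
    static void recurseFirst(Node node, StringBuilder path, int limit, List<String> out) {
        for (Map.Entry<Character, Node> e : node.next.entrySet()) {
            if (out.size() >= limit) {
                return;
            }
            path.append(e.getKey());
            recurseFirst(e.getValue(), path, limit, out);
            path.setLength(path.length() - 1);
        }
        if (node.accept && out.size() < limit) {
            out.add(path.toString());
        }
    }

    static List<String> oldOrder(Node root, int limit) {
        List<String> out = new ArrayList<>();
        emitFirst(root, new StringBuilder(), limit, out);
        return out;
    }

    static List<String> newOrder(Node root, int limit) {
        List<String> out = new ArrayList<>();
        recurseFirst(root, new StringBuilder(), limit, out);
        return out;
    }

    public static void main(String[] args) {
        Node root = sample();
        System.out.println(oldOrder(root, 1) + " vs " + newOrder(root, 1));   // [a] vs [ab]
        System.out.println(oldOrder(root, 10) + " vs " + newOrder(root, 10)); // [a, ab, b] vs [ab, a, b]
    }
}
```

Without a limit both orders yield the same set of strings. Only a limit smaller than the total number of strings can make the two results differ.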



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
