You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2016/10/05 18:19:20 UTC
[jira] [Commented] (LUCENE-7407) Explore switching doc values to an
iterator API
[ https://issues.apache.org/jira/browse/LUCENE-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549552#comment-15549552 ]
ASF subversion and git services commented on LUCENE-7407:
---------------------------------------------------------
Commit 001a3ca55b30656e0e42f612d927a7923f5370e9 in lucene-solr's branch refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=001a3ca ]
LUCENE-7407: speed up iterating norms a bit by having default codec implement the iterator directly
> Explore switching doc values to an iterator API
> -----------------------------------------------
>
> Key: LUCENE-7407
> URL: https://issues.apache.org/jira/browse/LUCENE-7407
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Labels: docValues
> Fix For: master (7.0)
>
> Attachments: LUCENE-7407.patch
>
>
> I think it could be compelling if we restricted doc values to use an
> iterator API at read time, instead of the more general random access
> API we have today:
> * It would make doc values disk usage more of a "you pay for what
> what you actually use", like postings, which is a compelling
> reduction for sparse usage.
> * I think codecs could compress better and maybe speed up decoding
> of doc values, even in the non-sparse case, since the read-time
> API is more restrictive "forward only" instead of random access.
> * We could remove {{getDocsWithField}} entirely, since that's
> implicit in the iteration, and the awkward "return 0 if the
> document didn't have this field" would go away.
> * We can remove the annoying thread locals we must make today in
> {{CodecReader}}, and close the trappy "I accidentally shared a
> single XXXDocValues instance across threads", since an iterator is
> inherently "use once".
> * We could maybe leverage the numerous optimizations we've done for
> postings over time, since the two problems ("iterate over doc ids
> and store something interesting for each") are very similar.
> This idea has come up many in the past, e.g. LUCENE-7253 is a recent
> example, and very early iterations of doc values started with exactly
> this ;)
> However, it's a truly enormous change, likely 7.0 only. Or maybe we
> could have the new iterator APIs also ported to 6.x side by side with
> the deprecate existing random-access APIs.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org