Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2016/08/05 17:23:20 UTC

[jira] [Created] (LUCENE-7407) Explore switching doc values to an iterator API

Michael McCandless created LUCENE-7407:
------------------------------------------

             Summary: Explore switching doc values to an iterator API
                 Key: LUCENE-7407
                 URL: https://issues.apache.org/jira/browse/LUCENE-7407
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Michael McCandless
            Assignee: Michael McCandless


I think it could be compelling if we restricted doc values to use an
iterator API at read time, instead of the more general random access
API we have today:

  * It would make doc values disk usage more of a "you pay for what
    you actually use" model, like postings, which is a compelling
    reduction for sparse usage.

  * I think codecs could compress better and maybe speed up decoding
    of doc values, even in the non-sparse case, since the read-time
    API is more restrictive: "forward only" instead of random access.

  * We could remove {{getDocsWithField}} entirely, since that's
    implicit in the iteration, and the awkward "return 0 if the
    document didn't have this field" would go away.

  * We can remove the annoying thread locals we must make today in
    {{CodecReader}}, and close the trappy "I accidentally shared a
    single XXXDocValues instance across threads", since an iterator is
    inherently "use once".

  * We could maybe leverage the numerous optimizations we've done for
    postings over time, since the two problems ("iterate over doc ids
    and store something interesting for each") are very similar.
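
To make the points above concrete, here is a minimal sketch of what a
forward-only numeric doc values reader could look like.  Everything in
it is an assumption for illustration: the names {{NumericIterator}},
{{SparseNumericIterator}}, {{nextDoc}}, {{longValue}} and the
{{NO_MORE_DOCS}} sentinel are hypothetical (loosely modeled on how
postings are iterated) and are not an existing or proposed Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

public class DocValuesIteratorSketch {

    /** Hypothetical sentinel returned once the iterator is exhausted. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical forward-only view over one numeric doc values field. */
    interface NumericIterator {
        /** Advances to the next document that has a value, or returns NO_MORE_DOCS. */
        int nextDoc();

        /** Returns the value of the document the iterator is positioned on. */
        long longValue();
    }

    /** Sparse in-memory implementation: only documents that have a value are stored. */
    static final class SparseNumericIterator implements NumericIterator {
        private final int[] docs;    // doc ids in ascending order
        private final long[] values; // values[i] belongs to docs[i]
        private int index = -1;      // -1 means "not yet positioned"

        SparseNumericIterator(int[] docs, long[] values) {
            if (docs.length != values.length) {
                throw new IllegalArgumentException("docs and values must have the same length");
            }
            this.docs = docs;
            this.values = values;
        }

        @Override
        public int nextDoc() {
            if (index + 1 >= docs.length) {
                index = docs.length; // stay exhausted on repeated calls
                return NO_MORE_DOCS;
            }
            index++;
            return docs[index];
        }

        @Override
        public long longValue() {
            if (index < 0 || index >= docs.length) {
                throw new IllegalStateException("iterator is not positioned on a document");
            }
            return values[index];
        }
    }

    /** Sums all values; documents without the field are simply never visited. */
    static long sum(NumericIterator it) {
        long total = 0;
        while (it.nextDoc() != NO_MORE_DOCS) {
            total += it.longValue();
        }
        return total;
    }

    /** Collects the doc ids that have a value; no separate "docs with field" lookup is needed. */
    static List<Integer> docsWithValue(NumericIterator it) {
        List<Integer> result = new ArrayList<>();
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            result.add(doc);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] docs = {3, 17, 42};
        long[] values = {10L, 0L, 5L}; // doc 17 has an explicit 0, which differs from "no value"

        // Each consumer creates its own iterator: an iterator is "use once".
        System.out.println(sum(new SparseNumericIterator(docs, values)));
        System.out.println(docsWithValue(new SparseNumericIterator(docs, values)));
    }
}
```

In this sketch a document with an explicit value of 0 (doc 17) is
visited like any other, while documents without the field never appear
in the iteration, so no separate "which docs have this field" lookup
and no "return 0 if missing" convention is needed.  Each consumer also
builds its own iterator, so there is no shared instance to guard with
thread locals.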

This idea has come up many times in the past, e.g. LUCENE-7253 is a
recent example, and very early iterations of doc values started with
exactly this ;)

However, it's a truly enormous change, likely 7.0 only.  Or maybe we
could have the new iterator APIs also ported to 6.x, side by side with
the deprecated existing random-access APIs.


