Posted to dev@lucene.apache.org by "Nikolay Khitrin (JIRA)" <ji...@apache.org> on 2018/02/19 19:14:00 UTC

[jira] [Created] (LUCENE-8178) Bulk operations for LongValues and Sorted[Set]DocValues

Nikolay Khitrin created LUCENE-8178:
---------------------------------------

             Summary: Bulk operations for LongValues and Sorted[Set]DocValues
                 Key: LUCENE-8178
                 URL: https://issues.apache.org/jira/browse/LUCENE-8178
             Project: Lucene - Core
          Issue Type: Improvement
    Affects Versions: 7.2.1
            Reporter: Nikolay Khitrin


Iterating DocValues one document at a time with {{advanceExact}} and {{nextOrd}}/{{ordValue}} is slow for bulk operations such as faceting. Reading and unpacking integers in blocks is substantially faster, but DocValues can currently be queried only one document at a time.

To apply bulk processing, the matches of a {{DocIdSetIterator}} have to be split into runs of consecutive docIDs and remapped to positions in the underlying {{LongValues}}.
After this transformation, relatively long linear scans can be performed over the packed integers.
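A minimal sketch of the splitting step, assuming the matching docIDs are already available as a sorted {{int[]}} (a stand-in for draining a {{DocIdSetIterator}}). The class {{DocRuns}} is hypothetical, not part of the patch. Each run is returned as a half-open pair {start, end}. For a dense field we assume a doc's {{LongValues}} position equals its docID; a sparse field would additionally need each docID translated to its rank among documents that have a value, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: groups matching doc IDs into runs of consecutive IDs.
 * The input stands in for the docs a DocIdSetIterator would return, in order.
 */
final class DocRuns {
    private DocRuns() {}

    /**
     * Splits strictly increasing doc IDs into maximal runs of consecutive IDs.
     * Each run is returned as {start, end} with end exclusive.
     */
    static List<int[]> split(int[] sortedDocs) {
        List<int[]> runs = new ArrayList<>();
        int i = 0;
        while (i < sortedDocs.length) {
            int start = sortedDocs[i++];
            int end = start + 1;
            // Extend the run while the next doc ID is exactly one past the current end.
            while (i < sortedDocs.length && sortedDocs[i] == end) {
                end++;
                i++;
            }
            runs.add(new int[] {start, end});
        }
        return runs;
    }
}
```

For example, docs {1, 2, 3, 7, 8, 10} become the runs [1, 4), [7, 9) and [10, 11). Each run can then be handed to one bulk scan instead of one {{advanceExact}} call per document.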

 

To do this, two new interfaces are introduced:

1. {{LongValuesCollector}} ({{collectValue(long index, long value)}}).
2. {{OrdStatsCollector}} ({{collectOrd(long ord)}}, {{collectMissing(int count)}}).

Three new methods are also introduced, each with a reference implementation:

1. {{LongValues.forRange(long begin, long end, LongValuesCollector collector)}}
2. {{SortedDocValues.forEach(DocIdSetIterator disi, OrdStatsCollector collector)}}
3. {{SortedSetDocValues.forEach(DocIdSetIterator disi, OrdStatsCollector collector)}}
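The interfaces and reference methods listed above can be sketched in plain Java as follows. The two interface signatures follow the issue text, with {{void}} return types assumed. {{SketchLongValues}}, {{SketchSortedValues}} and {{OrdCounter}} are hypothetical stand-ins, not Lucene classes: matches are a sorted {{int[]}} instead of a {{DocIdSetIterator}}, and a per-document {{int[]}} (with -1 for "no value") replaces {{advanceExact}}/{{ordValue}}. The reference behavior still touches one value at a time; only optimized overrides would scan in blocks.

```java
/** Receives (position, value) pairs from a bulk scan; signature taken from the issue. */
interface LongValuesCollector {
    void collectValue(long index, long value);
}

/** Receives ordinals and counts of docs without a value; signatures taken from the issue. */
interface OrdStatsCollector {
    void collectOrd(long ord);

    void collectMissing(int count);
}

/** Hypothetical stand-in for a random-access array of longs. */
abstract class SketchLongValues {
    abstract long get(long index);

    /** Reference behavior: one get() per position in [begin, end). */
    void forRange(long begin, long end, LongValuesCollector collector) {
        for (long i = begin; i < end; i++) {
            collector.collectValue(i, get(i));
        }
    }
}

/** Hypothetical single-valued sorted field: ordByDoc[doc] is the ord, or -1 if missing. */
final class SketchSortedValues {
    private final int[] ordByDoc;

    SketchSortedValues(int[] ordByDoc) {
        this.ordByDoc = ordByDoc;
    }

    /** Reference behavior: looks up each matching doc individually. */
    void forEach(int[] matchingDocs, OrdStatsCollector collector) {
        int missing = 0;
        for (int doc : matchingDocs) {
            int ord = ordByDoc[doc];
            if (ord < 0) {
                missing++;
            } else {
                collector.collectOrd(ord);
            }
        }
        // Report all docs without a value in one call, matching collectMissing(int count).
        if (missing > 0) {
            collector.collectMissing(missing);
        }
    }
}

/** Hypothetical facet counter built on OrdStatsCollector. */
final class OrdCounter implements OrdStatsCollector {
    final int[] counts;
    int missing;

    OrdCounter(int valueCount) {
        counts = new int[valueCount];
    }

    @Override
    public void collectOrd(long ord) {
        counts[(int) ord]++;
    }

    @Override
    public void collectMissing(int count) {
        missing += count;
    }
}
```

A faceting caller would pass an {{OrdCounter}}-like collector and read per-ord counts afterwards; because the collector sees only ords, an optimized {{forEach}} is free to produce them in any block-wise fashion.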

Optimized versions of these methods are provided for:
1. {{DirectReader}}, for bits-per-value widths other than 32 and 64 (using {{PackedInts.Decoder}}).
2. {{Lucene70DocValuesProducer}} {{getSorted}} and {{getSortedSet}} (both sparse and dense).
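The following sketch illustrates why a linear scan over packed integers can beat per-value access. {{PackedSketch}} is hypothetical and uses a simple LSB-first layout in 64-bit blocks, which is not {{DirectReader}}'s actual on-disk format. {{get}} recomputes the bit offset for every value, while {{decodeRange}} decodes a whole range into a caller-supplied {{long[]}} buffer by advancing a bit cursor, loosely analogous to the bulk decoding that {{PackedInts.Decoder}} offers. The sketch accepts 1 to 63 bits per value.

```java
/** Hypothetical packed array: values stored LSB-first in 64-bit blocks. */
final class PackedSketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedSketch(long[] values, int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 63) {
            throw new IllegalArgumentException("bitsPerValue must be in [1, 63]");
        }
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) values.length * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) / 64)];
        long bitPos = 0;
        for (long v : values) {
            write(bitPos, v & mask);
            bitPos += bitsPerValue;
        }
    }

    private void write(long bitPos, long v) {
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] |= v << shift;
        if (shift + bitsPerValue > 64) {
            // The value straddles two blocks: store its high bits in the next block.
            blocks[block + 1] |= v >>> (64 - shift);
        }
    }

    /** Per-value random access: recomputes the bit offset on every call. */
    long get(long index) {
        long bitPos = index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = blocks[block] >>> shift;
        if (shift + bitsPerValue > 64) {
            v |= blocks[block + 1] << (64 - shift);
        }
        return v & mask;
    }

    /** Bulk decode of [begin, end) into out[0 .. end - begin), advancing a bit cursor. */
    void decodeRange(long begin, long end, long[] out) {
        long bitPos = begin * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        for (int i = 0; i < end - begin; i++) {
            long v = blocks[block] >>> shift;
            int nextShift = shift + bitsPerValue;
            if (nextShift > 64) {
                v |= blocks[block + 1] << (64 - shift);
            }
            out[i] = v & mask;
            // Move the cursor forward instead of recomputing index * bitsPerValue.
            if (nextShift >= 64) {
                block++;
                nextShift -= 64;
            }
            shift = nextShift;
        }
    }
}
```

One decoded buffer per docID run can then be fed to a {{LongValuesCollector}}, which is where the per-value call overhead would be amortized.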

 

The measured Solr faceting speedup on a real index is up to 2-2.5x.
A patch for Solr's {{DocValuesFacets}} is also provided as a separate file.

 

Implementation notes:
* {{OrdStatsCollector}} does not accept a document ID, because passing one would severely hurt {{SortedSetDocValues}} performance through excessive position lookups.
* This patch is fully compatible with the Lucene 7.0 DocValues format.
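To illustrate the first note, here is a sketch under an assumed layout: all documents' ords are stored back to back in one flat array, and an offsets array records where each document's ords begin, so document d owns ords[offsets[d] .. offsets[d + 1]). {{SketchSortedSet}}, {{forRunPerDoc}} and {{forRun}} are hypothetical, and {{java.util.function.LongConsumer}} stands in for {{OrdStatsCollector.collectOrd}}. Because the callback never needs to know which document an ord came from, a run of consecutive docs needs only two offset lookups in total rather than lookups for every document.

```java
import java.util.function.LongConsumer;

/** Hypothetical multi-valued layout: doc d owns ords[offsets[d] .. offsets[d + 1]). */
final class SketchSortedSet {
    private final long[] offsets; // length maxDoc + 1, non-decreasing
    private final long[] ords;    // every doc's ords, stored back to back

    SketchSortedSet(long[] offsets, long[] ords) {
        this.offsets = offsets;
        this.ords = ords;
    }

    /** Per-document visit: reads offset boundaries for every doc in [startDoc, endDoc). */
    void forRunPerDoc(int startDoc, int endDoc, LongConsumer ordSink) {
        for (int doc = startDoc; doc < endDoc; doc++) {
            for (long i = offsets[doc]; i < offsets[doc + 1]; i++) {
                ordSink.accept(ords[(int) i]);
            }
        }
    }

    /** Run-level visit: the run's ords are contiguous, so one linear scan covers them all. */
    void forRun(int startDoc, int endDoc, LongConsumer ordSink) {
        long from = offsets[startDoc];
        long to = offsets[endDoc];
        for (long i = from; i < to; i++) {
            ordSink.accept(ords[(int) i]);
        }
    }
}
```

Both methods emit the same ords in the same order; {{forRun}} simply drops the per-document boundaries that a document-ID-aware callback would force it to compute.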



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
