Posted to dev@lucene.apache.org by "Nikolay Khitrin (JIRA)" <ji...@apache.org> on 2018/02/19 19:14:00 UTC
[jira] [Created] (LUCENE-8178) Bulk operations for LongValues and Sorted[Set]DocValues
Nikolay Khitrin created LUCENE-8178:
---------------------------------------
Summary: Bulk operations for LongValues and Sorted[Set]DocValues
Key: LUCENE-8178
URL: https://issues.apache.org/jira/browse/LUCENE-8178
Project: Lucene - Core
Issue Type: Improvement
Affects Versions: 7.2.1
Reporter: Nikolay Khitrin
One-by-one DocValues iteration with {{advanceExact}} and {{nextOrd}}/{{ordValue}} is very slow for bulk operations such as faceting. Reading and unpacking integers in blocks is substantially faster, but DocValues can currently be queried only one document at a time.
To apply document-based bulk processing, the matches of a {{DocIdSetIterator}} have to be split into runs of sequential docIDs and remapped to the underlying {{LongValues}} positions.
After this transformation, relatively large linear scans can be performed over the packed integers.
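The remapping step can be sketched in isolation. This is a minimal model, not code from the patch: it assumes a sparse field is represented by a hypothetical ascending `docsWithField` array, so a document's value position is its rank in that array (Lucene's real sparse encoding uses its own index structure rather than a binary search). `RunSplitter`, `toPositions`, and `splitRuns` are illustrative names. For a dense field the position equals the docID, so only `splitRuns` would be needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: turns matching docIDs into runs of value positions. */
final class RunSplitter {
  private RunSplitter() {}

  /**
   * Maps ascending matching docIDs to positions in a sparse field's value array.
   * A doc's position is its rank in docsWithField (ascending docs that have a value).
   * Docs without a value are dropped; the caller can count them as
   * matchingDocs.length - result.length.
   */
  static long[] toPositions(int[] matchingDocs, int[] docsWithField) {
    long[] positions = new long[matchingDocs.length];
    int n = 0;
    for (int doc : matchingDocs) {
      int rank = Arrays.binarySearch(docsWithField, doc);
      if (rank >= 0) {
        positions[n++] = rank;
      }
    }
    return Arrays.copyOf(positions, n);
  }

  /** Groups ascending positions into maximal half-open runs {begin, end}. */
  static List<long[]> splitRuns(long[] sortedPositions) {
    List<long[]> runs = new ArrayList<>();
    int i = 0;
    while (i < sortedPositions.length) {
      long begin = sortedPositions[i];
      long end = begin + 1;
      i++;
      // Extend the run while the next position is exactly the next integer.
      while (i < sortedPositions.length && sortedPositions[i] == end) {
        end++;
        i++;
      }
      runs.add(new long[] {begin, end});
    }
    return runs;
  }
}
```

Each resulting `{begin, end}` pair is the kind of range a linear scan over packed values could consume in one call.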
To do this, two new interfaces
1. {{LongValuesCollector}} ({{collectValue(long index, long value)}}).
2. {{OrdStatsCollector}} ({{collectOrd(long ord)}}, {{collectMissing(int count)}}).
and three new methods are introduced,
1. {{LongValues.forRange(long begin, long end, LongValuesCollector collector)}}
2. {{SortedDocValues.forEach(DocIdSetIterator disi, OrdStatsCollector collector)}}
3. {{SortedSetDocValues.forEach(DocIdSetIterator disi, OrdStatsCollector collector)}}
with reference implementations.
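A rough sketch of what the two callbacks and the straightforward (non-optimized) implementations could look like. The interface names and method signatures are taken from the list above; everything else is assumed for illustration: `RandomLongValues` is a hypothetical stand-in for Lucene's `LongValues`, `ArraySortedOrds` models a single-valued sorted field as an `int[]` with `-1` for missing docs, and matches are passed as an ascending `int[]` instead of a `DocIdSetIterator`. Batching all missing docs into one `collectMissing` call is also an assumption.

```java
/** Callback from the proposal: receives (position, value) pairs. */
interface LongValuesCollector {
  void collectValue(long index, long value);
}

/** Callback from the proposal: receives ords and counts of docs without a value. */
interface OrdStatsCollector {
  void collectOrd(long ord);

  void collectMissing(int count);
}

/** Hypothetical stand-in for Lucene's LongValues: random access by position. */
abstract class RandomLongValues {
  abstract long get(long index);

  /** Reference forRange: one random-access get() per position in [begin, end). */
  void forRange(long begin, long end, LongValuesCollector collector) {
    for (long i = begin; i < end; i++) {
      collector.collectValue(i, get(i));
    }
  }
}

/** Hypothetical single-valued sorted field: ordByDoc[doc] is the ord, or -1 if missing. */
final class ArraySortedOrds {
  private final int[] ordByDoc;

  ArraySortedOrds(int[] ordByDoc) {
    this.ordByDoc = ordByDoc;
  }

  /** Reference forEach: one lookup per matching docID (ascending). */
  void forEach(int[] matchingDocs, OrdStatsCollector collector) {
    int missing = 0;
    for (int doc : matchingDocs) {
      int ord = ordByDoc[doc];
      if (ord < 0) {
        missing++; // no value for this doc; reported in a single batch below
      } else {
        collector.collectOrd(ord);
      }
    }
    if (missing > 0) {
      collector.collectMissing(missing);
    }
  }
}

/** Example collector: per-ord counts, roughly what a facet counter keeps. */
final class OrdCounter implements OrdStatsCollector {
  final long[] counts;
  long missing;

  OrdCounter(int numOrds) {
    counts = new long[numOrds];
  }

  @Override
  public void collectOrd(long ord) {
    counts[(int) ord]++;
  }

  @Override
  public void collectMissing(int count) {
    missing += count;
  }
}
```

Because the collectors only see values and ords, an optimized implementation is free to replace the per-position loops with block decoding without changing what the caller observes.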
Optimized versions of these functions are provided for:
1. {{DirectReader}}, for bit widths other than 32 and 64 bits per value (using {{PackedInts.Decoder}}).
2. {{Lucene70DocValuesProducer}} {{getSorted}} and {{getSortedSet}} (both sparse and dense).
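To illustrate why decoding a range can beat repeated random access, here is a simplified bit-packed array. The layout (value `i` stored at bits `[i * bpv, (i + 1) * bpv)` of a little-endian `long[]`) is an assumption chosen for clarity and is not `DirectReader`'s on-disk format; `PackedLongs` and `decodeRange` are hypothetical names. The real optimization relies on `PackedInts.Decoder`, which decodes whole fixed-size blocks; this sketch shows only the simpler idea of advancing one bit cursor instead of recomputing the word index and shift for every value.

```java
/**
 * Hypothetical bit-packed array (illustration only, NOT DirectReader's format):
 * value i occupies bits [i * bpv, (i + 1) * bpv) of a little-endian long[] stream.
 */
final class PackedLongs {
  private final long[] words;
  private final int bitsPerValue;
  private final long mask;

  /** Packs values; assumes 1 <= bitsPerValue <= 63. Higher bits of each value are discarded. */
  PackedLongs(long[] values, int bitsPerValue) {
    this.bitsPerValue = bitsPerValue;
    this.mask = (1L << bitsPerValue) - 1;
    this.words = new long[(int) ((values.length * (long) bitsPerValue + 63) >>> 6)];
    for (int i = 0; i < values.length; i++) {
      long bit = (long) i * bitsPerValue;
      int w = (int) (bit >>> 6);
      int shift = (int) (bit & 63);
      long v = values[i] & mask;
      words[w] |= v << shift;
      if (shift + bitsPerValue > 64) { // value straddles two words
        words[w + 1] |= v >>> (64 - shift);
      }
    }
  }

  /** Random access: recomputes word index and shift on every call. */
  long get(long index) {
    long bit = index * bitsPerValue;
    int w = (int) (bit >>> 6);
    int shift = (int) (bit & 63);
    long v = words[w] >>> shift;
    if (shift + bitsPerValue > 64) {
      v |= words[w + 1] << (64 - shift);
    }
    return v & mask;
  }

  /** Bulk decode of count values starting at begin: one cursor walks the words. */
  void decodeRange(long begin, long[] dest, int count) {
    long bit = begin * bitsPerValue;
    int w = (int) (bit >>> 6);
    int shift = (int) (bit & 63);
    for (int k = 0; k < count; k++) {
      long v = words[w] >>> shift;
      if (shift + bitsPerValue > 64) {
        v |= words[w + 1] << (64 - shift);
      }
      dest[k] = v & mask;
      shift += bitsPerValue; // advance the cursor instead of recomputing it
      if (shift >= 64) {
        shift -= 64;
        w++;
      }
    }
  }
}
```

The 32 and 64 bits-per-value cases are excluded in the description above, presumably because those widths are already word-aligned and gain little from this kind of unpacking.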
The measured Solr faceting speedup is up to 2-2.5x on a real index.
A patch for Solr's {{DocValuesFacets}} is also provided as a separate file.
Implementation notes:
* {{OrdStatsCollector}} does not accept a document ID because that would ruin performance for {{SortedSetDocValues}} due to excessive position lookups.
* This patch is fully compatible with Lucene 7.0 DocValues format.
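The first implementation note can be made concrete with a simplified model of a multi-valued field, assuming (hypothetically) that the ords of value position `i` are stored in `ords[addresses[i] .. addresses[i + 1])`. Without document IDs, a run of positions needs only two address lookups followed by one linear scan; attributing each ord to its document needs an address lookup for every position in the run. `SortedSetOrdsSketch`, `OrdSink`, and `PositionOrdSink` are illustrative names, not part of the patch, and in a real index each address lookup is itself a packed read, which is what makes the per-document variant costly.

```java
/**
 * Hypothetical in-memory model of a multi-valued sorted field: the ords of
 * value position i are ords[addresses[i] .. addresses[i + 1]).
 */
final class SortedSetOrdsSketch {
  /** Ord-only callback, mirroring the idea behind collectOrd(long ord). */
  interface OrdSink {
    void collectOrd(long ord);
  }

  /** Per-position callback, i.e. what a docID-aware collector would need. */
  interface PositionOrdSink {
    void collect(int position, long ord);
  }

  private final long[] addresses; // length = number of positions + 1, non-decreasing
  private final long[] ords;

  SortedSetOrdsSketch(long[] addresses, long[] ords) {
    this.addresses = addresses;
    this.ords = ords;
  }

  /** Positions [first, last): two address lookups, then one linear scan. */
  void forRange(int first, int last, OrdSink sink) {
    long end = addresses[last];
    for (long p = addresses[first]; p < end; p++) {
      sink.collectOrd(ords[(int) p]);
    }
  }

  /** Same range with ords attributed to positions: one address lookup per position. */
  void forRangeWithPositions(int first, int last, PositionOrdSink sink) {
    for (int i = first; i < last; i++) {
      long end = addresses[i + 1];
      for (long p = addresses[i]; p < end; p++) {
        sink.collect(i, ords[(int) p]);
      }
    }
  }
}
```

For facet counting, only the ords matter, so the cheaper `forRange` shape is sufficient.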
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)