Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2015/11/19 16:50:10 UTC
[jira] [Created] (LUCENE-6901) Optimize 1D dimensional value indexing
Michael McCandless created LUCENE-6901:
------------------------------------------
Summary: Optimize 1D dimensional value indexing
Key: LUCENE-6901
URL: https://issues.apache.org/jira/browse/LUCENE-6901
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: Trunk
Dimensional values give a smaller index and faster search times than our existing approaches when indexing ordered byte[] values across one or more dimensions, but the indexing time is substantially slower.
Since the 1D case is so important/common (numeric fields, range query) I think it's worth optimizing its indexing time. It should be possible to optimize the N > 1 dimensions case too, but that's more complex ... we can postpone it.
So for the 1D case, I changed the merge method to do a merge sort (like postings) of the already sorted segments' dimensional values, instead of simply re-indexing all values from the incoming segments, and this was a big speedup.
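To illustrate the idea only (this is not the actual patch), here is a minimal plain-Java sketch of a k-way merge over per-segment values that are each already sorted in unsigned byte order. The names {{SortedSegmentMerge}}, {{Cursor}} and {{merge}} are hypothetical and do not come from Lucene; it assumes Java 9+ for {{Arrays.compareUnsigned}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: k-way merge of already-sorted 1D values. Not Lucene's code. */
final class SortedSegmentMerge {

  /** Read position inside one segment's sorted value list. */
  private static final class Cursor {
    final List<byte[]> values;
    int pos;

    Cursor(List<byte[]> values) {
      this.values = values;
    }

    byte[] current() {
      return values.get(pos);
    }
  }

  /** Merges segments whose values are each sorted in unsigned byte order. */
  static List<byte[]> merge(List<List<byte[]>> segments) {
    // The queue is ordered by each cursor's current (smallest unread) value.
    PriorityQueue<Cursor> queue =
        new PriorityQueue<>((a, b) -> Arrays.compareUnsigned(a.current(), b.current()));
    int total = 0;
    for (List<byte[]> segment : segments) {
      total += segment.size();
      if (!segment.isEmpty()) {
        queue.add(new Cursor(segment));
      }
    }
    List<byte[]> merged = new ArrayList<>(total);
    while (!queue.isEmpty()) {
      Cursor top = queue.poll();
      merged.add(top.current());
      top.pos++;
      // Re-insert the cursor only while its segment still has unread values.
      if (top.pos < top.values.size()) {
        queue.add(top);
      }
    }
    return merged;
  }
}
```

With k segments and n values in total this does O(n log k) comparisons, instead of sorting all n values from scratch.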
I also changed from {{InPlaceMergeSorter}} to {{IntroSorter}} (this is what postings use, and it's faster but still safe) and this was another good speedup, which should also help the > 1D cases.
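For readers unfamiliar with introsort, a rough plain-Java sketch of the general technique follows. It is not Lucene's {{IntroSorter}} (which is abstract and works through compare/swap callbacks); the class {{IntroSortSketch}}, the threshold of 16 and the depth limit are assumptions made for illustration. The point is that quicksort handles the common case, while a recursion depth limit with a heapsort fallback keeps the worst case at O(n log n).

```java
/** Hypothetical sketch of the introsort idea on int[]. Not Lucene's IntroSorter. */
final class IntroSortSketch {
  private static final int INSERTION_THRESHOLD = 16; // assumed cutoff for small ranges

  static void sort(int[] a) {
    // Allow roughly 2 * log2(n) quicksort levels before falling back to heapsort.
    int maxDepth = 2 * (32 - Integer.numberOfLeadingZeros(Math.max(1, a.length)));
    sort(a, 0, a.length, maxDepth);
  }

  /** Sorts the half-open range a[from, to). */
  private static void sort(int[] a, int from, int to, int depth) {
    while (to - from > INSERTION_THRESHOLD) {
      if (depth-- == 0) {
        heapSort(a, from, to); // depth budget exhausted: guaranteed O(n log n)
        return;
      }
      int p = partition(a, from, to);
      sort(a, from, p, depth); // recurse on the left part
      from = p + 1;            // loop on the right part
    }
    insertionSort(a, from, to);
  }

  /** Lomuto partition with a median-of-three pivot; returns the pivot's final index. */
  private static int partition(int[] a, int from, int to) {
    int mid = (from + to) >>> 1;
    int last = to - 1;
    if (a[mid] < a[from]) swap(a, mid, from);
    if (a[last] < a[from]) swap(a, last, from);
    if (a[last] < a[mid]) swap(a, last, mid);
    swap(a, mid, last); // the median is now at 'last' and serves as the pivot
    int pivot = a[last];
    int store = from;
    for (int i = from; i < last; i++) {
      if (a[i] < pivot) swap(a, i, store++);
    }
    swap(a, store, last);
    return store;
  }

  private static void heapSort(int[] a, int from, int to) {
    int n = to - from;
    for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, from, i, n);
    for (int end = n - 1; end > 0; end--) {
      swap(a, from, from + end); // move the current maximum to its final slot
      siftDown(a, from, 0, end);
    }
  }

  private static void siftDown(int[] a, int base, int root, int size) {
    while (true) {
      int child = 2 * root + 1;
      if (child >= size) return;
      if (child + 1 < size && a[base + child + 1] > a[base + child]) child++;
      if (a[base + root] >= a[base + child]) return;
      swap(a, base + root, base + child);
      root = child;
    }
  }

  private static void insertionSort(int[] a, int from, int to) {
    for (int i = from + 1; i < to; i++) {
      int v = a[i];
      int j = i - 1;
      while (j >= from && a[j] > v) {
        a[j + 1] = a[j];
        j--;
      }
      a[j + 1] = v;
    }
  }

  private static void swap(int[] a, int i, int j) {
    int t = a[i];
    a[i] = a[j];
    a[j] = t;
  }
}
```

Unlike a merge sort, this kind of sort is not stable, so it only fits where the relative order of equal values does not matter.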
Finally, I added a {{BKDReader.verify}} method (currently it's dark: NOT called) that walks the index and then checks that every value in each leaf block does in fact fall within what the index expected/claimed. This is useful for finding bugs! Maybe we can cleanly fold it into {{CheckIndex}} somehow later.
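As a rough picture of what such a check does, here is a small plain-Java sketch over a toy 1D tree. Everything in it is hypothetical: the {{Node}} layout, the use of {{long}} values instead of byte[], and the {{verify}} signature are assumptions for illustration and say nothing about how {{BKDReader.verify}} is actually written.

```java
/** Hypothetical sketch: check that leaf values respect the bounds implied by the index. */
final class BkdVerifySketch {

  /** Toy tree node: either an inner node with a split value, or a leaf block of values. */
  static final class Node {
    final long splitValue;
    final Node left;
    final Node right;
    final long[] leafValues; // non-null only for leaves

    private Node(long splitValue, Node left, Node right, long[] leafValues) {
      this.splitValue = splitValue;
      this.left = left;
      this.right = right;
      this.leafValues = leafValues;
    }

    static Node leaf(long... values) {
      return new Node(0L, null, null, values);
    }

    static Node inner(long splitValue, Node left, Node right) {
      return new Node(splitValue, left, right, null);
    }
  }

  /** Walks the whole tree; throws if any value falls outside its claimed range. */
  static void verify(Node root) {
    verify(root, Long.MIN_VALUE, Long.MAX_VALUE);
  }

  private static void verify(Node node, long min, long max) {
    if (node.leafValues != null) {
      for (long value : node.leafValues) {
        if (value < min || value > max) {
          throw new IllegalStateException(
              "value " + value + " is outside the expected range [" + min + ", " + max + "]");
        }
      }
      return;
    }
    if (node.splitValue < min || node.splitValue > max) {
      throw new IllegalStateException("split value " + node.splitValue + " is out of range");
    }
    // Assumption: the left subtree holds values <= split, the right subtree values >= split.
    verify(node.left, min, node.splitValue);
    verify(node.right, node.splitValue, max);
  }
}
```

A tree such as {{Node.inner(10, Node.leaf(1, 5), Node.leaf(12, 20))}} passes, while moving 12 into the left leaf makes {{verify}} throw.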
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)