Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2015/11/19 17:22:11 UTC

[jira] [Commented] (LUCENE-6901) Optimize 1D dimensional value indexing

    [ https://issues.apache.org/jira/browse/LUCENE-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15013784#comment-15013784 ] 

Michael McCandless commented on LUCENE-6901:
--------------------------------------------

OK, I tested the 1D case: for the 32-bit case, this patch reduces indexing time from 370.2 sec (trunk) to 69.0 sec. The test just indexes and searches on latitude (quantized to an int) from the 2D London, UK benchmark.

This is quite a bit faster than {{NumericField}}, which takes 175.6 sec to build the index.

Index size and search speed are the same. Heap used at search time is a bit smaller (2.3 MB -> 2.1 MB), just because the merge implementation packs each leaf block with the maximum allowed count, whereas indexing produces leaf blocks that are between 50% and 100% of the maximum.
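
To make the leaf-packing difference concrete, here is a rough sketch of the arithmetic. It assumes (this is a simplified model, not code from the patch) that the indexing path keeps halving the value count until a block fits under the maximum, while the merge path fills every block completely. The class and method names are hypothetical.

```java
// Hypothetical illustration only; these names do not come from the LUCENE-6901 patch.
class LeafCountSketch {

    // Number of leaf blocks when every block is packed to the maximum
    // (the behavior described for the merge path).
    static long packedLeafCount(long numValues, int maxPerLeaf) {
        return (numValues + maxPerLeaf - 1) / maxPerLeaf;
    }

    // Number of leaf blocks under an ASSUMED model of the indexing path:
    // keep splitting in half until each block holds at most maxPerLeaf values,
    // which leaves every block between 50% and 100% full.
    static long halvedLeafCount(long numValues, int maxPerLeaf) {
        long perLeaf = numValues;
        long leaves = 1;
        while (perLeaf > maxPerLeaf) {
            perLeaf = (perLeaf + 1) / 2;
            leaves *= 2;
        }
        return leaves;
    }

    public static void main(String[] args) {
        System.out.println(packedLeafCount(1000, 100)); // prints 10
        System.out.println(halvedLeafCount(1000, 100)); // prints 16
    }
}
```

Fewer leaf blocks presumably means fewer entries held on the heap at search time, which would be consistent with the small drop reported above.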

I'll test 2D next ... the {{IntroSorter}} change should have sped that up somewhat ... I'm going to try {{TimSorter}} next :)


> Optimize 1D dimensional value indexing
> --------------------------------------
>
>                 Key: LUCENE-6901
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6901
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: Trunk
>
>         Attachments: LUCENE-6901.patch
>
>
> Dimensional values give a smaller index, and faster search times, for indexing ordered byte[] values across one or more dimensions, vs our existing approaches, but the indexing time is substantially slower.
> Since the 1D case is so important/common (numeric fields, range query) I think it's worth optimizing its indexing time.  It should also be possible to optimize the N > 1 dimensions case, but it's more complex ... we can postpone that.
> So for the 1D case, I changed the merge method to do a merge sort (like postings) of the already-sorted segments' dimensional values, instead of simply re-indexing all values from the incoming segments, and this was a big speedup.
> I also changed from {{InPlaceMergeSorter}} to {{IntroSorter}} (this is what postings use, and it's faster but still safe) and this was another good speedup, which should also help the > 1D cases.
> Finally, I added a {{BKDReader.verify}} method (currently it's dark: NOT called) that walks the index and then checks that every value in each leaf block does in fact fall within what the index expected/claimed.  This is useful for finding bugs!  Maybe we can cleanly fold it into {{CheckIndex}} somehow later.
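
The merge-sort idea and the verify check from the description can be sketched roughly as follows. This is only an illustration under simplifying assumptions: values are plain ints rather than byte[], everything is held in memory, and every class and method name here is hypothetical rather than taken from the patch or from {{BKDReader}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch; none of these names come from the LUCENE-6901 patch.
class OneDimMergeSketch {

    // One leaf block plus the [min, max] range the "index" claims for it.
    static final class Leaf {
        final int[] values;
        final int min;
        final int max;

        Leaf(int[] values) {
            this.values = values;
            this.min = values[0];                 // claimed lower bound
            this.max = values[values.length - 1]; // claimed upper bound
        }
    }

    // K-way merge of already-sorted segments, packing each leaf to maxPerLeaf.
    static List<Leaf> merge(List<int[]> sortedSegments, int maxPerLeaf) {
        // Queue entries are {segmentIndex, positionInSegment}, ordered by current value.
        PriorityQueue<int[]> queue = new PriorityQueue<>(
            (a, b) -> Integer.compare(sortedSegments.get(a[0])[a[1]],
                                      sortedSegments.get(b[0])[b[1]]));
        for (int i = 0; i < sortedSegments.size(); i++) {
            if (sortedSegments.get(i).length > 0) {
                queue.add(new int[] {i, 0});
            }
        }

        List<Leaf> leaves = new ArrayList<>();
        int[] block = new int[maxPerLeaf];
        int count = 0;
        while (!queue.isEmpty()) {
            int[] top = queue.poll();
            int[] segment = sortedSegments.get(top[0]);
            block[count++] = segment[top[1]];
            if (count == maxPerLeaf) {            // block is full: emit it
                leaves.add(new Leaf(block.clone()));
                count = 0;
            }
            if (++top[1] < segment.length) {      // advance within this segment
                queue.add(top);
            }
        }
        if (count > 0) {                          // last, possibly partial, block
            leaves.add(new Leaf(Arrays.copyOf(block, count)));
        }
        return leaves;
    }

    // Walk the leaves and check that every value falls inside the claimed
    // range, and that the leaf ranges themselves are in ascending order.
    static boolean verify(List<Leaf> leaves) {
        long previousMax = Long.MIN_VALUE;
        for (Leaf leaf : leaves) {
            if (leaf.min < previousMax) {
                return false;
            }
            for (int value : leaf.values) {
                if (value < leaf.min || value > leaf.max) {
                    return false;
                }
            }
            previousMax = leaf.max;
        }
        return true;
    }
}
```

Because each incoming segment is already sorted, the merge only ever compares the current head of each segment, so no full re-sort of the combined values is needed.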


