You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2015/07/03 20:53:05 UTC

[jira] [Commented] (LUCENE-6645) BKD tree queries should use BitDocIdSet.Builder

    [ https://issues.apache.org/jira/browse/LUCENE-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613403#comment-14613403 ] 

Michael McCandless commented on LUCENE-6645:
--------------------------------------------

Thanks [~jpountz], this is impressive!  A radix sorter, an Adder abstraction to skip hot loop if statements... looks like you had fun with hotspot :)

Did the Adder make a substantial improvement over the more straightforward if-per-add?  Maybe we could just add a .grow which would pre-grow the int[] if necessary ... not sure.

Most of the changes are in DocIdSetBuilder, and less so in BKDTreeReader.  You might get a bit more speedup if instead of {{prepareAdd(1)}}, you did the {{prepareAdd}} up front outside the loop with the worst case {{count}} (number of docs in the leaf block)?  I.e. this would reserve for the worse case (all docs pass the filter).  These leaf blocks are smallish by default (up to 1024 docs), and worst case is this upgrades to a bitset a bit early?  Not sure it will help that much, since most of the cells are visited via addAll...

In each cell of the BKD tree the docIDs are sorted ... but we don't take advantage of that here (not sure how we would).

Maybe we should try to contain the added hairiness to BKDTreeReader instead of DocIdSetBuilder if indeed this is the only user of this API that is so strange (or of tons of tiny sorted docID blocks)


> BKD tree queries should use BitDocIdSet.Builder
> -----------------------------------------------
>
>                 Key: LUCENE-6645
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6645
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-6645.patch, LUCENE-6645.patch
>
>
> When I was iterating on BKD tree originally I remember trying to use this builder (which makes a sparse bit set at first and then upgrades to dense if enough bits get set) and being disappointed with its performance.
> I wound up just making a FixedBitSet every time, but this is obviously wasteful for small queries.
> It could be the perf was poor because I was always .or'ing in DISIs that had 512 - 1024 hits each time (the size of each leaf cell in the BKD tree)?  I also had to make my own DISI wrapper around each leaf cell... maybe that was the source of the slowness, not sure.
> I also sort of wondered whether the SmallDocSet in spatial module (backed by a SentinelIntSet) might be faster ... though it'd need to be sorted in the and after building before returning to Lucene.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org