You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "Ignacio Vera (Jira)" <ji...@apache.org> on 2021/11/22 10:56:00 UTC

[jira] [Reopened] (LUCENE-9820) Separate logic for reading the BKD index from logic to intersecting it.

     [ https://issues.apache.org/jira/browse/LUCENE-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ignacio Vera reopened LUCENE-9820:
----------------------------------

I am reopening the issue as I realised that we are not handling properly the case of pre-8.6 indexes. 

Currently we assume that all trees are unbalanced with all blocks fully populated with _max_points_per_leaf_ points except the last one which can be under populated. This is true now but it is not true for pre-8.6 indexes. 

In pre-8.6 indexes, high dimensional trees (numDims > 1) were populated differently, they were fully balance and the number of point in blocks was evenly distributed in thee number of leaves. Therefore in the current implementation we are providing the wrong size for those indices. 

Fortunately, in Lucene-8.6 a new points codec was added, therefore we have isolated trees that might have different structure in the Lucene 60 codec. I am opening a PR to handle the case.

 
Note on BKD trees:

Just if there is a curious reader that reads the  [paper|https://users.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf] on BKD trees, it states:
 
       
{noformat}
the Bkd-tree consists of a set of balanced kd-trees
{noformat}

Lucene BKD tree works differently to what it is stated in the paper. The paper describes building kd-trees depending on the number of points while Lucene builds a tree whenever a segment is flushed to disk which can contain any number of points.  



 

 

 

> Separate logic for reading the BKD index from logic to intersecting it.
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-9820
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9820
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ignacio Vera
>            Assignee: Ignacio Vera
>            Priority: Major
>             Fix For: 9.1
>
>          Time Spent: 11h
>  Remaining Estimate: 0h
>
> Currently the class BKDReader contains all the logic for traversing the KD tree and the logic to read the actual index. This makes difficult to develop new visiting strategies, for example LUCENE-9619, where it is proposed to move Points from a visitor API to a custor-style API.
> The first step is to isolate the logic the read the index from the logic that visits the the tree. Another benefit of doing this, is that it will help evolving the index, for example moving the current index format to backwards codec without moving the visiting logic.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org