
Posted to dev@lucene.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/07/04 08:51:00 UTC

[jira] [Commented] (LUCENE-8888) Improve distribution of points with data dimension in BKD tree leaves

    [ https://issues.apache.org/jira/browse/LUCENE-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878449#comment-16878449 ] 

ASF subversion and git services commented on LUCENE-8888:
---------------------------------------------------------

Commit 5bf6cf2eddf60a0d2696f31b9a252eb7af6f9c32 in lucene-solr's branch refs/heads/master from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5bf6cf2 ]

LUCENE-8888: Improve distribution of points with data dimensions in BKD tree leaves (#747)



> Improve distribution of points with data dimension in BKD tree leaves
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-8888
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8888
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ignacio Vera
>            Priority: Major
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In LUCENE-8688, a new storage strategy was introduced for leaves that contain duplicated points. This works well with indexed dimensions, because partitioning the space and the final sorting of leaves group together points with equal indexed dimensions.
> This is not always the case if the points contain data dimensions. If two points have the same indexed dimensions but different data dimensions, their distribution across the leaves may not be optimal.
> A good example is a user indexing a bounding box with LatLonShape. The tessellation of a bounding box produces two triangles with the same indexed dimensions but different data dimensions. If two documents index the same bounding box, the leaf contains the triangles from the first document followed by the triangles from the second. This is because the current sorting/selection algorithms use one indexed dimension and tie-break on the docID.
> The optimal distribution in the case above groups the equal triangles together. Therefore, the proposal here is to update the selection/sorting algorithms to use the data dimensions, when they exist, as tie-breakers before using the docID.
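
The proposed ordering can be illustrated with a minimal sketch. This is not the code from the commit: the class DataDimTieBreak, the Point type, and the parameter names (bytesPerDim, numIndexDims, numDims, sortDim) are hypothetical and chosen only for illustration. It assumes each point is a packed byte array with the indexed dimensions first and the data dimensions after them, and compares on one indexed dimension, then on the data dimensions, then on the docID.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative only: not Lucene's BKD implementation.
final class DataDimTieBreak {

    // Hypothetical point: packed = indexed dimensions followed by data dimensions.
    static final class Point {
        final byte[] packed;
        final int docID;

        Point(byte[] packed, int docID) {
            this.packed = packed;
            this.docID = docID;
        }
    }

    // Orders by the chosen indexed dimension, then by all data dimensions,
    // and only then by docID.
    static Comparator<Point> comparator(int bytesPerDim, int numIndexDims,
                                        int numDims, int sortDim) {
        return (a, b) -> {
            // 1. Compare the single indexed dimension used for sorting.
            int from = sortDim * bytesPerDim;
            int to = from + bytesPerDim;
            int cmp = Arrays.compareUnsigned(a.packed, from, to, b.packed, from, to);
            if (cmp != 0) {
                return cmp;
            }
            // 2. Tie-break on the data dimensions (an empty range if there are none).
            int dataFrom = numIndexDims * bytesPerDim;
            int dataTo = numDims * bytesPerDim;
            cmp = Arrays.compareUnsigned(a.packed, dataFrom, dataTo,
                                         b.packed, dataFrom, dataTo);
            if (cmp != 0) {
                return cmp;
            }
            // 3. Final tie-break on docID.
            return Integer.compare(a.docID, b.docID);
        };
    }

    public static void main(String[] args) {
        // Two documents index the same "bounding box": each contributes two
        // triangles with the same indexed byte (5) and data bytes 1 and 2.
        Point[] points = {
            new Point(new byte[] {5, 1}, 0),
            new Point(new byte[] {5, 2}, 0),
            new Point(new byte[] {5, 1}, 1),
            new Point(new byte[] {5, 2}, 1),
        };
        Arrays.sort(points, comparator(1, 1, 2, 0));
        for (Point p : points) {
            System.out.println("data=" + p.packed[1] + " doc=" + p.docID);
        }
    }
}
```

With this ordering, the two triangles whose data byte is 1 (from documents 0 and 1) end up next to each other, followed by the two whose data byte is 2. With a docID-only tie-break, the triangles would instead stay grouped per document, as described above.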


