Posted to issues@lucene.apache.org by "Ignacio Vera (Jira)" <ji...@apache.org> on 2019/09/24 07:42:00 UTC

[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values

    [ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936519#comment-16936519 ] 

Ignacio Vera commented on LUCENE-8928:
--------------------------------------

I have played a bit more with this idea and wondered whether we need to compute exact bounds for every split. I modified [~jpountz]'s patch so that instead of computing the bounds for every split, it computes them every N splits. This is controlled by a static property called {{SPLITS_BEFORE_EXACT_BOUNDS}}.

The patch can be found here: https://github.com/iverase/lucene-solr/commit/e63f8c73a86c46ec406143fcd0cb31a8371dfe63

My tests show that setting this value to 4 (compute exact bounds every 4 splits) reduces the indexing overhead to around 10% while keeping almost the same performance as the previous approach. Maybe we can find a better heuristic for setting this value.

In addition, this patch does not apply when the number of dimensions is <= 2; in that case the split algorithm reverts to the original one.
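
To make the idea concrete, here is a minimal, self-contained sketch. It is not the code from the patch: the class {{BoundsSketch}}, its helper methods, the use of in-memory {{int[]}} points, and the choice of counting splits by recursion depth are all assumptions made for illustration. Only the name {{SPLITS_BEFORE_EXACT_BOUNDS}} and the value 4 come from the comment above. The sketch recomputes exact per-dimension bounds only every N splits, otherwise lets children inherit the parent's bounds narrowed on the split dimension, and skips recomputation entirely for 2 or fewer dimensions.

```java
import java.util.Arrays;
import java.util.Comparator;

final class BoundsSketch {
    // Illustrative stand-in for the static property described above.
    static final int SPLITS_BEFORE_EXACT_BOUNDS = 4;

    // Assumption: "every N splits" is tracked by recursion depth.
    // For <= 2 dimensions we never recompute, i.e. the original behaviour.
    static boolean recomputeBounds(int numDims, int depth) {
        return numDims > 2 && depth % SPLITS_BEFORE_EXACT_BOUNDS == 0;
    }

    // Exact {min, max} per dimension over points[from, to).
    static int[][] exactBounds(int[][] points, int from, int to, int numDims) {
        int[] min = new int[numDims];
        int[] max = new int[numDims];
        Arrays.fill(min, Integer.MAX_VALUE);
        Arrays.fill(max, Integer.MIN_VALUE);
        for (int i = from; i < to; i++) {
            for (int d = 0; d < numDims; d++) {
                min[d] = Math.min(min[d], points[i][d]);
                max[d] = Math.max(max[d], points[i][d]);
            }
        }
        return new int[][] {min, max};
    }

    // Split on the dimension whose current bounds span the widest range.
    static int widestDim(int[] min, int[] max) {
        int best = 0;
        for (int d = 1; d < min.length; d++) {
            if ((long) max[d] - min[d] > (long) max[best] - min[best]) {
                best = d;
            }
        }
        return best;
    }

    // Recursively splits points[from, to) at the median of the chosen dimension.
    // Returns how many times exact bounds were computed, as a proxy for the overhead.
    static int build(int[][] points, int from, int to,
                     int[] min, int[] max, int depth, int leafSize) {
        if (to - from <= leafSize) {
            return 0;
        }
        int computed = 0;
        if (recomputeBounds(min.length, depth)) {
            int[][] bounds = exactBounds(points, from, to, min.length);
            min = bounds[0];
            max = bounds[1];
            computed = 1;
        }
        int dim = widestDim(min, max);
        Arrays.sort(points, from, to, Comparator.comparingInt(p -> p[dim]));
        int mid = (from + to) >>> 1;
        int splitValue = points[mid][dim];
        // Between recomputations, children inherit the parent's bounds and are
        // narrowed only on the split dimension (the original assumption).
        int[] leftMax = max.clone();
        leftMax[dim] = splitValue;
        int[] rightMin = min.clone();
        rightMin[dim] = splitValue;
        computed += build(points, from, mid, min, leftMax, depth + 1, leafSize);
        computed += build(points, mid, to, rightMin, max, depth + 1, leafSize);
        return computed;
    }
}
```

With N = 1 this degenerates to computing exact bounds at every split, while a larger N trades tighter bounds for less work per level, which is the trade-off measured above.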

 

> BKDWriter could make splitting decisions based on the actual range of values
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-8928
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8928
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Currently BKDWriter assumes that splitting on one dimension has no effect on values in other dimensions. While this may be ok for geo points, this is usually not true for ranges (or geo shapes, which are ranges too). Maybe we could get better indexing by re-computing the range of values on each dimension before making the choice of the split dimension?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
