You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2019/07/19 09:03:00 UTC

[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values

    [ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888722#comment-16888722 ] 

Adrien Grand commented on LUCENE-8928:
--------------------------------------

I played with this idea a bit at https://github.com/jpountz/lucene-solr/commit/16e6594af44b753c9ac498a063eb9b9d6102e020 and https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/IndexAndSearchOpenStreetMaps.java with shapes. It's a bit artificial since we are using shapes to index points, but nevertheless I got 62% slower indexing (130 seconds instead of 80) but 45% faster searching for box queries (63.0 QPS instead of 43.5).

> BKDWriter could make splitting decisions based on the actual range of values
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-8928
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8928
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Currently BKDWriter assumes that splitting on one dimension has no effect on values in other dimensions. While this may be ok for geo points, this is usually not true for ranges (or geo shapes, which are ranges too). Maybe we could get better indexing by re-computing the range of values on each dimension before making the choice of the split dimension?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org