Posted to dev@lucene.apache.org by "Ignacio Vera (JIRA)" <ji...@apache.org> on 2019/07/22 18:12:00 UTC

[jira] [Comment Edited] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values

    [ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890173#comment-16890173 ] 

Ignacio Vera edited comment on LUCENE-8928 at 7/22/19 6:11 PM:
---------------------------------------------------------------

I ran this approach locally. It also helps quite a bit in the Geo3D (three-dimensional) case. I tried several ways to make indexing faster, but so far no luck:

 
||Approach||Index time (sec) Dev||Index time (sec) Base||Index time Diff||Force merge time (sec) Dev||Force merge time (sec) Base||Force merge time Diff||Index size (GB) Dev||Index size (GB) Base||Index size Diff||Reader heap (MB) Dev||Reader heap (MB) Base||Reader heap Diff||
|points|181.1s|124.4s|46%|76.9s|53.5s|44%|0.55|0.55|-0%|1.57|1.57|0%|
|shapes|327.4s|215.4s|52%|168.9s|120.2s|40%|1.28|1.29|-1%|1.62|1.61|0%|
|geo3d|211.9s|154.7s|37%|94.3s|66.4s|42%|0.75|0.75|-0%|1.58|1.58|0%|

 


 
||Approach||Shape||M hits/sec Dev||M hits/sec Base||M hits/sec Diff||QPS Dev||QPS Base||QPS Diff||Hit count Dev||Hit count Base||Hit count Diff||
|points|box|94.34|94.84|-1%|95.99|96.50|-1%|221118844|221118844| 0%|
|points|polyRussia|20.07|20.46|-2%|5.72|5.83|-2%|3508846|3508846| 0%|
|points|poly 10|88.64|87.56| 1%|56.05|55.37| 1%|355809475|355809475| 0%|
|points|polyMedium|10.47|10.54|-1%|128.26|129.15|-1%|2693559|2693559| 0%|
|points|distance|93.48|95.96|-3%|54.92|56.38|-3%|382961957|382961957| 0%|
|points|nearest 10|0.10|0.09|11%|9687.24|8755.72|11%|60844404|60844404| 0%|
|points|sort|43.12|43.04| 0%|43.88|43.80| 0%|221118844|221118844| 0%|
|shapes|box|66.02|52.23|26%|67.18|53.15|26%|221118844|221118844| 0%|
|shapes|polyRussia|11.57|9.85|17%|3.30|2.81|17%|3508846|3508846| 0%|
|shapes|poly 10|54.98|47.08|17%|34.77|29.77|17%|355809475|355809475| 0%|
|shapes|polyMedium|5.31|4.52|17%|65.01|55.39|17%|2693559|2693559| 0%|
|geo3d|box|79.17|66.22|20%|80.56|67.38|20%|221118844|221118844| 0%|
|geo3d|polyRussia|0.95|0.90| 5%|0.27|0.26| 5%|3508671|3508671| 0%|
|geo3d|poly 10|77.26|57.16|35%|48.85|36.14|35%|355855227|355855227| 0%|
|geo3d|polyMedium|0.95|0.69|37%|11.62|8.50|37%|2693545|2693545| 0%|
|geo3d|distance|95.35|76.17|25%|55.96|44.70|25%|383371884|383371884| 0%|

 



> BKDWriter could make splitting decisions based on the actual range of values
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-8928
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8928
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Currently BKDWriter assumes that splitting on one dimension has no effect on values in other dimensions. While this may be ok for geo points, this is usually not true for ranges (or geo shapes, which are ranges too). Maybe we could get better indexing by re-computing the range of values on each dimension before making the choice of the split dimension?
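
The idea in the description above can be illustrated with a minimal sketch. This is not BKDWriter's actual code: the class SplitDimSketch, its method names, and the use of long coordinates (BKDWriter works on packed byte[] values) are assumptions made for illustration, and the "widest span wins" rule is a simplification. It compares picking the split dimension from bounds inherited from the parent cell with picking it from bounds recomputed over the values that actually fall in the cell.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Lucene.
class SplitDimSketch {

  /** Returns the dimension with the widest span for the given per-dimension bounds. */
  static int widestDim(long[] min, long[] max) {
    int best = 0;
    for (int d = 1; d < min.length; d++) {
      if (max[d] - min[d] > max[best] - min[best]) {
        best = d;
      }
    }
    return best;
  }

  /** Recomputes the actual bounds of points[from, to); returns {min, max}. */
  static long[][] actualBounds(long[][] points, int from, int to) {
    int dims = points[from].length;
    long[] min = new long[dims];
    long[] max = new long[dims];
    Arrays.fill(min, Long.MAX_VALUE);
    Arrays.fill(max, Long.MIN_VALUE);
    for (int i = from; i < to; i++) {
      for (int d = 0; d < dims; d++) {
        min[d] = Math.min(min[d], points[i][d]);
        max[d] = Math.max(max[d], points[i][d]);
      }
    }
    return new long[][] {min, max};
  }

  public static void main(String[] args) {
    // 1D ranges indexed as 2D points (min, max), sorted by dimension 0.
    long[][] ranges = {
      {0, 10}, {10, 12}, {20, 21}, {30, 33},    // left half of a split on dim 0
      {50, 99}, {60, 100}, {70, 90}, {80, 100}  // right half
    };

    // Inherited bounds for the left child: the parent's bounds with only the
    // split dimension tightened to the split value (50). Dim 1 is assumed unchanged.
    long[] inheritedMin = {0, 10};
    long[] inheritedMax = {50, 100};
    System.out.println("inherited bounds pick dim " + widestDim(inheritedMin, inheritedMax));

    // Recomputed bounds for the left child: dim 1 shrank too, because max >= min
    // ties the two dimensions together.
    long[][] actual = actualBounds(ranges, 0, 4);
    System.out.println("actual bounds pick dim " + widestDim(actual[0], actual[1]));
  }
}
```

In this toy data, splitting on the range minimum also narrows the range maximum, so the recomputed bounds select dimension 0 while the inherited bounds select dimension 1.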


