You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Ignacio Vera (JIRA)" <ji...@apache.org> on 2018/02/08 15:36:00 UTC

[jira] [Comment Edited] (LUCENE-8126) Spatial prefix tree based on S2 geometry

    [ https://issues.apache.org/jira/browse/LUCENE-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356757#comment-16356757 ] 

Ignacio Vera edited comment on LUCENE-8126 at 2/8/18 3:35 PM:
--------------------------------------------------------------

I have added the performance test classes on the branch in case you want to have a look to check results are not bias.

I have done two test (They do not include index size as I have no found a reliable way of getting that info), I can only show you for now some tipical results.

1) Indexing 5k random polygons and executing 50 Random queries. Trees have precision set to 1e-4 and distErrPct to 5%:

load geohash : 132,15 indexed shapes per second
 query geohash recursive : 10,88 queries per second
 query geohash composite : 7,29 queries per second

load quad : 299,04 indexed shapes per second
 query quad recursive : 15,13 queries per second
 query quad composite : 10,89 queries per second

load s2 arity 4 : 623,67 indexed shapes per second
 query s2 arity 4 recursive : 40,00 queries per second
 query s2 arity 4 composite : 21,51 queries per second

load s2 arity 16 : 283,46 indexed shapes per second
 query s2 arity 16 recursive : 12,22 queries per second
 query s2 arity 16 composite : 9,99 queries per second

load s2 arity 64 : 159,05 indexed shapes per second
 query s2 arity 64 recursive : 5,13 queries per second
 query s2 arity 64 composite : 3,58 queries per second

 

1) Indexing 50k random points and executing 50 Random queries. Trees have precision set to 1e-6 and distErrPct to 0%:

load geohash : 14898,69 indexed shapes per second
 query geohash recursive : 2,31 queries per second
 query geohash composite : 2,29 queries per second

load quad : 5068,42 indexed shapes per second
 query quad recursive : 5,10 queries per second
 query quad composite : 5,11 queries per second

load s2 arity 4 : 9748,49 indexed shapes per second
 query s2 arity 4 recursive : 11,34 queries per second
 query s2 arity 4 composite : 11,38 queries per second

load s2 arity 16 : 15513,50 indexed shapes per second
 query s2 arity 16 recursive : 3,86 queries per second
 query s2 arity 16 composite : 3,83 queries per second

load s2 arity 64 : 16886,19 indexed shapes per second
 query s2 arity 64 recursive : 1,56 queries per second
 query s2 arity 64 composite : 1,56 queries per second

 

Some data of my use case: I need to index ~3M  documents. 2M are points, 0.5M polygons and 0.5M multi-polygons with an averge size of 20. They need to be indexed on the celestial sphere (unit sphere). All polygons are squared (4 points) with different sizes, from 1 square degree to very tiny ones, all distributed mainly on the south hemisphere of the sphere. Moving from Geohash to S2 has provided the following benefits:

1) 20% reduccion of index size.

2) 75% reduccion on indexing time (yuhu!).

3) 2.5 times faster queries.

 

Anyway what we need is a benchmark for SPT the same way that there is one for BKD tree. Probably my next mini-project.

Conclusion: If you use Geo3D, you probably want to use S2 :)

 


was (Author: ivera):
I have added the performance test classes on the branch in case you want to have a look to check results are not bias.

I have done two test (They do not include index size as I have no found a reliable way of getting that info), I can only show you for now some tipical results.

1) Indexing 5k random polygons and executing 50 Random queries. Trees have precision set to 1e-4 and distErrPct to 5%:

load geohash : 132,15 indexed shapes per second
query geohash recursive : 10,88 queries per second
query geohash composite : 7,29 queries per second

load quad : 299,04 indexed shapes per second
query quad recursive : 15,13 queries per second
query quad composite : 10,89 queries per second

load s2 arity 4 : 623,67 indexed shapes per second
query s2 arity 4 recursive : 40,00 queries per second
query s2 arity 4 composite : 21,51 queries per second

load s2 arity 16 : 283,46 indexed shapes per second
query s2 arity 16 recursive : 12,22 queries per second
query s2 arity 16 composite : 9,99 queries per second

load s2 arity 64 : 159,05 indexed shapes per second
query s2 arity 64 recursive : 5,13 queries per second
query s2 arity 64 composite : 3,58 queries per second

 

1) Indexing 50k random points and executing 50 Random queries. Trees have precision set to 1e-6 and distErrPct to 0%:

load geohash : 14898,69 indexed shapes per second
query geohash recursive : 2,31 queries per second
query geohash composite : 2,29 queries per second

load quad : 5068,42 indexed shapes per second
query quad recursive : 5,10 queries per second
query quad composite : 5,11 queries per second

load s2 arity 4 : 9748,49 indexed shapes per second
query s2 arity 4 recursive : 11,34 queries per second
query s2 arity 4 composite : 11,38 queries per second

load s2 arity 16 : 15513,50 indexed shapes per second
query s2 arity 16 recursive : 3,86 queries per second
query s2 arity 16 composite : 3,83 queries per second

load s2 arity 64 : 16886,19 indexed shapes per second
query s2 arity 64 recursive : 1,56 queries per second
query s2 arity 64 composite : 1,56 queries per second

 

Some data of my use case: I need to index ~3M  documents. 2M are points, 0.5M polygons and 0.5M multi-polygons with an averge size of 20. They need to be indexed on the celestial sphere (unit sphere). All polygons are squared (4 points) with different sizes, from 1 square degree to very tiny ones, all distributed mainly on the south hemisphere of the sphere. Moving from Geohash to S2 has provided the following benefits:

1) 20% reduccion of index size.

2) 75% reduccion on indexing time (yuhu!). 

3) 2.5 times faster queries.

 

Anyway what we need is a benchmark for SPT the same way that there is one for BKD tree. Probably my next mini-project.

Conclusion: If you use Geo3D, you probably want to use S2 :)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

> Spatial prefix tree based on S2 geometry
> ----------------------------------------
>
>                 Key: LUCENE-8126
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8126
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/spatial-extras
>            Reporter: Ignacio Vera
>            Assignee: Ignacio Vera
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hi [~dsmiley],
> I have been working on a prefix tree based on goggle S2 geometry (https://s2geometry.io/) to be used mainly with Geo3d shapes with very promising results, in particular for complex shapes (e.g polygons). Using this pixelization scheme reduces the size of the index, improves the performance of the queries and reduces the loading time for non-point shapes. 
> If you are ok with this contribution and before providing any code I would like to understand what is the correct/prefered approach:
> 1) Add new depency to the S2 library (https://mvnrepository.com/artifact/io.sgr/s2-geometry-library-java). It has Apache 2.0 license so it should be ok.
> 2) Create a utility class with all methods necessary to navigate the S2 tree and create shapes from S2 cells (basically port what we need from the library into Lucene).
> What do you think?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org