Posted to java-user@lucene.apache.org by "Smiley, David W." <ds...@mitre.org> on 2014/01/17 17:56:07 UTC

FW: [Jts-topo-suite-user] Persistent STR tree

FYI for those with spatial interests…

From: "Smiley, David W." <ds...@mitre.org>
Date: Friday, January 17, 2014 at 11:53 AM
To: Demeter Sztanko <sz...@gmail.com>
Cc: jts-topo-suite-user@lists.sourceforge.net
Subject: Re: [Jts-topo-suite-user] Persistent STR tree

So 4x not 10x; I’m not feeling depressed anymore ;-)

Your approach of using QuadTree and a coordinate reference system makes sense.  One day, hopefully not too far away, I expect Lucene-spatial/Spatial4j to have built-in projection support.  But instead of indexing bounding boxes, you should ideally index the actual shapes; then you can pull the WKB and check for actual intersection.
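To illustrate the "filter, then check for actual intersection" idea, here is a minimal sketch with hypothetical names (not a Lucene or JTS API). A cheap bounding-box test runs first, and only survivors pay for an exact test. Even-odd ray casting stands in for decoding the stored WKB and running a real intersects() check. The triangle shows why bounding boxes alone over-match: a point can sit inside the box but outside the shape.

```java
// Hypothetical filter-and-refine sketch; names are illustrative, not a Lucene or JTS API.
final class FilterRefine {

    // Axis-aligned bounding box {minX, minY, maxX, maxY}: the cheap "filter" key.
    static double[] bbox(double[] xs, double[] ys) {
        double minX = xs[0], minY = ys[0], maxX = xs[0], maxY = ys[0];
        for (int i = 1; i < xs.length; i++) {
            minX = Math.min(minX, xs[i]); maxX = Math.max(maxX, xs[i]);
            minY = Math.min(minY, ys[i]); maxY = Math.max(maxY, ys[i]);
        }
        return new double[] {minX, minY, maxX, maxY};
    }

    // Filter step: can report true for points outside the real shape (false positives).
    static boolean bboxContains(double[] box, double x, double y) {
        return x >= box[0] && x <= box[2] && y >= box[1] && y <= box[3];
    }

    // Refine step: even-odd ray casting, a stand-in for decoding WKB and testing exactly.
    static boolean polygonContains(double[] xs, double[] ys, double x, double y) {
        boolean inside = false;
        for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
            boolean crosses = (ys[i] > y) != (ys[j] > y);
            if (crosses && x < (xs[j] - xs[i]) * (y - ys[i]) / (ys[j] - ys[i]) + xs[i]) {
                inside = !inside;
            }
        }
        return inside;
    }

    // Full query: only shapes that pass the cheap filter pay for the exact test.
    static boolean matches(double[] xs, double[] ys, double x, double y) {
        return bboxContains(bbox(xs, ys), x, y) && polygonContains(xs, ys, x, y);
    }

    public static void main(String[] args) {
        double[] xs = {0, 10, 0};   // right triangle with vertices (0,0), (10,0), (0,10)
        double[] ys = {0, 0, 10};
        System.out.println(bboxContains(bbox(xs, ys), 8, 8));  // true: inside the box
        System.out.println(matches(xs, ys, 8, 8));              // false: outside the triangle
        System.out.println(matches(xs, ys, 2, 2));              // true
    }
}
```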

I’m excited to announce to you and others reading this that I’m currently working on a much more sophisticated system for indexing shapes and computing intersections that will be much faster.  The first release (within 2-3 weeks) will index shapes using the grid, and then any matches will be double-checked against a WKB representation stored in Lucene "doc-values".  The subsequent release, within the next ~30 days, will tweak the grid encoding to include a little more metadata so that most queries can be completely satisfied by examining the fast index grid; only shapes that barely touch an indexed shape will have to be double-checked against the WKB representation.  The net effect should be a dramatic increase in spatial accuracy and performance over the current scheme.  You can expect to see a blog post with illustrations about this within 30 days.
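A toy illustration of the "extra grid metadata" idea follows. It is not Lucene's actual cell encoding. Everything here is an assumption chosen for brevity: a circle as the indexed shape, a fixed 2^depth x 2^depth grid, and the names. Each cell touching the shape is flagged WITHIN (fully covered) or EDGE (partly covered). A point query that lands in a WITHIN cell, or in no indexed cell, is answered from the grid alone. Only EDGE hits fall back to the exact test that the stored WKB would provide.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch: grid cells flagged WITHIN or EDGE; not Lucene's real cell encoding.
final class FlaggedGrid {
    enum Flag { WITHIN, EDGE }

    final int n;              // cells per side (2^depth)
    final double size;        // world is the square [0, size) x [0, size)
    final double cx, cy, r;   // indexed shape: a circle (stand-in for arbitrary WKB)
    final Map<Long, Flag> cells = new HashMap<>();
    int exactChecks = 0;      // how many queries needed the slow exact test

    FlaggedGrid(int depth, double size, double cx, double cy, double r) {
        this.n = 1 << depth; this.size = size; this.cx = cx; this.cy = cy; this.r = r;
        double w = size / n;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double x0 = i * w, y0 = j * w, x1 = x0 + w, y1 = y0 + w;
                // The cell point closest to the centre decides whether the cell touches the circle.
                double nx = Math.max(x0, Math.min(cx, x1)), ny = Math.max(y0, Math.min(cy, y1));
                if (!inCircle(nx, ny)) continue;
                // A convex shape contains the cell iff it contains all four corners.
                boolean within = inCircle(x0, y0) && inCircle(x1, y0)
                        && inCircle(x0, y1) && inCircle(x1, y1);
                cells.put(key(i, j), within ? Flag.WITHIN : Flag.EDGE);
            }
        }
    }

    boolean inCircle(double x, double y) {
        double dx = x - cx, dy = y - cy;
        return dx * dx + dy * dy <= r * r;
    }

    long key(int i, int j) { return (long) i * n + j; }

    // Point query: empty and WITHIN cells are decided by the grid; EDGE cells need the exact test.
    boolean contains(double x, double y) {
        if (x < 0 || y < 0 || x >= size || y >= size) return false;  // outside the grid's world
        int i = (int) (x / size * n), j = (int) (y / size * n);
        Flag f = cells.get(key(i, j));
        if (f == null) return false;
        if (f == Flag.WITHIN) return true;
        exactChecks++;               // stand-in for decoding the stored WKB
        return inCircle(x, y);
    }

    public static void main(String[] args) {
        FlaggedGrid g = new FlaggedGrid(4, 16.0, 8.0, 8.0, 5.0);  // 16x16 grid of unit cells
        System.out.println(g.contains(8.5, 8.5));    // true, decided by a WITHIN cell
        System.out.println(g.contains(15.5, 15.5));  // false, no indexed cell there
        System.out.println(g.contains(12.9, 8.5));   // true, but only after an exact check
        System.out.println(g.exactChecks);           // 1
    }
}
```

The finer the grid, the smaller the fraction of area covered by EDGE cells, so fewer queries reach the exact check.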

~ David

From: Demeter Sztanko <sz...@gmail.com>
Date: Friday, January 17, 2014 at 11:13 AM
To: "Smiley, David W." <ds...@mitre.org>
Cc: jts-topo-suite-user@lists.sourceforge.net
Subject: Re: [Jts-topo-suite-user] Persistent STR tree

Hi David,

First of all, thanks for developing Lucene; it is an amazing and unique library.

Sorry, 10x was a very rough estimate; Lucene is actually 4 times slower.

When using Lucene, I can perform around 700 queries/second (that's 8 threads on an 8-core MacBook Pro with SSD disks). With the JTS STRTree I was able to get around 2800 queries/second, so that's roughly a 4x slowdown. I was measuring only query performance, not indexing.

I am storing bounding-box (BB) rectangles as the indexed geometry and the real geometry in WKB format as the value field of the record. And I am using QuadPrefixTree.

One thing I have noticed is that Lucene deals with lat/lng coordinates only and won't accept any other reference system (I am using the British National Grid: http://spatialreference.org/ref/epsg/27700/ ), so I had to scale down all bounding boxes until the coordinates fit into the 0-180 interval.
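One way to do that kind of scaling is with a single uniform scale factor plus a translation. Shapes keep their proportions, and whether two rectangles intersect is unchanged by such a map. This is a minimal sketch with hypothetical names. The source extent is an illustrative assumption, not the official EPSG:27700 area of use. The target box is kept inside valid longitude/latitude ranges, since latitude must stay within ±90.

```java
// Hypothetical helper: maps projected coordinates into a degree-like box with one uniform scale.
final class GridRescale {
    final double srcMinX, srcMinY, scale;

    // Chooses the largest uniform scale that fits the source extent inside [0, dstW] x [0, dstH].
    GridRescale(double srcMinX, double srcMinY, double srcMaxX, double srcMaxY,
                double dstW, double dstH) {
        this.srcMinX = srcMinX;
        this.srcMinY = srcMinY;
        this.scale = Math.min(dstW / (srcMaxX - srcMinX), dstH / (srcMaxY - srcMinY));
    }

    double x(double easting)  { return (easting  - srcMinX) * scale; }
    double y(double northing) { return (northing - srcMinY) * scale; }

    // Rescales a bounding box given as {minX, minY, maxX, maxY}.
    double[] box(double[] b) { return new double[] {x(b[0]), y(b[1]), x(b[2]), y(b[3])}; }

    public static void main(String[] args) {
        // Assumed extent in metres (illustrative only), squeezed into x in [0,180], y in [0,90].
        GridRescale r = new GridRescale(0, 0, 700_000, 1_300_000, 180, 90);
        double[] b = r.box(new double[] {530_000, 180_000, 531_000, 181_000});
        System.out.println(java.util.Arrays.toString(b));
    }
}
```

Scaling the query shapes with the same GridRescale instance keeps index and queries consistent. Distances computed in degree mode will no longer be meaningful in metres.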

I haven't tried any of the standalone databases as I believe the simple network overhead will kill all possible performance benefits. Also for other reasons I do not want to deal with those.

I still believe the operations I am performing on the RTree are relatively simple, and Lucene is optimised for much more general use, so I have some hope of improving its performance.

D.


On Fri, Jan 17, 2014 at 3:39 PM, Smiley, David W. <ds...@mitre.org> wrote:
Whoops; forgot to reply-all.


From: "Smiley, David W." <ds...@mitre.org>
Date: Friday, January 17, 2014 at 10:15 AM
To: Demeter Sztanko <sz...@gmail.com>
Subject: Re: [Jts-topo-suite-user] Persistent STR tree

Thanks for sharing your experience with Lucene-spatial.  I’m responsible for a large part of it.  I don’t think an on-disk structure (even on SSD) is ever going to match the performance of an in-memory one.  Of course, if you find one, let me know.  FWIW, I’m looking to improve the accuracy & performance of Lucene-spatial a lot this year.  Can you tell me whether the indexed spatial objects are all points or mostly non-points?  And was the 10x slowdown measured on query performance alone, or did it include indexing?

In the NoSQL space (or shall we say… the non-relational-database space), the systems with the best spatial support, to my knowledge, are MongoDB, CouchDB (the spatial module is a separate add-on), and Lucene-spatial.  Your data set isn’t huge, though; I’d try PostGIS if I were you.  And I’m very impressed with what I see in SQL Server.

Good luck,
  ~ David Smiley

From: Demeter Sztanko <sz...@gmail.com>
Date: Friday, January 17, 2014 at 7:56 AM
To: jts-topo-suite-user@lists.sourceforge.net
Subject: [Jts-topo-suite-user] Persistent STR tree

Hi all,

I need to store around 50M objects in a spatial index (I only need support for bulk insert and concurrent intersection() operations). I then need to access the objects semi-randomly (that is, I will probably have 300 requests within one location, then another 300 in another random location, etc.).
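For context on why an STR tree fits "bulk insert plus concurrent intersection()": Sort-Tile-Recursive (STR) packing builds the whole tree in one pass from sorted data instead of by repeated inserts. The procedure is: sort boxes by x-centre, cut them into about sqrt(leaves) vertical slices, sort each slice by y-centre, and group runs of M boxes into nodes. The same step then repeats one level up. This is a simplified in-memory sketch with an assumed node capacity and hypothetical names, not the JTS STRtree implementation. Once built, the tree is never modified, so concurrent read-only queries need no locking.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified STR (Sort-Tile-Recursive) bulk loader; illustrative, not JTS's STRtree.
final class StrTree {
    static final int M = 4;  // assumed node capacity, kept small for the example

    static final class Node {
        double minX, minY, maxX, maxY;
        final int id;                                   // entry id; -1 for internal nodes
        final List<Node> children = new ArrayList<>();  // empty for entries
        Node(int id, double minX, double minY, double maxX, double maxY) {
            this.id = id; this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }
        double cx() { return (minX + maxX) / 2; }
        double cy() { return (minY + maxY) / 2; }
        boolean intersects(double x0, double y0, double x1, double y1) {
            return minX <= x1 && x0 <= maxX && minY <= y1 && y0 <= maxY;
        }
    }

    final Node root;

    StrTree(List<Node> entries) {
        if (entries.isEmpty()) throw new IllegalArgumentException("no entries");
        List<Node> level = packLevel(entries);
        while (level.size() > 1) level = packLevel(level);  // pack upward until one root remains
        root = level.get(0);
    }

    // One STR pass: sort by x, cut into vertical slices, sort each slice by y, group by M.
    static List<Node> packLevel(List<Node> nodes) {
        int leafCount = (int) Math.ceil(nodes.size() / (double) M);
        int perSlice = (int) Math.ceil(Math.sqrt(leafCount)) * M;
        List<Node> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingDouble(Node::cx));
        List<Node> parents = new ArrayList<>();
        for (int s = 0; s < sorted.size(); s += perSlice) {
            List<Node> slice = new ArrayList<>(sorted.subList(s, Math.min(s + perSlice, sorted.size())));
            slice.sort(Comparator.comparingDouble(Node::cy));
            for (int g = 0; g < slice.size(); g += M) {
                Node p = new Node(-1, Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY,
                        Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY);
                for (Node c : slice.subList(g, Math.min(g + M, slice.size()))) {
                    p.children.add(c);  // parent box grows to cover each child
                    p.minX = Math.min(p.minX, c.minX); p.minY = Math.min(p.minY, c.minY);
                    p.maxX = Math.max(p.maxX, c.maxX); p.maxY = Math.max(p.maxY, c.maxY);
                }
                parents.add(p);
            }
        }
        return parents;
    }

    // Returns ids of entries whose boxes intersect the query box.
    List<Integer> query(double x0, double y0, double x1, double y1) {
        List<Integer> out = new ArrayList<>();
        search(root, x0, y0, x1, y1, out);
        return out;
    }

    private static void search(Node n, double x0, double y0, double x1, double y1, List<Integer> out) {
        if (!n.intersects(x0, y0, x1, y1)) return;       // prune the whole subtree
        if (n.children.isEmpty()) { out.add(n.id); return; }
        for (Node c : n.children) search(c, x0, y0, x1, y1, out);
    }

    public static void main(String[] args) {
        List<Node> entries = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            double x = i % 10, y = i / 10;                // 10x10 grid of small boxes
            entries.add(new Node(i, x, y, x + 0.5, y + 0.5));
        }
        System.out.println(new StrTree(entries).query(2.2, 3.2, 4.1, 4.1));
    }
}
```

Making this persistent would mean writing each packed node to a fixed-size disk page. That part is not shown here.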

STRTree is great and fast; however, I need around 50GB of RAM to fit the tree, which is unfortunately too expensive for me to maintain in the long term.

I need a solution that can run on 1 GB of RAM and SSD disks (it's a DigitalOcean cloud instance).
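A quick back-of-the-envelope check shows why the index has to live mostly on disk. The figures above imply roughly 1,000 bytes per object for the in-memory STRTree. A 1 GB budget leaves only about 20 bytes per object, which is less than the 32 bytes needed for a single bounding box stored as four doubles. This sketch assumes decimal gigabytes and ignores JVM and OS overhead.

```java
// Back-of-the-envelope memory budget using the figures from the message (decimal GB assumed).
final class MemoryBudget {
    static final long OBJECTS = 50_000_000L;
    static final long GB = 1_000_000_000L;

    // Bytes available per object if the given number of gigabytes is shared evenly.
    static double bytesPerObject(double gigabytes) {
        return gigabytes * GB / OBJECTS;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerObject(50));  // 1000.0 bytes per object in the STRTree
        System.out.println(bytesPerObject(1));   // 20.0 bytes per object on the 1 GB instance
        System.out.println(4 * Double.BYTES);    // 32 bytes for one bounding box as four doubles
    }
}
```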

I have also tried using Lucene for storing spatial index, which is also feasible but around 10 times slower even on SSD disks.

I was wondering if you know of any other minimal Java libraries that can do what I am looking for while still being relatively fast.


Thanks,

D.
