Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2012/06/28 19:55:30 UTC

[Solr Wiki] Update of "SolrAdaptersForLuceneSpatial4" by DavidSmiley

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrAdaptersForLuceneSpatial4" page has been changed by DavidSmiley:
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4

Comment:
1/2 way through

New page:
Note: This page is a working draft of documentation.  It needs to be migrated/merged/moved/renamed into the existing Solr spatial wiki content somehow.

= Lucene / Solr 4 Spatial =

This document describes how to use the new spatial functionality in Lucene / Solr 4.  The bulk of the implementation lives in the new Lucene spatial module in v4, which was committed on March 13th.  It replaces the former "Lucene spatial contrib" in v3.  The Solr piece is small, as it only needs to provide field types that are essentially adapters to the code in the Lucene spatial module.  Note also that the shape implementations and other core spatial code unrelated to Lucene live in another new open-source project called Spatial4j.  Presently, polygon support requires an additional dependency -- JTS.  As of this writing, 28-June 2012, the Solr portion has yet to be introduced into Solr trunk.  It should come into Solr via SOLR-3304 "soon".


== New features, over Solr 3 spatial ==

Note: "Solr 3 spatial" refers to the spatial support introduced in that version of Solr which still exists in v4.  Solr 3 spatial does ''not'' actually use Lucene 3's spatial contrib module aside from DistanceUtils.java.

The features below are what developer-users of Lucene/Solr 4 will appreciate.  Under the hood, it's a framework designed to be extended with different so-called spatial strategies.  I'll assume the RecursivePrefixTreeStrategy here, as it should address most use-cases and it has the best tests.

 * Multi-value indexes.  This is key for any project that geocodes natural language documents, since a variable number of locations are extracted from text.
 * Index shapes with area, not just points.  An indexed shape is essentially pixelated (i.e. gridded) to a configured resolution per shape.  Note: If extremely high precision of the edges of the shape needs to be retained for accurate searching, then this solution probably won't scale well compared to other approaches such as those that index the bounding box but retain the original shape vector.  Note: this capability sorely needs testing.
 * A polygon shape.  It can be the indexed shape or query shape.  Note: This requires the JTS dependency.  The polygon assumes a Mercator / Cartesian projection, and consequently doesn't support pole-wrap.  As of 1 June 2012 in Spatial4j 0.3-SNAPSHOT, it does support dateline crossing.
 * Multi-value distance sort / score boost.  Note: this is a preliminary unoptimized implementation that uses a fair amount of RAM. 
 * Configurable precision, which can vary per shape at both index & query time.  This improves performance.  Solr 3 indexes and queries based on the full precision of a double for latitude and longitude, which is excessive for nearly any use-case.
 * Fast filtering.  The code was benchmarked once showing it outperforms Solr 3's "LatLonType" at its own game (single valued indexed points), and a 3rd party anecdotally reported it was faster on his large index.  It hasn't been benchmarked in well over a year now though, and this is a TODO item.  Also, Solr 3 LatLonType sometimes requires all the points to be in memory, whereas the new spatial module here doesn't for filtering.

Of course, the basics in Solr 3 not mentioned here are implemented in this framework.  For example, lat-lon bounding boxes and circles.
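
To make the "gridded to a configured resolution" and "configurable precision" points above more concrete, here is a minimal sketch of a geohash-style grid encoding.  This is ''not'' code from the Lucene spatial module or Spatial4j; it is a generic illustration of the standard geohash algorithm, and the function name `geohash_encode` is hypothetical.  The assumption is only that a prefix-tree strategy works with grid cells of this general kind, where each extra character narrows the cell.

```python
# Illustrative only: a plain geohash encoder, not the Lucene/Spatial4j code.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision):
    """Return the geohash cell of the given length containing (lat, lon)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # accumulates 5 bits per output character
    bit_count = 0
    use_lon = True    # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    coarse = geohash_encode(57.64911, 10.40744, 4)
    fine = geohash_encode(57.64911, 10.40744, 8)
    print(coarse)                   # u4pr
    print(fine.startswith(coarse))  # True
```

The point of the sketch is the prefix property: a coarse cell is always a prefix of every finer cell inside it.  That is what lets precision vary per shape, since a lower-precision shape simply stops at shorter cells, and a filter can match on cell prefixes rather than comparing full-precision doubles.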

= How to Use =

== Configuration ==

== Indexing ==

== Search ==

== Final Notes ==