Posted to commits@lucene.apache.org by "David Smiley (Confluence)" <co...@apache.org> on 2013/07/25 07:03:00 UTC

[CONF] Apache Solr Reference Guide > Spatial Search

Space: Apache Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr)
Page: Spatial Search (https://cwiki.apache.org/confluence/display/solr/Spatial+Search)

Change Comment:
---------------------------------------------------------------------
Refactored to better orient the reader to Solr 3 & Solr 4 spatial.  A lot more is needed as it's not organized well.

Edited by David Smiley:
---------------------------------------------------------------------
Solr supports location data for use in spatial or geospatial searches. Using spatial search, you can:

* Index points or other shapes
* Filter search results by a bounding box or circle or by other shapes
* Sort or boost scoring by distance
* Index and search multi-value time or other numeric durations

With Solr 4, there are two field types for spatial search: {{LatLonType}} (or its non-geodetic twin {{PointType}}) and {{SpatialRecursivePrefixTreeFieldType}} (RPT for short). RPT is new in Solr 4 and offers more features than LatLonType. LatLonType is still the better choice when you need efficient distance sorting or boosting. Both field types can be used in the same schema at the same time.

For more information on Solr spatial search, see [http://wiki.apache.org/solr/SpatialSearch].

h3. Indexing and Configuration

To index geodetic points (latitude and longitude), supply the pair of numbers as a single string. Put latitude first, then longitude, separated by a comma, for example {{45.15,-93.85}}.
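Here is a minimal sketch, assuming you build documents in Python before sending them to Solr. The helpers below produce and check the "latitude,longitude" string. The names {{format_point}} and {{parse_point}} are hypothetical. The {{store}} field name follows the examples on this page, and the {{id}} value is only illustrative.

{code:python}
def format_point(lat: float, lon: float) -> str:
    """Format a geodetic point as "lat,lon" (latitude first) for a spatial field."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return f"{lat},{lon}"


def parse_point(value: str) -> tuple:
    """Parse a "lat,lon" string back into a (lat, lon) tuple of floats."""
    lat_str, lon_str = value.split(",")
    lat, lon = float(lat_str), float(lon_str)
    format_point(lat, lon)  # reuse the range checks
    return lat, lon


# Hypothetical document; "store" is the field name used in this page's examples.
doc = {"id": "store-1", "store": format_point(45.15, -93.85)}
print(doc)  # {'id': 'store-1', 'store': '45.15,-93.85'}
{code}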

TODO more details to be supplied later.  See the bottom of this page for RPT configuration specifics.

h3. Distance Function Queries

Three function queries support spatial search:
* [{{dist}}|http://wiki.apache.org/solr/FunctionQuery#dist] calculates the distance between two points (vectors) using a configurable p-norm.
* [{{hsin}}|http://wiki.apache.org/solr/FunctionQuery#hsin.2C_ghhsin_-_Haversine_Formula] calculates the distance between two points on a sphere using the haversine formula.
* [{{sqedist}}|https://wiki.apache.org/solr/FunctionQuery#sqedist_-_Squared_Euclidean_Distance] calculates the squared Euclidean distance between two points.

For more information about these function queries, see the section on [Function Queries].
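The Python sketch below shows the math behind these functions. It is not Solr's implementation. The function names mirror the Solr functions, but the signatures are simplified assumptions. For example, this {{hsin}} takes the sphere radius and two lat/lon points in degrees.

{code:python}
import math


def dist(power, a, b):
    """p-norm distance between vectors a and b (power=2 is Euclidean, inf is max)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(power):
        return max(diffs)
    return sum(d ** power for d in diffs) ** (1.0 / power)


def sqedist(a, b):
    """Squared Euclidean distance: cheaper than dist(2, ...) because it skips the root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hsin(radius, lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere of the given radius (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))


print(dist(2, (0, 0), (3, 4)))  # 5.0
print(sqedist((0, 0), (3, 4)))  # 25
print(hsin(6371.0087714, 45.15, -93.85, 44.98, -93.27))  # distance in km
{code}

Because {{sqedist}} preserves the ordering of {{dist(2, ...)}}, it is a reasonable choice when you only need to rank by Euclidean distance.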

h3. Spatial Search Features

Solr includes three useful tools for working with spatial queries: {{geofilt}}, a geospatial filter; {{bbox}}, a geospatial bounding-box filter; and {{geodist}}, a geospatial distance function.

h4. Spatial Search Parameters

The following parameters are used for spatial search:

|| Parameter || Description ||
| d | distance, in kilometers |
| pt | a lat/lon coordinate point |
| sfield | the name of the spatial field to search, typically a field of the {{location}} (lat/lon) field type |

h4. {{geofilt}}

The {{geofilt}} filter retrieves results based on their distance from a given point. For example, to find all results for a product search within five kilometers of the point 45.15,-93.85, you could enter {{&q=\*:\*&fq=\{\!geofilt sfield=store\}&pt=45.15,-93.85&d=5}}. This filter returns all results within a circle of the given radius around that point:

!circle.png!
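You can picture {{geofilt}} as a haversine distance test against every candidate point. The sketch below is an illustration only, not how Solr evaluates the filter. It assumes a spherical Earth with a mean radius of 6371.0087714 km. Documents are hypothetical Python dicts that hold a (lat, lon) tuple in the {{store}} field.

{code:python}
import math

EARTH_MEAN_RADIUS_KM = 6371.0087714  # assumption: spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_MEAN_RADIUS_KM * math.asin(math.sqrt(h))


def geofilt(docs, sfield, pt, d):
    """Keep docs whose sfield point lies within d km of pt (a circle, not a box)."""
    lat0, lon0 = pt
    return [doc for doc in docs if haversine_km(lat0, lon0, *doc[sfield]) <= d]


# Hypothetical documents.
docs = [
    {"id": "a", "store": (45.15, -93.85)},
    {"id": "b", "store": (45.17, -93.87)},
    {"id": "c", "store": (44.98, -93.27)},
]
print([doc["id"] for doc in geofilt(docs, "store", (45.15, -93.85), 5)])  # ['a', 'b']
{code}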

h4. {{bbox}}

{{bbox}} filters results based on a specified area around a given point. It takes the same parameters as {{geofilt}}. Instead of testing whether each point lies within a circle of the given radius, it only calculates the lower left and upper right corners of a box that encloses that circle.

To return all results within five kilometers of a given point, you could enter {{...&q=\*:\*&fq=\{\!bbox sfield=store\}&pt=45.15,-93.85&d=5}}. The resulting bounding box covers every point within the five kilometer circle. It also covers some extra points in the corners of the box that fall outside the five kilometer radius. Bounding box filters can therefore return results outside your desired radius, but they are much less "expensive" to compute.

!bbox.png!
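The following sketch computes an enclosing box for a circle, under the same spherical-Earth assumption. It then shows a point that {{bbox}} keeps but a circle filter would reject. The longitude half-width uses the standard spherical formula asin(sin(r) / cos(lat)). Solr's own box calculation may differ in detail. Poles and the dateline are deliberately not handled here.

{code:python}
import math

EARTH_MEAN_RADIUS_KM = 6371.0087714  # assumption: spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_MEAN_RADIUS_KM * math.asin(math.sqrt(h))


def bbox_for_circle(lat, lon, d_km):
    """(min_lat, min_lon, max_lat, max_lon) of the box enclosing a circle.

    Simplified: assumes the circle neither reaches a pole nor crosses the dateline.
    """
    ang = d_km / EARTH_MEAN_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(ang)  # half-height in degrees
    dlon = math.degrees(math.asin(math.sin(ang) / math.cos(math.radians(lat))))  # half-width
    return lat - dlat, lon - dlon, lat + dlat, lon + dlon


def in_bbox(box, lat, lon):
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


box = bbox_for_circle(45.15, -93.85, 5)
min_lat, min_lon, max_lat, max_lon = box
# A point 90% of the way from the center toward the box's upper right corner.
near_corner = (45.15 + 0.9 * (max_lat - 45.15), -93.85 + 0.9 * (max_lon + 93.85))
print(in_bbox(box, *near_corner))  # True: bbox keeps it
print(haversine_km(45.15, -93.85, *near_corner) > 5)  # True: geofilt would drop it
{code}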

{note}
When a bounding box includes a pole, the {{location}} field type produces a "bounding bowl" (a spherical cap). The bowl includes all values north or south of the latitude of whichever bounding box corner (lower left or upper right) is closer to the equator. Solr still calculates the upper right and lower left corners of the box, as it does for any other filter. It then takes the corner closest to the equator and filters by latitude only. Because the box goes over the pole, that corner may not be the lower left one, despite the name. This returns more matches than a pure bounding box, but the query is both faster and easier to construct.
{note}
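The sketch below illustrates the "bounding bowl" rule from the note. It assumes a spherical Earth, and that the box's latitude half-height equals the circle's angular radius. Once the box reaches a pole, only a latitude range is kept. The function names are hypothetical.

{code:python}
import math

EARTH_MEAN_RADIUS_KM = 6371.0087714  # assumption: spherical Earth


def bounding_bowl(lat, d_km):
    """Latitude range used when the circle around a point reaches a pole.

    Returns (min_lat, max_lat), or None when no pole is involved and an
    ordinary bounding box applies. Longitude is ignored inside a bowl.
    """
    dlat = math.degrees(d_km / EARTH_MEAN_RADIUS_KM)
    if lat + dlat >= 90.0:  # box goes over the North Pole
        return lat - dlat, 90.0  # keep everything north of the equator-side edge
    if lat - dlat <= -90.0:  # box goes over the South Pole
        return -90.0, lat + dlat
    return None


def in_bowl(bowl, lat, lon):
    min_lat, max_lat = bowl
    return min_lat <= lat <= max_lat  # lon is deliberately unused


bowl = bounding_bowl(89.9, 50)
print(bowl)  # roughly (89.45, 90.0)
print(in_bowl(bowl, 89.5, 100.0))  # True, whatever the longitude
print(bounding_bowl(45.15, 50))  # None: use a normal box
{code}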

h4. {{geodist}}

{{geodist}} is a distance function that takes three optional parameters: {{(sfield,latitude,longitude)}}. Any parameter you omit is taken from the {{sfield}} and {{pt}} request parameters. You can use {{geodist}} to sort results by distance or to score the returned results.

For example, to sort your results by ascending distance, enter {{...&q=\*:\*&fq=\{\!geofilt\}&sfield=store&pt=45.15,-93.85&d=50&sort=geodist() asc}}.

To return the distance as the document score, enter {{...&q=\{\!func\}geodist()&sfield=store&pt=45.15,-93.85&sort=score+asc}}.
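Conceptually, {{sort=geodist() asc}} orders matching documents by their distance from {{pt}}. Using {{geodist()}} as a function query with {{sort=score asc}} gives the same order, because the score is the distance. The sketch below shows this, again assuming a spherical Earth and hypothetical in-memory documents.

{code:python}
import math

EARTH_MEAN_RADIUS_KM = 6371.0087714  # assumption: spherical Earth


def geodist(lat1, lon1, lat2, lon2):
    """Haversine distance in km, standing in for Solr's geodist()."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_MEAN_RADIUS_KM * math.asin(math.sqrt(h))


def sort_by_geodist(docs, sfield, pt):
    """Mimic sort=geodist() asc: closest documents first."""
    return sorted(docs, key=lambda doc: geodist(*pt, *doc[sfield]))


# Hypothetical documents, deliberately out of order.
docs = [
    {"id": "c", "store": (44.98, -93.27)},
    {"id": "a", "store": (45.15, -93.85)},
    {"id": "b", "store": (45.17, -93.87)},
]
print([doc["id"] for doc in sort_by_geodist(docs, "store", (45.15, -93.85))])  # ['a', 'b', 'c']
{code}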

h4. Post filtering

Post filtering is available for {{bbox}} and {{geofilt}} queries on {{LatLonType}} fields. To run a spatial filter as a post filter, mark it as non-cached and give it a cost of 100 or more, for example {{fq=\{\!geofilt cache=false cost=100\}}}.

Filtering is usually done in parallel with, or before, the main query. Post filters are applied after the main query and the other filters, so they are evaluated only against documents that already match. This matters when the filter itself is very time-consuming. It is cheaper to apply it only to matching documents than to all documents.
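You can model the difference in ordering in a few lines. This is a conceptual sketch only and does not use Solr's {{PostFilter}} API. The document shape, query, and filters are hypothetical. The counter shows that the expensive check runs only on documents that already matched.

{code:python}
def execute(docs, main_query, filters, post_filter):
    """Conceptual model: post_filter runs only on docs that pass everything else."""
    evaluated = 0

    def expensive(doc):
        nonlocal evaluated
        evaluated += 1  # count how often the costly check actually runs
        return post_filter(doc)

    candidates = [d for d in docs if main_query(d) and all(f(d) for f in filters)]
    results = [d for d in candidates if expensive(d)]
    return results, evaluated


docs = [{"id": i, "category": "coffee" if i % 100 == 0 else "other"} for i in range(1000)]
results, evaluated = execute(
    docs,
    main_query=lambda d: d["category"] == "coffee",  # matches 10 docs
    filters=[lambda d: d["id"] < 1000],  # cheap, cacheable filter
    post_filter=lambda d: d["id"] % 200 == 0,  # stands in for a costly distance check
)
print(len(results), evaluated)  # 5 10
{code}

Without post filtering, the costly check would have run against all 1000 documents instead of 10.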

h3. More Examples

Here are a few more useful examples of what you can do with spatial search in Solr.

h4. Use as a Sub-Query to Expand Search Results

Here we will query for results in Jacksonville, Florida, or within 50 kilometers of 45.15,-93.85 (near Buffalo, Minnesota):

{{&q=\*:\*&fq=(state:"FL" AND city:"Jacksonville") OR \_query\_:"\{\!geofilt\}"&sfield=store&pt=45.15,-93.85&d=50&sort=geodist()+asc}}
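When you send this request from code, the {{fq}} value contains spaces, quotes, and braces that must be URL-encoded. The sketch below uses the Python standard library's {{urllib.parse.urlencode}}. Append the result to your Solr select handler URL, which is not shown because it depends on your deployment.

{code:python}
from urllib.parse import parse_qs, urlencode

params = {
    "q": "*:*",
    # Jacksonville, FL OR within d km of pt, combined with the _query_ hook.
    "fq": '(state:"FL" AND city:"Jacksonville") OR _query_:"{!geofilt}"',
    "sfield": "store",
    "pt": "45.15,-93.85",
    "d": "50",
    "sort": "geodist() asc",
}
query_string = urlencode(params)  # spaces become '+', other specials %XX
print(query_string)
{code}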

h4. Facet by Distance

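To facet by distance, use the {{frange}} (function range) query parser. It matches documents whose function value falls between a lower bound {{l}} and an upper bound {{u}}: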

{{&q=\*:\*&sfield=store&pt=45.15,-93.85&facet.query=\{\!frange l=0 u=5\}geodist()&facet.query=\{\!frange l=5.001 u=3000\}geodist()}}
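{{frange}} includes both bounds by default. The two facet queries above therefore count a document at exactly 5 km in the first bucket, and leave a small gap between 5 and 5.001 km. The sketch below models that bucketing with hypothetical distance values. The {{frange}} helper here is an illustration, not Solr code.

{code:python}
def frange(value, l, u, incl=True, incu=True):
    """True if l <= value <= u (each bound inclusive unless turned off)."""
    lower_ok = value >= l if incl else value > l
    upper_ok = value <= u if incu else value < u
    return lower_ok and upper_ok


def facet_counts(distances, ranges):
    """Count distances per (l, u) range, as the facet.query parameters would."""
    return {f"[{l} TO {u}]": sum(1 for x in distances if frange(x, l, u)) for l, u in ranges}


distances_km = [0.0, 2.7, 5.0, 5.0005, 49.0, 4000.0]  # hypothetical geodist() values
counts = facet_counts(distances_km, [(0, 5), (5.001, 3000)])
print(counts)  # {'[0 TO 5]': 3, '[5.001 TO 3000]': 1}
{code}

If the gap matters, one option is to start the second range at 5 and exclude the lower bound with {{incl=false}}, for example {{\{\!frange l=5 u=3000 incl=false\}geodist()}}.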

h4. Boost Nearest Results

Using the [DisMax|The DisMax Query Parser] or [Extended DisMax|The Extended DisMax Query Parser] query parser, you can combine spatial search with a boost function ({{bf}}) to boost the nearest results:

{{&q.alt=\*:\*&fq=\{\!geofilt\}&sfield=store&pt=45.15,-93.85&d=50&bf=recip(geodist(),2,200,20)&sort=score desc}}
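Solr's {{recip(x,m,a,b)}} function computes a/(m*x + b). So {{recip(geodist(),2,200,20)}} gives 200/(2d + 20) for a distance d in kilometers. The Python check below shows how the boost decays.

{code:python}
def recip(x, m, a, b):
    """Same formula as Solr's recip(x,m,a,b): a / (m*x + b)."""
    return a / (m * x + b)


for km in (0, 10, 50, 90):
    # Boost added for a document km kilometers from pt.
    print(km, recip(km, 2, 200, 20))
{code}

The boost is 10 at the center and halves at 10 km, which is where m*x equals b. Scaling a and b by the same factor keeps the maximum boost but moves the halving point (b/m km) further out.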


h2. SpatialRecursivePrefixTreeFieldType (abbreviated as RPT)

The new approach to spatial offers several new features and improvements over the former approach:

* New shapes: polygons, line strings, and other new shapes
* Multi-valued indexed fields
* Ability to index non-point shapes as well as point shapes
* Rectangles with user-specified corners (the Solr 3 approach only supports the bounding box of a circle)
* Multi-value distance sort and score boosting
* Well-Known Text (WKT) support when JTS is used (for polygons, etc.)

The new approach incorporates the basic features of the Solr 3 approach, such as lat-lon bounding boxes and circles.

h3. Schema configuration

The first step to using Solr 4 spatial is to register a field type in {{schema.xml}}. There are several options for this field type.

|| Setting || Description ||
| name | The name of the field type. |
| class | For most use cases, using {{solr.SpatialRecursivePrefixTreeFieldType}} will be sufficient. Since the new spatial module in Lucene is meant to be a framework for different spatial "strategies", another class may be used. See the [Spatial4J project|https://github.com/spatial4j/spatial4j] for more information. |
| spatialContextFactory | If you use polygons or other shapes beyond a point, rectangle or circle, the [JTS Topology Suite|http://sourceforge.net/projects/jts-topo-suite/] is a required dependency. In that case, set this attribute to the JTS-enabled factory class, {{com.spatial4j.core.context.jts.JtsSpatialContextFactory}}, as in the example below. |
| units | This is required, and currently can only be "degrees". |
| distErrPct | Defines the default precision of non-point shapes, as a fraction from 0.0 (fully precise) to 0.5. The closer this number is to zero, the more accurate the shape will be. However, more precise indexed shapes use more disk space and take longer to index. |
| maxDistErr | Defines the highest level of detail required for indexed data, in degrees. If left blank, the default is one meter, which is just under 0.000009 degrees. |
| geo | If *true* (the default), latitude and longitude coordinates are based on WGS84 rather than on a Euclidean/Cartesian plane. If *false*, the coordinates are Euclidean/Cartesian. |
| worldBounds | Defines the valid numerical ranges for x and y, in the format of "minX minY maxX maxY". If {{geo=true}}, this is assumed "-180 -90 180 90". If {{geo=false}}, you should define your boundaries for non-geospatial uses.|
| distCalculator | Defines the distance calculation algorithm. If {{geo=true}}, "haversine" is the default. If {{geo=false}}, "cartesian" will be the default. Other possible values are "lawOfCosines", "vincentySphere" and "cartesian^2". |
| prefixTree | Defines the spatial grid implementation. A PrefixTree maps the world as a grid, and each grid cell is decomposed into another set of grid cells at the next level. A "geohash" implementation has 32 children at each level. If {{geo=true}}, "geohash" is the only option. If {{geo=false}}, "quad" can be used, which has 4 children at each level. |
| maxLevels | Sets the maximum grid depth for indexed data. It is usually simpler to set {{maxDistErr}} instead and let the grid depth be calculated from that real distance. |
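The sketch below shows how {{maxDistErr}} relates to degrees and grid depth. It converts kilometers to degrees, assuming a spherical Earth with mean radius 6371.0087714 km. It then finds the first level at which a grid cell is no larger than {{maxDistErr}} in both dimensions. The level rule is a simplified approximation, not Spatial4j's exact algorithm, and the function names are hypothetical.

{code:python}
import math

EARTH_MEAN_RADIUS_KM = 6371.0087714  # assumption: spherical Earth


def km_to_degrees(km):
    """Arc length on the sphere expressed as degrees of arc."""
    return math.degrees(km / EARTH_MEAN_RADIUS_KM)


def geohash_cell_size(level):
    """(width, height) in degrees: 5 bits per level, longitude gets the odd bit."""
    bits = 5 * level
    lon_bits, lat_bits = (bits + 1) // 2, bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


def quad_cell_size(level):
    """(width, height) in degrees: each level halves both dimensions (4 children)."""
    return 360.0 / 2 ** level, 180.0 / 2 ** level


def levels_for(max_dist_err_deg, cell_size, max_level=50):
    """First level whose cells are no larger than max_dist_err_deg in both dimensions."""
    for level in range(1, max_level + 1):
        width, height = cell_size(level)
        if width <= max_dist_err_deg and height <= max_dist_err_deg:
            return level
    raise ValueError("maxDistErr too small for max_level")


print(km_to_degrees(0.001))  # one meter, just under 0.000009 degrees
print(levels_for(0.000009, geohash_cell_size))  # 11
print(levels_for(0.000009, quad_cell_size))  # 26
{code}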

{code:xml|borderStyle=solid|borderColor=#666666}
<fieldType name="location_rpt"   class="solr.SpatialRecursivePrefixTreeFieldType"
               spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
               distErrPct="0.025"
               maxDistErr="0.000009"
               units="degrees" />
{code}

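Once the field type has been defined, use it to define a field, for example {{<field name="geo" type="location_rpt" indexed="true" stored="true" multiValued="true" />}}. The field name {{geo}} is only illustrative.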

Because this functionality is quite new and some of it is still experimental, please review the Solr Wiki at [http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4] for more information about searching and sorting location data.

{tip}
As of Solr 4.1, the new approach to spatial search will also work with the {{\{!geofilt\}}} and {{\{!bbox\}}} query parsers.
{tip}


