You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@lucene.apache.org by ds...@apache.org on 2012/09/24 06:48:47 UTC
svn commit: r1389206 - in /lucene/dev/branches/lucene_solr_4_0: ./ lucene/
lucene/spatial/ lucene/spatial/src/java/org/apache/lucene/spatial/
lucene/spatial/src/java/org/apache/lucene/spatial/prefix/
lucene/spatial/src/java/org/apache/lucene/spatial/pr...
Author: dsmiley
Date: Mon Sep 24 04:48:47 2012
New Revision: 1389206
URL: http://svn.apache.org/viewvc?rev=1389206&view=rev
Log:
Document the SpatialStrategies consistently and more thoroughly.
Modified:
lucene/dev/branches/lucene_solr_4_0/ (props changed)
lucene/dev/branches/lucene_solr_4_0/lucene/ (props changed)
lucene/dev/branches/lucene_solr_4_0/lucene/spatial/ (props changed)
lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/SpatialStrategy.java
lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/PrefixTreeStrategy.java
lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/RecursivePrefixTreeFilter.java
lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/RecursivePrefixTreeStrategy.java
lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/TermQueryPrefixTreeStrategy.java
lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/tree/SpatialPrefixTree.java
lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/vector/PointVectorStrategy.java
Modified: lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/SpatialStrategy.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/SpatialStrategy.java?rev=1389206&r1=1389205&r2=1389206&view=diff
==============================================================================
--- lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/SpatialStrategy.java (original)
+++ lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/SpatialStrategy.java Mon Sep 24 04:48:47 2012
@@ -40,12 +40,12 @@ import org.apache.lucene.spatial.query.S
* <li>What types of query shapes can be used?</li>
* <li>What types of query operations are supported?
* This might vary per shape.</li>
- * <li>Are there caches? Under what circumstances are they used?
- * Roughly how big are they? Is it segmented by Lucene segments, such as is
- * done by the Lucene {@link org.apache.lucene.search.FieldCache} and
- * {@link org.apache.lucene.index.DocValues} (ideal) or is it for the entire
- * index?
+ * <li>Does it use the {@link org.apache.lucene.search.FieldCache}, {@link
+ * org.apache.lucene.index.DocValues} or some other type of cache? When?
* </ul>
+ * If a strategy only supports certain shapes at index or query time, then in
+ * general it will throw an exception if given an incompatible one. It will not
+ * be coerced into compatibility.
* <p/>
* Note that a SpatialStrategy is not involved with the Lucene stored field
* values of shapes, which is immaterial to indexing & search.
@@ -85,7 +85,7 @@ public abstract class SpatialStrategy {
}
/**
- * Returns the IndexableField(s) from the <code>shape</code> that are to be
+ * Returns the IndexableField(s) from the {@code shape} that are to be
* added to the {@link org.apache.lucene.document.Document}. These fields
* are expected to be marked as indexed and not stored.
* <p/>
@@ -139,7 +139,7 @@ public abstract class SpatialStrategy {
/**
* Returns a ValueSource with values ranging from 1 to 0, depending inversely
* on the distance from {@link #makeDistanceValueSource(com.spatial4j.core.shape.Point)}.
- * The formula is <code>c/(d + c)</code> where 'd' is the distance and 'c' is
+ * The formula is {@code c/(d + c)} where 'd' is the distance and 'c' is
* one tenth the distance to the farthest edge from the center. Thus the
* scores will be 1 for indexed points at the center of the query shape and as
* low as ~0.1 at its furthest edges.
Modified: lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/PrefixTreeStrategy.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/PrefixTreeStrategy.java?rev=1389206&r1=1389205&r2=1389206&view=diff
==============================================================================
--- lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/PrefixTreeStrategy.java (original)
+++ lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/PrefixTreeStrategy.java Mon Sep 24 04:48:47 2012
@@ -38,8 +38,41 @@ import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
/**
- * Abstract SpatialStrategy which provides common functionality for those
- * Strategys which use {@link SpatialPrefixTree}s
+ * An abstract SpatialStrategy based on {@link SpatialPrefixTree}. The two
+ * subclasses are {@link RecursivePrefixTreeStrategy} and {@link
+ * TermQueryPrefixTreeStrategy}. This strategy is most effective as a fast
+ * approximate spatial search filter.
+ *
+ * <h4>Characteristics:</h4>
+ * <ul>
+ * <li>Can index any shape; however only {@link RecursivePrefixTreeStrategy}
+ * can effectively search non-point shapes. <em>Not tested.</em></li>
+ * <li>Can index a variable number of shapes per field value. This strategy
+ * can do it via multiple calls to {@link #createIndexableFields(com.spatial4j.core.shape.Shape)}
+ * for a document or by giving it some sort of Shape aggregate (e.g. JTS
+ * WKT MultiPoint). The shape's boundary is approximated to a grid precision.
+ * </li>
+ * <li>Can query with any shape. The shape's boundary is approximated to a grid
+ * precision.</li>
+ * <li>Only {@link org.apache.lucene.spatial.query.SpatialOperation#Intersects}
+ * is supported. If only points are indexed then this is effectively equivalent
+ * to IsWithin.</li>
+ * <li>The strategy supports {@link #makeDistanceValueSource(com.spatial4j.core.shape.Point)}
+ * even for multi-valued data. However, <em>it will likely be removed in the
+ * future</em> in lieu of using another strategy with a more scalable
+ * implementation. Use of this call is the only
+ * circumstance in which a cache is used. The cache is simple but as such
+ * it doesn't scale to large numbers of points nor is it real-time-search
+ * friendly.</li>
+ * </ul>
+ *
+ * <h4>Implementation:</h4>
+ * The {@link SpatialPrefixTree} does most of the work, for example returning
+ * a list of terms representing grids of various sizes for a supplied shape.
+ * An important
+ * configuration item is {@link #setDistErrPct(double)} which balances
+ * shape precision against scalability. See those javadocs.
+ *
* @lucene.internal
*/
public abstract class PrefixTreeStrategy extends SpatialStrategy {
@@ -53,7 +86,12 @@ public abstract class PrefixTreeStrategy
this.grid = grid;
}
- /** Used in the in-memory ValueSource as a default ArrayList length for this field's array of values, per doc. */
+ /**
+ * A memory hint used by {@link #makeDistanceValueSource(com.spatial4j.core.shape.Point)}
+ * for how big the initial size of each Document's array should be. The
+ * default is 2. Set this to slightly more than the default expected number
+ * of points per document.
+ */
public void setDefaultFieldValuesArrayLen(int defaultFieldValuesArrayLen) {
this.defaultFieldValuesArrayLen = defaultFieldValuesArrayLen;
}
@@ -63,8 +101,14 @@ public abstract class PrefixTreeStrategy
}
/**
- * The default measure of shape precision affecting indexed and query shapes.
- * Specific shapes at index and query time can use something different.
+ * The default measure of shape precision affecting shapes at index and query
+ * times. Points don't use this as they are always indexed at the configured
+ * maximum precision ({@link org.apache.lucene.spatial.prefix.tree.SpatialPrefixTree#getMaxLevels()});
+ * this applies to all other shapes. Specific shapes at index and query time
+ * can use something different than this default value. If you don't set a
+ * default then the default is {@link SpatialArgs#DEFAULT_DISTERRPCT} --
+ * 2.5%.
+ *
* @see org.apache.lucene.spatial.query.SpatialArgs#getDistErrPct()
*/
public void setDistErrPct(double distErrPct) {
@@ -82,7 +126,8 @@ public abstract class PrefixTreeStrategy
List<Node> cells = grid.getNodes(shape, detailLevel, true);//true=intermediates cells
//If shape isn't a point, add a full-resolution center-point so that
// PointPrefixTreeFieldCacheProvider has the center-points.
- // TODO index each center of a multi-point? Yes/no?
+ //TODO index each point of a multi-point or other aggregate.
+ //TODO remove this once support for a distance ValueSource is removed.
if (!(shape instanceof Point)) {
Point ctr = shape.getCenter();
//TODO should be smarter; don't index 2 tokens for this in CellTokenStream. Harmless though.
Modified: lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/RecursivePrefixTreeFilter.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/RecursivePrefixTreeFilter.java?rev=1389206&r1=1389205&r2=1389206&view=diff
==============================================================================
--- lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/RecursivePrefixTreeFilter.java (original)
+++ lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/RecursivePrefixTreeFilter.java Mon Sep 24 04:48:47 2012
@@ -34,13 +34,15 @@ import java.io.IOException;
import java.util.LinkedList;
/**
- * Performs a spatial intersection filter between a query shape and a field indexed with {@link SpatialPrefixTree}, a Trie.
- * SPT yields terms (grids) at length 1 and at greater lengths corresponding to greater precisions.
- * This filter recursively traverses each grid length and uses methods on {@link Shape} to efficiently know
- * that all points at a prefix fit in the shape or not to either short-circuit unnecessary traversals or to efficiently
- * load all enclosed points. If no indexed data lies in a portion of the shape
- * then that portion of the query shape is quickly passed over without
- * decomposing the shape unnecessarily.
+ * Performs a spatial intersection filter between a query shape and a field
+ * indexed with {@link SpatialPrefixTree}, a Trie. SPT yields terms (grids) at
+ * length 1 (aka "Level 1") and at greater lengths corresponding to greater
+ * precisions. This filter recursively traverses each grid length and uses
+ * methods on {@link Shape} to efficiently know that all points at a prefix fit
+ * in the shape or not to either short-circuit unnecessary traversals or to
+ * efficiently load all enclosed points. If no indexed data lies in a portion
+ * of the shape then that portion of the query shape is quickly passed over
+ * without decomposing the shape unnecessarily.
*
* @lucene.internal
*/
@@ -167,7 +169,7 @@ RE "scan" threshold:
@Override
public String toString() {
- return "GeoFilter{fieldName='" + fieldName + '\'' + ", shape=" + queryShape + '}';
+ return getClass().getSimpleName()+"{fieldName='" + fieldName + '\'' + ", shape=" + queryShape + '}';
}
@Override
Modified: lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/RecursivePrefixTreeStrategy.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/RecursivePrefixTreeStrategy.java?rev=1389206&r1=1389205&r2=1389206&view=diff
==============================================================================
--- lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/RecursivePrefixTreeStrategy.java (original)
+++ lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/RecursivePrefixTreeStrategy.java Mon Sep 24 04:48:47 2012
@@ -25,7 +25,11 @@ import org.apache.lucene.spatial.query.S
import org.apache.lucene.spatial.query.UnsupportedSpatialOperation;
/**
- * Based on {@link RecursivePrefixTreeFilter}.
+ * A {@link PrefixTreeStrategy} which uses {@link RecursivePrefixTreeFilter}.
+ * This strategy has support for searching non-point shapes (note: not tested).
+ * Even a query shape with distErrPct=0 (fully precise to the grid) should have
+ * good performance for typical data, unless there is a lot of indexed data
+ * coincident with the shape's edge.
*
* @lucene.experimental
*/
@@ -38,6 +42,13 @@ public class RecursivePrefixTreeStrategy
prefixGridScanLevel = grid.getMaxLevels() - 4;//TODO this default constant is dependent on the prefix grid size
}
+ /**
+ * Sets the grid level [1-maxLevels] at which indexed terms are scanned brute-force
+ * instead of by grid decomposition. By default this is maxLevels - 4. The
+ * final level, maxLevels, is always scanned.
+ *
+ * @param prefixGridScanLevel 1 to maxLevels
+ */
public void setPrefixGridScanLevel(int prefixGridScanLevel) {
//TODO if negative then subtract from maxlevels
this.prefixGridScanLevel = prefixGridScanLevel;
Modified: lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/TermQueryPrefixTreeStrategy.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/TermQueryPrefixTreeStrategy.java?rev=1389206&r1=1389205&r2=1389206&view=diff
==============================================================================
--- lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/TermQueryPrefixTreeStrategy.java (original)
+++ lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/TermQueryPrefixTreeStrategy.java Mon Sep 24 04:48:47 2012
@@ -30,14 +30,14 @@ import org.apache.lucene.spatial.query.U
import java.util.List;
/**
- * A basic implementation of {@link PrefixTreeStrategy} using a large
- * {@link TermsFilter} of all the nodes from
- * {@link SpatialPrefixTree#getNodes(com.spatial4j.core.shape.Shape, int, boolean)}.
- * It only supports the search of indexed Point shapes.
- * <p />
- * The precision of query shapes is an important factor in using this Strategy.
- * If the precision is too precise then it will result in many terms which will
- * amount to a slower query.
+ * A basic implementation of {@link PrefixTreeStrategy} using a large {@link
+ * TermsFilter} of all the nodes from {@link SpatialPrefixTree#getNodes(com.spatial4j.core.shape.Shape,
+ * int, boolean)}. It only supports the search of indexed Point shapes.
+ * <p/>
+ * The precision of query shapes (distErrPct) is an important factor in using
+ * this Strategy. If the precision is too precise then it will result in many
+ * terms which will amount to a slower query.
+ *
* @lucene.experimental
*/
public class TermQueryPrefixTreeStrategy extends PrefixTreeStrategy {
Modified: lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/tree/SpatialPrefixTree.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/tree/SpatialPrefixTree.java?rev=1389206&r1=1389205&r2=1389206&view=diff
==============================================================================
--- lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/tree/SpatialPrefixTree.java (original)
+++ lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/tree/SpatialPrefixTree.java Mon Sep 24 04:48:47 2012
@@ -28,10 +28,13 @@ import java.util.Collections;
import java.util.List;
/**
- * A spatial Prefix Tree, or Trie, which decomposes shapes into prefixed strings at variable lengths corresponding to
- * variable precision. Each string corresponds to a spatial region.
- *
- * Implementations of this class should be thread-safe and immutable once initialized.
+ * A spatial Prefix Tree, or Trie, which decomposes shapes into prefixed strings
+ * at variable lengths corresponding to variable precision. Each string
+ * corresponds to a rectangular spatial region. This approach is
+ * also referred to "Grids", "Tiles", and "Spatial Tiers".
+ * <p/>
+ * Implementations of this class should be thread-safe and immutable once
+ * initialized.
*
* @lucene.experimental
*/
Modified: lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/vector/PointVectorStrategy.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/vector/PointVectorStrategy.java?rev=1389206&r1=1389205&r2=1389206&view=diff
==============================================================================
--- lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/vector/PointVectorStrategy.java (original)
+++ lucene/dev/branches/lucene_solr_4_0/lucene/spatial/src/java/org/apache/lucene/spatial/vector/PointVectorStrategy.java Mon Sep 24 04:48:47 2012
@@ -44,14 +44,31 @@ import org.apache.lucene.spatial.util.Ca
import org.apache.lucene.spatial.util.ValueSourceFilter;
/**
- * Simple {@link SpatialStrategy} which represents Points in two numeric {@link DoubleField}s.
+ * Simple {@link SpatialStrategy} which represents Points in two numeric {@link
+ * DoubleField}s. The Strategy's best feature is decent distance sort.
*
- * Note, currently only Points can be indexed by this Strategy. At query time, the bounding
- * box of the given Shape is used to create {@link NumericRangeQuery}s to efficiently
- * find Points within the Shape.
+ * <h4>Characteristics:</h4>
+ * <ul>
+ * <li>Only indexes points; just one per field value.</li>
+ * <li>Can query by a rectangle or circle.</li>
+ * <li>{@link
+ * org.apache.lucene.spatial.query.SpatialOperation#Intersects} and {@link
+ * SpatialOperation#IsWithin} is supported.</li>
+ * <li>Uses the FieldCache for
+ * {@link #makeDistanceValueSource(com.spatial4j.core.shape.Point)} and for
+ * searching with a Circle.</li>
+ * </ul>
*
- * Due to the simple use of numeric fields, this Strategy provides support for sorting by
- * distance through {@link DistanceValueSource}
+ * <h4>Implementation:</h4>
+ * This is a simple Strategy. Search works with {@link NumericRangeQuery}s on
+ * an x & y pair of fields. A Circle query does the same bbox query but adds a
+ * ValueSource filter on
+ * {@link #makeDistanceValueSource(com.spatial4j.core.shape.Point)}.
+ * <p />
+ * One performance shortcoming with this strategy is that a scenario involving
+ * both a search using a Circle and sort will result in calculations for the
+ * spatial distance being done twice -- once for the filter and second for the
+ * sort.
*
* @lucene.experimental
*/