Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2014/09/19 16:48:34 UTC

[jira] [Commented] (SOLR-6534) Multipolygon query problem with datelineRule=ccwRect

    [ https://issues.apache.org/jira/browse/SOLR-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140674#comment-14140674 ] 

David Smiley commented on SOLR-6534:
------------------------------------

The root cause is a known bug in Spatial4j's ShapeCollection when it computes its bounding box geodetically (i.e. not Euclidean): https://github.com/spatial4j/spatial4j/issues/77
It occurs when multiple shapes share the same longitudinal span, and results in a world-wrapping longitude range.  ShapeCollection is new in v0.4; it handles all the MULTI* shapes in WKT plus GEOMETRYCOLLECTION.  Previously, Spatial4j relied on its wrapper around a JTS Geometry, JtsGeometry.  JtsGeometry's computation of the geodetic bounding box has/had a similar bug, but not this specific one.  An easy way to use JtsGeometry in lieu of ShapeCollection is to switch back to the older WKT parser that didn't create ShapeCollections: wktShapeParserClass="com.spatial4j.core.io.jts.JtsWKTReaderShapeParser"; with that setting I don't see the problem behavior.  But I *think* the dateline rule is then ignored and you must accept the width180 behavior.

A bounding box that is larger than the true/optimal one is generally okay, though it may make things slower.  *However*, RPT uses the bounding box of a shape in conjunction with distErrPct to determine how deep (to what level of grid detail) it traverses the hierarchical grid.  Given a gigantic bounding box coupled with the default distErrPct of 2.5%, it approximates the query shape so coarsely that it includes indexed data you didn't want.   At query time it's safe to set distErrPct to 0.0, which yields correct results:
{noformat}
geo:"Intersects(MULTIPOLYGON(((-3 2,4 2,4 8,-3 8,-3 2)),((-3 -11,4 -11,4 -4,-3 -4,-3 -11)))) distErrPct=0.0"
{noformat}
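
To illustrate why an inflated bounding box matters, here is a rough sketch of how an error tolerance can be derived from a bounding box and distErrPct. It is a simplification, not the actual Lucene/Spatial4j code: it assumes the tolerance is the bounding box's center-to-corner distance scaled by distErrPct, and it measures that distance in flat (Euclidean) degrees. The function name approx_dist_err and the two example boxes are made up for illustration; the "wrapped" box merely stands in for a world-wrapping longitude range.

```python
import math


def approx_dist_err(min_x, max_x, min_y, max_y, dist_err_pct):
    """Approximate error tolerance in degrees for a bounding box.

    Assumption: tolerance = center-to-corner distance * dist_err_pct,
    with the distance measured in flat degrees (not geodetically).
    """
    if dist_err_pct == 0:
        return 0.0  # no approximation allowed: traverse to full detail
    half_width = (max_x - min_x) / 2.0
    half_height = (max_y - min_y) / 2.0
    return math.hypot(half_width, half_height) * dist_err_pct


# Tight box around the two polygons of the example MULTIPOLYGON.
tight = approx_dist_err(-3, 4, -11, 8, 0.025)

# Hypothetical box whose longitude range wraps the whole world.
wrapped = approx_dist_err(-180, 180, -11, 8, 0.025)

print(f"tight bbox tolerance:   {tight:.3f} degrees")
print(f"wrapped bbox tolerance: {wrapped:.3f} degrees")
```

Under these assumptions the world-wrapping box yields a tolerance many times larger than the tight box does, so the query shape gets approximated with much coarser grid cells. With distErrPct=0.0 the tolerance is zero regardless of the bounding box, which is why the query above is unaffected by the bug.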

FYI, I want to change the query-time default distErrPct to 0.0 as it rarely accounts for much of a performance difference and it yields more expected behavior.

I don't believe the dateline rule is actually related; that was a red herring in the bug report from what I can tell.

> Multipolygon query problem with datelineRule=ccwRect
> ----------------------------------------------------
>
>                 Key: SOLR-6534
>                 URL: https://issues.apache.org/jira/browse/SOLR-6534
>             Project: Solr
>          Issue Type: Bug
>          Components: spatial
>    Affects Versions: 4.9
>         Environment: Windows 7, Oracle JDK 1.7.0_45
>            Reporter: Jon H
>            Assignee: David Smiley
>
> We are currently upgrading from Solr 4.1 to 4.9 and have observed some odd behavior with multipolygon queries now. It is difficult to describe what is happening so I took a screenshot with the documents and query area plotted on a map. You can see it here: [http://imgur.com/iBpYLMh] The blue areas represent the multipolygon and the purple areas represent the document footprints.
> The query being used is as follows:
> {quote}
> geo:"Intersects(MULTIPOLYGON(((-3 2,4 2,4 8,-3 8,-3 2)),((-3 -11,4 -11,4 -4,-3 -4,-3 -11))))"
> {quote}
> This query returns all results when it should be returning only 8. If I run two separate queries with each individual polygon, I get 4 hits each as expected.
> I've narrowed this down to a problem with using 'datelineRule=ccwRect'. If I remove this setting, the query returns the expected results. Unfortunately, this setting is required for our software, since handling large polygon queries (spanning >180 degrees) is a requirement.
> Here are the relevant schema details:
> {quote}
> <field name="geo" type="location_rpt" indexed="true" stored="false"/>
> <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType" spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
> geo="true" distErrPct="0.1" maxDistErr="0.000009" units="degrees"
> datelineRule="ccwRect" normWrapLongitude="true" autoIndex="true"/>
> {quote}


