Posted to solr-user@lucene.apache.org by Guido Medina <gu...@temetra.com> on 2013/04/16 13:23:19 UTC

Solr 4.2.1 sorting by distance to polygon centre.

Hi,

I have everything in place and my polygons are indexing properly. Playing
a bit with LSP helped me a lot, and I now have JTS 1.13 inside solr.war.
Here is my challenge:

I have a big polygon (A) which contains two smaller polygons (B and C),
and B and C partly overlap. If I search for a coordinate that falls
inside all three, I would like to sort the matching polygons by the
distance from that coordinate to each polygon's centre.

As an example, let's say dot B is at the centre of B, dot C is at the
centre of C, and dot A is in the intersection of B and C, which happens
to be the centre of A. For dot A, polygon A should come first, and so
on. I could compute the distances myself from the result, but since Solr
is already doing the heavy lifting, why not include the sort in the
query?

Here is my field type definition:

         <!-- Spatial field type -->
         <fieldType name="location_rpt"
                    class="solr.SpatialRecursivePrefixTreeFieldType"
                    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
                    units="degrees"/>


Field definition:

         <!-- JTS spatial polygon field -->
         <field name="geopolygon" type="location_rpt" indexed="true"
                stored="false" required="false" multiValued="true"/>


I'm using the Solr admin UI first to shape my query, and will then move
it to our web app, which uses SolrJ. Here is the XML form of my result,
including the query I'm making. Every document gets a score of 1.0
instead of a distance (not what I want):

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
   <int name="status">0</int>
   <int name="QTime">9</int>
   <lst name="params">
     <str name="fl">id,score</str>
     <str name="sort">score asc</str>
     <str name="indent">true</str>
     <str name="q">*:*</str>
     <str name="_">1366111120720</str>
     <str name="wt">xml</str>
     <str name="fq">{!score=distance}geopolygon:"Intersects(-6.271906 53.379284)"</str>
   </lst>
</lst>
<result name="response" numFound="3" start="0" maxScore="1.0">
   <doc>
     <str name="id">uid13972</str>
     <float name="score">1.0</float></doc>
   <doc>
     <str name="id">uid13979</str>
     <float name="score">1.0</float></doc>
   <doc>
     <str name="id">uid13974</str>
     <float name="score">1.0</float></doc>
</result>
</response>


Thanks for all responses,

Guido.
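
For comparison, the client-side alternative mentioned in the post above
(computing the distances from the result) might look like the following
Java sketch. It assumes the centre of each matching polygon is already
known to the client; the Match class and its field names are
hypothetical, and the haversine formula on a spherical Earth is only an
approximation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CentreDistanceSort {
    // Mean Earth radius in kilometres (spherical approximation).
    static final double EARTH_RADIUS_KM = 6371.0088;

    /** Hypothetical holder: a document id plus its polygon centre in degrees. */
    static final class Match {
        final String id;
        final double centreLat;
        final double centreLon;

        Match(String id, double centreLat, double centreLon) {
            this.id = id;
            this.centreLat = centreLat;
            this.centreLon = centreLon;
        }
    }

    /** Great-circle distance in km between two lat/lon points given in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Returns a copy of the matches, nearest polygon centre first. */
    static List<Match> sortByDistance(List<Match> matches, double lat, double lon) {
        List<Match> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingDouble(
                (Match m) -> haversineKm(lat, lon, m.centreLat, m.centreLon)));
        return sorted;
    }
}
```

Note that the geopolygon field above is declared stored="false", so the
centres would have to come from somewhere other than that field.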

Re: Solr 4.2.1 sorting by distance to polygon centre.

Posted by "Smiley, David W." <ds...@mitre.org>.
Guido,

I encourage you to try to contribute the shape-related code you have to
Spatial4j as open source.  I realize that for some organizations, that
can be really difficult.

~ David



Re: Solr 4.2.1 sorting by distance to polygon centre.

Posted by Guido Medina <gu...@temetra.com>.
David,

   I just peeked at it on GitHub. The method will estimate well enough
for our purpose, but it depends on JTS, which we include in our Solr
server only. We don't want LGPL (v3) libraries in our main project, so
that is kind of a show stopper. I understand JTS is needed for Spatial4j,
Lucene and Solr in general, so we have no issue keeping it on the Solr
server, but we can't put it in the main web project for licensing
reasons. I know JTS is a great set of functions for spatial projects.
It's a shame I can't use it directly; for instance, I had to develop a
convex hull implementation myself.

Guido.



Re: Solr 4.2.1 sorting by distance to polygon centre.

Posted by "Smiley, David W." <ds...@mitre.org>.

On 4/16/13 10:57 AM, "Guido Medina" <gu...@temetra.com> wrote:

>David,
>
>I have been following your Stack Overflow posts and I understand what
>you are saying. We decided to change the criteria and index an extra
>field (close to your suggestion), so the sort will now be by polygon
>area desc (which introduced another problem: calculating polygon area
>on a sphere). I have finally got to the point of testing. Also,
>following what you are saying, it is not a good idea to ask more of the
>spatial field than the bare point-in-polygon (Intersects) check that
>returns the list matching specific criteria.

Glad you've been following what I've been up to and hopefully haven't
gotten too confused :-).  I welcome all feedback.  BTW I'll be doing a
75-minute spatial "deep dive" session at the Lucene/Solr Revolution
conference in San Diego May 1st & 2nd.  Eventually the slides will be
posted and hopefully the audio track.

>To sum up: calculate the area of the polygon (which, again, is not so
>obvious for curved polygons), run the standard Solr search and sort by
>that extra field. I guess the Solr overhead will be minimal in that case.

FYI Spatial4j will do a decent job estimating it by calculating the
geospatial area of the bounding box of a polygon and using the filled %
ratio of the polygon's 2D area to its Bbox.  This logic is in Spatial4j's
JtsGeometry.getArea().


So are you storing the area and sorting by it then?  (overhead is
extremely minimal, this would just be an integer sort)

>
>The real use case is in the utility industry. Users have areas where
>they take meter reads; readings are scheduled and assigned to the users
>whose area contains the meter's GPS location. Some users might cover
>big areas, and other users may have smaller areas inside those big
>areas, so we changed the criterion from distance to centre to area
>covered. It seemed simpler and easier.

You might want to consider doing both -- sort by a function query that
combines both factors in some clever way.

~ David
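
The bounding-box estimate described above can be sketched in plain Java
without JTS. This is a re-implementation of the idea as stated in the
message (geospatial area of the bounding box, scaled by the fraction of
the box that the polygon fills in 2D), not the actual code of
JtsGeometry.getArea(). The class and method names are hypothetical, a
spherical Earth is assumed, and polygons crossing the date line or
having holes are not handled.

```java
final class BboxRatioArea {
    // Mean Earth radius in kilometres (spherical approximation).
    static final double EARTH_RADIUS_KM = 6371.0088;

    /** Planar (shoelace) area in square degrees; ring holds {lon, lat} pairs. */
    static double planarArea(double[][] ring) {
        double sum = 0.0;
        for (int i = 0; i < ring.length; i++) {
            double[] p = ring[i];
            double[] q = ring[(i + 1) % ring.length];
            sum += p[0] * q[1] - q[0] * p[1];
        }
        return Math.abs(sum) / 2.0;
    }

    /** Estimated area in square km: bbox area on the sphere times fill ratio. */
    static double estimateKm2(double[][] ring) {
        if (ring.length < 3) {
            return 0.0;
        }
        double minLon = Double.POSITIVE_INFINITY, maxLon = Double.NEGATIVE_INFINITY;
        double minLat = Double.POSITIVE_INFINITY, maxLat = Double.NEGATIVE_INFINITY;
        for (double[] p : ring) {
            minLon = Math.min(minLon, p[0]);
            maxLon = Math.max(maxLon, p[0]);
            minLat = Math.min(minLat, p[1]);
            maxLat = Math.max(maxLat, p[1]);
        }
        double bboxPlanar = (maxLon - minLon) * (maxLat - minLat);
        if (bboxPlanar == 0.0) {
            return 0.0; // degenerate polygon (a line or a point)
        }
        // Exact area of a lat/lon rectangle on a sphere.
        double bboxOnSphere = EARTH_RADIUS_KM * EARTH_RADIUS_KM
                * Math.toRadians(maxLon - minLon)
                * (Math.sin(Math.toRadians(maxLat)) - Math.sin(Math.toRadians(minLat)));
        return bboxOnSphere * (planarArea(ring) / bboxPlanar);
    }
}
```

The result could be rounded and indexed in a numeric field, so that the
sort at query time is a plain numeric sort.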


Re: Solr 4.2.1 sorting by distance to polygon centre.

Posted by Guido Medina <gu...@temetra.com>.
David,

I have been following your Stack Overflow posts and I understand what
you are saying. We decided to change the criteria and index an extra
field (close to your suggestion), so the sort will now be by polygon
area desc (which introduced another problem: calculating polygon area
on a sphere). I have finally got to the point of testing. Also,
following what you are saying, it is not a good idea to ask more of the
spatial field than the bare point-in-polygon (Intersects) check that
returns the list matching specific criteria.

To sum up: calculate the area of the polygon (which, again, is not so
obvious for curved polygons), run the standard Solr search and sort by
that extra field. I guess the Solr overhead will be minimal in that case.

The real use case is in the utility industry. Users have areas where
they take meter reads; readings are scheduled and assigned to the users
whose area contains the meter's GPS location. Some users might cover
big areas, and other users may have smaller areas inside those big
areas, so we changed the criterion from distance to centre to area
covered. It seemed simpler and easier.

Thanks for your response,

Guido.
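
One way to approach the "polygon area on a sphere" problem raised above,
without any external library, is a commonly used line-integral
approximation over the ring's edges. The Java sketch below is only an
illustration under stated assumptions: the class name is hypothetical,
the Earth is treated as a sphere with the WGS84 equatorial radius, and
rings crossing the date line or enclosing a pole are not handled.

```java
final class SphericalPolygonArea {
    // WGS84 equatorial radius in metres, used here as the sphere radius.
    static final double EARTH_RADIUS_M = 6378137.0;

    /**
     * Approximate area in square metres of a ring of {lon, lat} pairs in
     * degrees. The ring may be open or explicitly closed; a repeated
     * closing vertex contributes nothing to the sum.
     */
    static double areaSquareMetres(double[][] ring) {
        int n = ring.length;
        if (n < 3) {
            return 0.0;
        }
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double[] p1 = ring[i];
            double[] p2 = ring[(i + 1) % n];
            // Longitude step weighted by the sines of the two latitudes.
            sum += Math.toRadians(p2[0] - p1[0])
                    * (2.0 + Math.sin(Math.toRadians(p1[1]))
                           + Math.sin(Math.toRadians(p2[1])));
        }
        // The sign depends on ring orientation, so take the absolute value.
        return Math.abs(sum) * EARTH_RADIUS_M * EARTH_RADIUS_M / 2.0;
    }
}
```

For a ring that follows meridians and parallels, this reduces to the
exact area of the corresponding latitude/longitude rectangle on the
sphere; for general polygons it is an approximation.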



Re: Solr 4.2.1 sorting by distance to polygon centre.

Posted by "Smiley, David W." <ds...@mitre.org>.
Guido,

The field type solr.SpatialRecursivePrefixTreeFieldType can only
participate in distance reporting for indexed points, not other shapes.
In fact, I recommend not attempting to get the distance if the field isn't
purely indexed points, as it may get confused if it sees some small
shapes.  For your use-case, you should index an additional
solr.SpatialRecursivePrefixTreeFieldType field just for the points.  You
could do this external to Solr, or you could write a Solr
UpdateRequestProcessor that parses the shape in order to then call
getCenter(), and put those points in the other field.

~ David
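
If the centre points are computed external to Solr, as suggested above,
a minimal Java sketch could look like the following. It does not call
Spatial4j's getCenter(); it uses a planar centroid of the outer ring as a
stand-in, which is an assumption and may differ from what getCenter()
returns. The class and method names are hypothetical, and the "x y"
(longitude latitude) output order simply mirrors the point used in the
Intersects query of the original post.

```java
final class PolygonCentre {
    /**
     * Planar centroid of a simple ring of {lon, lat} pairs in degrees.
     * The ring may be open or closed, in either orientation.
     * Returns {lon, lat}.
     */
    static double[] centroid(double[][] ring) {
        double twiceArea = 0.0;
        double cx = 0.0;
        double cy = 0.0;
        for (int i = 0; i < ring.length; i++) {
            double[] p = ring[i];
            double[] q = ring[(i + 1) % ring.length];
            double cross = p[0] * q[1] - q[0] * p[1];
            twiceArea += cross;
            cx += (p[0] + q[0]) * cross;
            cy += (p[1] + q[1]) * cross;
        }
        if (twiceArea == 0.0) {
            throw new IllegalArgumentException("degenerate ring: zero area");
        }
        // Standard formula: C = sum / (6 * A), and 6 * A == 3 * twiceArea.
        return new double[] { cx / (3.0 * twiceArea), cy / (3.0 * twiceArea) };
    }

    /** Formats a {lon, lat} pair as an "x y" point value for a separate field. */
    static String toPointValue(double[] lonLat) {
        return lonLat[0] + " " + lonLat[1];
    }
}
```

The resulting string would go into the additional points-only field at
index time, next to the polygon itself.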
