Posted to solr-user@lucene.apache.org by Leila Deljkovic <le...@koordinates.com> on 2018/01/31 20:33:48 UTC

Sorting results for spatial search

Hiya,

So I have some nested documents in my index with this kind of structure:
{
    "id": "parent",
    "gridcell_rpt": "POLYGON((30 10, 40 40, 20 40, 10 20, 30 10))",
    "density": "30",
    "_childDocuments_": [
        {
            "id": "child1",
            "gridcell_rpt": "MULTIPOLYGON(((30 20, 45 40, 10 40, 30 20)))",
            "density": "25"
        },
        {
            "id": "child2",
            "gridcell_rpt": "MULTIPOLYGON(((15 5, 40 10, 10 20, 5 10, 15 5)))",
            "density": "5"
        }
    ]
}
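The queries further down filter on an is_parent field that this sample does not show. Solr's {!parent which=...} parser expects that filter to match every parent document and no child documents. Here is a minimal sketch, assuming the flag is added on the client before indexing; tag_block and the /update usage are hypothetical, not part of the original setup.

```python
import json
from typing import Any, Dict


def tag_block(parent: Dict[str, Any]) -> Dict[str, Any]:
    """Copy a parent document (hypothetical helper), setting is_parent=True on
    the parent and is_parent=False on each child. The input is not modified."""
    tagged = {k: v for k, v in parent.items() if k != "_childDocuments_"}
    tagged["is_parent"] = True
    tagged["_childDocuments_"] = [
        dict(child, is_parent=False) for child in parent.get("_childDocuments_", [])
    ]
    return tagged


doc = {
    "id": "parent",
    "gridcell_rpt": "POLYGON((30 10, 40 40, 20 40, 10 20, 30 10))",
    "density": "30",
    "_childDocuments_": [
        {"id": "child1", "gridcell_rpt": "MULTIPOLYGON(((30 20, 45 40, 10 40, 30 20)))", "density": "25"},
        {"id": "child2", "gridcell_rpt": "MULTIPOLYGON(((15 5, 40 10, 10 20, 5 10, 15 5)))", "density": "5"},
    ],
}

# Assumption: this JSON array is POSTed to the collection's /update handler.
payload = json.dumps([tag_block(doc)], indent=2)
```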

The parent document is a WKT shape, and its children are “grid cells”, which are just divisions of the main shape (i.e. the parent shape is cut up to get the child shapes). The “density” is the feature count in each shape. When I query (through the Solr UI) I use “Intersects” to return parents that touch the search area (note that if a child is touching, the parent must also be touching).

	e.g. fq={!field f=gridcell_rpt}Intersects(POLYGON((-20 70, -50 80, -20 20, 30 60, -10 40, -20 70)))

I want to sort the results by the sum of the densities of the children that touch the search area (i.e. which of a parent’s children touch the search area, and how large the sum of their densities is), with something like:
	{!parent which=is_parent:true score=total v='+is_parent:false +{!func}density'} desc

The problem is that the sum also includes children that DON’T touch the search area. How can I include only the shapes matched by the first query above in my sort?
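To pin down the ordering being asked for, here is a brute-force sketch in plain Python. It is not how Solr evaluates the query. For each parent it sums the density of the children whose shape intersects the search polygon, then sorts parents by that sum. It assumes 2D planar coordinates, simple polygons without holes, and documents shaped like the JSON above. rank_parents and the other helpers are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def parse_rings(wkt: str) -> List[Ring]:
    """Extract each innermost coordinate list of a POLYGON/MULTIPOLYGON.
    Assumption: no holes, so every ring is treated as its own polygon."""
    rings = []
    for body in re.findall(r"\(([^()]+)\)", wkt):
        pts = [tuple(map(float, pair.split())) for pair in body.split(",")]
        if len(pts) > 1 and pts[0] == pts[-1]:
            pts.pop()  # drop the closing vertex that repeats the first one
        rings.append(pts)
    return rings


def _orient(a: Point, b: Point, c: Point) -> int:
    # Sign of the cross product (b - a) x (c - a): 1 left turn, -1 right turn, 0 collinear.
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _on_segment(a: Point, b: Point, p: Point) -> bool:
    # For p already known to be collinear with a-b: is it inside the segment's bounding box?
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def _segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True  # the segments cross (or an endpoint touches the other segment)
    return ((o1 == 0 and _on_segment(p1, p2, q1)) or (o2 == 0 and _on_segment(p1, p2, q2))
            or (o3 == 0 and _on_segment(q1, q2, p1)) or (o4 == 0 and _on_segment(q1, q2, p2)))


def _edges(ring: Ring) -> List[Tuple[Point, Point]]:
    return [(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]


def _point_in_ring(pt: Point, ring: Ring) -> bool:
    # Even-odd ray casting towards +x.
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in _edges(ring):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def rings_intersect(a: Ring, b: Ring) -> bool:
    # Simple polygons intersect iff their boundaries meet or one lies inside the other.
    if any(_segments_intersect(p1, p2, q1, q2) for p1, p2 in _edges(a) for q1, q2 in _edges(b)):
        return True
    return _point_in_ring(a[0], b) or _point_in_ring(b[0], a)


def shapes_intersect(wkt_a: str, wkt_b: str) -> bool:
    return any(rings_intersect(a, b) for a in parse_rings(wkt_a) for b in parse_rings(wkt_b))


def rank_parents(parents: List[Dict], search_wkt: str) -> List[Tuple[str, float]]:
    """Sum the densities of each parent's children that intersect search_wkt,
    then order parents by that sum, largest first."""
    totals = []
    for parent in parents:
        total = sum(float(child["density"]) for child in parent.get("_childDocuments_", [])
                    if shapes_intersect(child["gridcell_rpt"], search_wkt))
        totals.append((parent["id"], total))
    return sorted(totals, key=lambda t: t[1], reverse=True)
```

The edge-pair check is O(n·m) per pair of shapes, which is fine for sanity-checking expected orderings on a handful of documents but is not a substitute for the spatial index.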

Cheers :)

Re: Sorting results for spatial search

Posted by Leila Deljkovic <le...@koordinates.com>.
Hey David,

Thanks for your suggestions! I think I’ve got the right behaviour now; I’ve done fq={!parent which=is_parent:true score=total v='+is_parent:false +{!func}density'} desc instead of sort=…

Side note: the grid cells can be POLYGON or MULTIPOLYGON, so BBoxField didn’t work when I tried it, and I had to resort to RptWithGeometrySpatialField. I’m not sure how it will perform yet.



> On 2/02/2018, at 5:24 AM, David Smiley <da...@gmail.com> wrote:


Re: Sorting results for spatial search

Posted by David Smiley <da...@gmail.com>.
quote: "The problem is that this includes children that DON’T touch the
search area in the sum. How can I only include the shapes from the first
query above in my sort?"

Unless I'm misunderstanding your intent, I think this is a simple matter of
adding the spatial filter to the parent join query you are sorting on.  So
something like this (not tested):

&sort=query($sortQ) desc
&sortQ={!parent which=is_parent:true score=total}
  +is_parent:false
  +{!func}density
  +gridcell_rpt:"Intersects(POLYGON((-20 70, -50 80, -20 20, 30 60, -10 40, -20 70)))"
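For anyone sending this outside the Solr UI, here is a minimal Python sketch that assembles the same (untested) sort parameters and URL-encodes them. The endpoint URL and the q=*:* main query are assumptions, since the thread shows neither. build_params and build_url are hypothetical helpers.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; substitute your own host and collection.
BASE_URL = "http://localhost:8983/solr/mycollection/select"

SEARCH_AREA = "POLYGON((-20 70, -50 80, -20 20, 30 60, -10 40, -20 70))"


def build_params(search_wkt: str) -> dict:
    """Request parameters for the suggested, untested spatial sort."""
    spatial = f'gridcell_rpt:"Intersects({search_wkt})"'
    return {
        "q": "*:*",  # assumption: the thread does not show the main query
        "fq": f"{{!field f=gridcell_rpt}}Intersects({search_wkt})",
        "sort": "query($sortQ) desc",
        # The spatial clause is repeated inside the child query so that only
        # children touching the search area contribute to the parent's total.
        "sortQ": "{!parent which=is_parent:true score=total} "
                 f"+is_parent:false +{{!func}}density +{spatial}",
    }


def build_url(base_url: str, params: dict) -> str:
    # urlencode percent-encodes characters such as {, }, $, ( and quotes.
    return f"{base_url}?{urlencode(params)}"


url = build_url(BASE_URL, build_params(SEARCH_AREA))
```

One thing to check when trying this: with score=total, each child's score is the sum of all of its scoring clauses, not only {!func}density, so +is_parent:false and the spatial clause may add their own per-child amounts to the total. If the order looks off, Solr's constant-score syntax (for example is_parent:false^=0) is one way to stop a clause from contributing; treat that as an untested suggestion.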

Separately from your question, you state that these are grid cells and thus
rectangles.  For rectangles, I recommend using BBoxField, which will
probably overall perform better (smaller index, faster queries).  If you
need an RPT field nonetheless (heatmaps?) then you could use the more
concise ENVELOPE syntax but it shouldn't matter since a polygon that is a
rectangle will internally be optimized to be one.
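To check which cells really are rectangles (and so could go in a BBoxField or be written in ENVELOPE form), a hypothetical helper can convert an axis-aligned WKT POLYGON into ENVELOPE(minX, maxX, maxY, minY), the argument order Solr's spatial syntax uses, and return None otherwise. It deliberately ignores MULTIPOLYGON input, so all three shapes in the sample document at the top of the thread come back as None.

```python
import re
from typing import Optional

_POLYGON = re.compile(r"\s*POLYGON\s*\(\(([^()]*)\)\)\s*", re.IGNORECASE)


def _fmt(v: float) -> str:
    # Print 10.0 as "10" but keep fractional values exact.
    return str(int(v)) if v.is_integer() else repr(v)


def polygon_to_envelope(wkt: str) -> Optional[str]:
    """Return ENVELOPE(minX, maxX, maxY, minY) for an axis-aligned rectangular
    POLYGON, or None for any other shape (including MULTIPOLYGON)."""
    m = _POLYGON.fullmatch(wkt)
    if not m:
        return None
    pts = [tuple(map(float, pair.split())) for pair in m.group(1).split(",")]
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]  # drop the closing vertex
    if len(pts) != 4 or any(len(p) != 2 for p in pts):
        return None
    xs = sorted({x for x, _ in pts})
    ys = sorted({y for _, y in pts})
    if len(xs) != 2 or len(ys) != 2 or set(pts) != {(x, y) for x in xs for y in ys}:
        return None
    # Consecutive corners must share an x or a y, which rules out "bow-tie" orderings.
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        if x1 != x2 and y1 != y2:
            return None
    return f"ENVELOPE({_fmt(xs[0])}, {_fmt(xs[1])}, {_fmt(ys[1])}, {_fmt(ys[0])})"
```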

On Wed, Jan 31, 2018 at 3:33 PM Leila Deljkovic <
leila.deljkovic@koordinates.com> wrote:


-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com