Posted to solr-user@lucene.apache.org by Ere Maijala <er...@helsinki.fi> on 2014/05/06 10:22:38 UTC

Re: range types in SOLR

David,

I noted the deprecation you mention below so that we could take it into
account in our software, but when I tried to find out more I got
confused, because the Solr documentation on spatial search is currently
badly scattered and partly obsolete [1]. I'd appreciate some
clarification on what exactly is deprecated. We currently use spatial
for both time duration and geographic searches, and in the latter we
also use e.g. Intersects(POLYGON(...)). Is this also deprecated and, if
so, how should I rewrite it? Thanks!

--Ere

[1] It would be really nice if it was possible to find up to date 
documentation of at least all this in one place:

https://cwiki.apache.org/confluence/display/solr/Spatial+Search
https://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
http://wiki.apache.org/solr/SpatialForTimeDurations
https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201212.mbox/%3C1355027722156-4025434.post@n3.nabble.com%3E

On 3.3.2014 20.12, Smiley, David W. wrote:
> The main reference for this approach is here:
> http://wiki.apache.org/solr/SpatialForTimeDurations
>
>
> Hoss’s illustrations for the meetup presentation are great. However,
> there are bugs in the instructions: specifically, it’s important to
> slightly buffer the query and to choose an appropriate maxDistErr. Also,
> it’s preferable to use the rectangle range query style of spatial query
> (e.g. field:["minX minY" TO "maxX maxY"]) as opposed to
> “Intersects(minX minY maxX maxY)”. There’s no technical difference, but
> the latter is deprecated and will eventually be removed from Solr 5 /
> trunk.
>
> All this said, recognize this is a bit of a hack (one that works well).
> There is a good chance a better implementation approach is going to be
> developed this year.
>
> ~ David
>
>
> On 3/1/14, 2:54 PM, "Shawn Heisey" <so...@elyograg.org> wrote:
>
>> On 3/1/2014 11:41 AM, Thomas Scheffler wrote:
>>> On 01.03.14 18:24, Erick Erickson wrote:
>>>> I'm not clear what you're really after here.
>>>>
>>>> Solr certainly supports ranges, things like time:[* TO date_spec] or
>>>> date_field:[date_spec TO date_spec] etc.
>>>>
>>>>
>>>> There's also a really creative use of spatial (of all things) to, say,
>>>> answer questions involving multiple dates per record. Imagine, for
>>>> instance, employees with different hours on different days. You can
>>>> use spatial to answer questions like "which employees are available
>>>> on Wednesday between 4PM and 8PM".
>>>>
>>>> And if none of this is relevant, how about you give us some
>>>> use-cases? This could well be an XY problem.
>>>
>>> Hi,
>>>
>>> let's try this example to show the problem. You have some old text that
>>> was written in two periods of time:
>>>
>>> 1.) 2nd half of 13th century: -> 1250-1299
>>> 2.) Beginning of 18th century: -> 1700-1715
>>>
>>> If you are searching for texts that were written between 1300 and 1699,
>>> then the document described above should not be hit.
>>>
>>> If you make start date and end date multi-valued, this results in:
>>>
>>> start: [1250, 1700]
>>> end: [1299, 1715]
>>>
>>> A search for documents written between 1300-1699 would be:
>>>
>>> (+start:[1300 TO 1699] +end:[1300 TO 1699]) (+start:[* TO 1300] +end:[1300
>>> TO *]) (+start:[* TO 1699] +end:[1700 TO *])
>>>
>>> You see that the document above would obviously be hit by "(+start:[* TO
>>> 1300] +end:[1300 TO *])", because start 1250 and end 1715 both match even
>>> though they belong to different periods.
>>
>> This sounds exactly like the spatial use case that Erick just described.
>>
>> http://wiki.apache.org/solr/SpatialForTimeDurations
>> https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/
>>
>> I am not sure whether the following presentation covers time series with
>> spatial, but it does say deep dive.  It's over an hour long, and done by
>> David Smiley, who wrote most of the Spatial code in Solr:
>>
>> http://www.lucenerevolution.org/2013/Lucene-Solr4-Spatial-Deep-Dive
>>
>> Hopefully someone who has actually used this can hop in and give you
>> some additional pointers.
>>
>> Thanks,
>> Shawn
>>
>
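
The false match in the quoted 1250-1299 / 1700-1715 example can be reproduced without Solr. Below is a minimal Python sketch, not Solr code: the function names are hypothetical, and the second function only mimics the idea described on the SpatialForTimeDurations wiki page, as I understand it, of indexing each period as one (start, end) point and querying with a rectangle.

```python
from typing import List, Tuple

Period = Tuple[int, int]  # (start_year, end_year) of one writing period


def naive_multivalued_match(periods: List[Period], q_start: int, q_end: int) -> bool:
    """Mimic the three-clause query over independent multi-valued fields.

    The start values and end values are matched separately, so a start
    from one period can pair up with an end from another period.
    """
    starts = [s for s, _ in periods]
    ends = [e for _, e in periods]
    inf = float("inf")

    def any_in(values, lo, hi):
        return any(lo <= v <= hi for v in values)

    # (+start:[qs TO qe] +end:[qs TO qe])
    clause1 = any_in(starts, q_start, q_end) and any_in(ends, q_start, q_end)
    # (+start:[* TO qs] +end:[qs TO *])
    clause2 = any_in(starts, -inf, q_start) and any_in(ends, q_start, inf)
    # (+start:[* TO qe] +end:[qe+1 TO *])
    clause3 = any_in(starts, -inf, q_end) and any_in(ends, q_end + 1, inf)
    return clause1 or clause2 or clause3


def point_per_period_match(periods: List[Period], q_start: int, q_end: int) -> bool:
    """Treat each period as one point (x=start, y=end).

    A period overlaps the query when start <= q_end and end >= q_start,
    i.e. when its point lies in the rectangle x <= q_end, y >= q_start.
    Start and end are tested together, so they cannot be mixed up.
    """
    return any(s <= q_end and e >= q_start for s, e in periods)


doc = [(1250, 1299), (1700, 1715)]
print(naive_multivalued_match(doc, 1300, 1699))
print(point_per_period_match(doc, 1300, 1699))
```

The first function reports a hit for 1300-1699 because 1250 and 1715 are paired across periods; the second does not, since neither period on its own overlaps the query.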


-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: range types in SOLR

Posted by Ere Maijala <er...@helsinki.fi>.
David,

thanks, looking forward to LUCENE-5648. I added a comment about 
supporting BC dates. We currently use the spatial support to index date 
ranges with a precision of one day, ranging from year -9999 to 9999.
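
For illustration only, one way to turn a calendar date into an integer day coordinate that also works for negative years is sketched below. This is not necessarily how our indexing code does it: the function name is hypothetical, the calendar is assumed to be proleptic Gregorian, and years use astronomical numbering (year 0 is 1 BC), which Python's own datetime module cannot represent.

```python
def days_from_civil(year: int, month: int, day: int) -> int:
    """Return days since 1970-01-01 in the proleptic Gregorian calendar.

    Works for negative years (astronomical numbering: 0 = 1 BC).
    The year is shifted to start on 1 March so the leap day falls last.
    """
    y = year - 1 if month <= 2 else year
    era = y // 400                        # 400-year cycle (floor division)
    yoe = y - era * 400                   # year of era, 0..399
    mp = month - 3 if month > 2 else month + 9   # March = 0 ... February = 11
    doy = (153 * mp + 2) // 5 + day - 1   # day of the shifted year, 0..365
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy
    return era * 146097 + doe - 719468    # 146097 days per 400-year cycle


# A date range becomes an (x, y) pair of day numbers that can be indexed
# as a single point, with start <= end guaranteed by the monotonic mapping.
start_day = days_from_civil(-9999, 1, 1)
end_day = days_from_civil(9999, 12, 31)
print(start_day, end_day)
```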

Just for the record, I had some issues converting bounding box
Intersects queries to polygons with Solr 4.6.1. The polygon version
returned far more results than it should have. I upgraded to 4.8.0 (and
to JTS 1.13 from 1.12), and now the results are correct.
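
The bounding box to polygon conversion mentioned above amounts to writing the four corners as a closed WKT ring. Here is a minimal Python sketch of the string building, assuming a hypothetical field name "geo" and x/y values given as plain numbers; it only builds query strings and does not talk to Solr.

```python
def bbox_to_wkt_polygon(min_x, min_y, max_x, max_y) -> str:
    """Return the box as a WKT polygon with a closed, counter-clockwise ring."""
    corners = [
        (min_x, min_y),
        (max_x, min_y),
        (max_x, max_y),
        (min_x, max_y),
        (min_x, min_y),  # WKT rings must repeat the first point at the end
    ]
    ring = ", ".join(f"{x} {y}" for x, y in corners)
    return f"POLYGON(({ring}))"


def intersects_polygon_query(field: str, min_x, min_y, max_x, max_y) -> str:
    """Build the Intersects(POLYGON(...)) form of the query."""
    wkt = bbox_to_wkt_polygon(min_x, min_y, max_x, max_y)
    return f'{field}:"Intersects({wkt})"'


def rectangle_range_query(field: str, min_x, min_y, max_x, max_y) -> str:
    """Build the rectangle range form: field:["minX minY" TO "maxX maxY"]."""
    return f'{field}:["{min_x} {min_y}" TO "{max_x} {max_y}"]'


print(intersects_polygon_query("geo", 0, 0, 10, 5))
print(rectangle_range_query("geo", 0, 0, 10, 5))
```

For a plain rectangle both forms should select the same area, so comparing their hit counts is a quick way to spot the kind of discrepancy I saw on 4.6.1.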

--Ere

On 6.5.2014 21.26, david.w.smiley@gmail.com wrote:
> Hi Ere,
>
> I appreciate the scattered documentation is confusing for users.  The use
> of spatial for time durations is definitely not an official way to do it;
> it’s clearly a hack/trick — one that works pretty well if you know the
> issues to watch out for.  So I don’t see it getting documented on the
> reference guide.  But, you should be happy to know about this:
> https://issues.apache.org/jira/browse/LUCENE-5648  “Watch” that issue to
> stay abreast of my development on it, and the inevitable Solr FieldType to
> follow, and inevitable documentation in the reference guide.  With luck
> it’ll get in by 4.9.
>
> The “Intersects(POLYGON(…))” syntax is something I suggest using when you
> have to — like when you have a polygon or linestring or if you are indexing
> circles.  One of these days there will be a more Solr friendly query parser
> — definitely for 4.something.  When that happens, it’ll get
> deprecated/removed in trunk/5.
>
> ~ David
>


-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: range types in SOLR

Posted by "david.w.smiley@gmail.com" <da...@gmail.com>.
Hi Ere,

I appreciate the scattered documentation is confusing for users.  The use
of spatial for time durations is definitely not an official way to do it;
it’s clearly a hack/trick — one that works pretty well if you know the
issues to watch out for.  So I don’t see it getting documented on the
reference guide.  But, you should be happy to know about this:
https://issues.apache.org/jira/browse/LUCENE-5648  “Watch” that issue to
stay abreast of my development on it, and the inevitable Solr FieldType to
follow, and inevitable documentation in the reference guide.  With luck
it’ll get in by 4.9.

The “Intersects(POLYGON(…))” syntax is something I suggest using when you
have to — like when you have a polygon or linestring or if you are indexing
circles.  One of these days there will be a more Solr friendly query parser
— definitely for 4.something.  When that happens, it’ll get
deprecated/removed in trunk/5.

~ David
