Posted to solr-user@lucene.apache.org by Darniz <rn...@edmunds.com> on 2013/09/27 01:52:52 UTC

Doing time sensitive search in solr

Hello users,

I have a requirement where my content should be searchable based on time. For
example, below is our content in our CMS.
<entry start-date="1-sept-2013">
    Sept content : Honda is releasing the car this month
</entry>

<entry start-date="1-dec-2013">
    Dec content : Toyota is releasing the car this month
</entry>

On the website we display the content based on time. On the Solr side,
until now we were indexing all entry elements into a single text field. Now
that we have introduced time-sensitive information in our CMS, if
someone queries for the word "Toyota" it should NOT come up in my search results,
since that content is going live in December.

The solr text field looks something like
<arr name="text">
    <str>Honda is releasing the car this month</str>
    <str>Toyota is releasing this month</str>
</arr>

Is there a way we can search the text field, or append any metadata to the
text field, based on date?

I hope I have made the issue clear. I don't really agree with this kind of
practice, but our requirement is pretty peculiar since we don't want to
reindex data again and again.




--
View this message in context: http://lucene.472066.n3.nabble.com/Doing-time-sensitive-search-in-solr-tp4092273.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Doing time sensitive search in solr

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi Darniz,

Just put the date in a separate field and add a range query on that field
to your existing query.
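
As a rough illustration of this suggestion, the Python sketch below builds request parameters that combine the user's query with a date range restriction. The field name live_dt is an assumption borrowed from later in this thread, and the range is passed as a filter query (fq) rather than ANDed into q; either form restricts results to content that is already live.

```python
from urllib.parse import urlencode


def build_time_filtered_query(user_query, date_field="live_dt"):
    """Return Solr request parameters that hide content not yet live.

    date_field is an assumed name for the separate date field.
    """
    return {
        "q": user_query,
        # Only match documents whose go-live date is now or earlier.
        "fq": f"{date_field}:[* TO NOW]",
    }


params = build_time_filtered_query("text:Toyota")
# URL-encode the parameters for use in a /select request.
print(urlencode(params))
```

This only works if each searchable piece of text lives in a document that carries its own date value.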

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Sep 26, 2013 7:53 PM, "Darniz" <rn...@edmunds.com> wrote:

> On the Solr side, until now we were indexing all entry elements into a
> single text field. Now that we have introduced time-sensitive information
> in our CMS, if someone queries for the word "Toyota" it should NOT come up
> in my search results, since that content is going live in December.

Re: Doing time sensitive search in solr

Posted by Erick Erickson <er...@gmail.com>.
When you specify a sort parameter it totally overrides the
scoring. You can specify multiple sort criteria, e.g. both
live_dt and score. If you specify two sort criteria, any
ties in the first are broken by the second and so on through
as many sort criteria as you have specified.

Note that specifying secondary sort criteria when the first
one is unlikely to produce ties is pretty useless. I.e., if
sorting by millisecond dates there will (probably) be very
few ties.
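
The tie-breaking behaviour described above can be sketched in plain Python. This is an emulation on made-up documents and scores, not Solr code; the sort_param string shows the corresponding request parameter, and it assumes the live_dt field from the question.

```python
# Hypothetical documents; the scores are invented for illustration.
docs = [
    {"id": "1", "live_dt": "2013-09-01T00:00:00Z", "score": 0.4},
    {"id": "2", "live_dt": "2013-10-01T00:00:00Z", "score": 0.2},
    {"id": "3", "live_dt": "2013-10-01T00:00:00Z", "score": 0.9},
]

# The equivalent Solr request parameter.
sort_param = "live_dt desc,score desc"


def sort_docs(docs):
    """Order by live_dt descending; break ties by score descending.

    ISO-8601 timestamps in the same format compare correctly as strings.
    """
    return sorted(docs, key=lambda d: (d["live_dt"], d["score"]), reverse=True)


ordered_ids = [d["id"] for d in sort_docs(docs)]
print(ordered_ids)  # prints ['3', '2', '1']
```

Documents 2 and 3 tie on live_dt, so score decides between them; document 1 comes last regardless of its score.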

Best,
Erick


On Mon, Oct 14, 2013 at 1:30 AM, Darniz <rn...@edmunds.com> wrote:

> Just to add, I use the dismax handler where we have boosting on specific
> fields, and until now docs were returned in the natural scoring order of
> the dismax handler. If I add &sort=live_dt desc, does it order docs purely
> by live_dt or does it also respect relevancy? We also have some other docs
> which don't have live_dt.

Re: Doing time sensitive search in solr

Posted by Darniz <rn...@edmunds.com>.
Thanks Erick,
I think that's the way to go.

This leads to one more question. Now that I have two docs with the same
content path, I want to get the distinct content path with the max date.

<doc>
    <field name="id">1</field>
    <field name="contentPath">/editorial-updates</field>
    <field name="live_dt">2013-09-01T00:00:00Z</field>
    <field name="text">Sept content : Honda is releasing the car this month</field>
</doc>
<doc>
    <field name="id">2</field>
    <field name="contentPath">/editorial-updates</field>
    <field name="live_dt">2013-10-01T00:00:00Z</field>
    <field name="text">Oct content : Honda is releasing the car this month</field>
</doc>

For example, if a user searches for car with
text:car AND live_dt:[* TO NOW]
then both docs are returned. I want only the latest doc to come back, in the
above case id=2, and the other document should not be returned.

Just to add, I use the dismax handler where we have boosting on specific fields,
and until now docs were returned in the natural scoring order of the dismax handler.
If I add &sort=live_dt desc, does it order docs purely by live_dt or does it also
respect relevancy? We also have some other docs which don't have live_dt.


Any thoughts?





Re: Doing time sensitive search in solr

Posted by Erick Erickson <er...@gmail.com>.
I'd index them as separate documents.
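
A minimal sketch of what separate documents could look like, in Python. The field names id, contentPath, live_dt and text come from elsewhere in this thread; the id scheme and the visible() helper are hypothetical, and the helper only emulates a query such as text:Toyota AND live_dt:[* TO NOW] with a simple substring match.

```python
def entries_to_docs(content_path, entries):
    """Turn each (start_date, text) CMS entry into its own document."""
    docs = []
    for n, (start_date, text) in enumerate(entries, start=1):
        docs.append({
            "id": f"{content_path}#{n}",  # hypothetical id scheme
            "contentPath": content_path,
            "live_dt": start_date,
            "text": text,
        })
    return docs


def visible(docs, term, now):
    """Emulate text:<term> AND live_dt:[* TO NOW] for a given 'now'."""
    return [d for d in docs
            if d["live_dt"] <= now and term.lower() in d["text"].lower()]


entries = [
    ("2013-09-01T00:00:00Z", "Sept content : Honda is releasing the car this month"),
    ("2013-12-01T00:00:00Z", "Dec content : Toyota is releasing the car this month"),
]
docs = entries_to_docs("/editorial-updates", entries)
now = "2013-10-07T00:00:00Z"
toyota_hits = visible(docs, "Toyota", now)
honda_hits = visible(docs, "Honda", now)
print(len(toyota_hits), len(honda_hits))  # prints 0 1
```

Because each entry carries its own date, the Toyota text stays hidden until its date passes, with no reindexing needed when December arrives.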

Best,
Erick

On Mon, Oct 7, 2013 at 2:59 PM, Darniz <rn...@edmunds.com> wrote:
> I also agree that we can now make a range query like bag_of_dates:[* TO NOW]
> AND text:Toyota, but how are we going to make sure the document is
> not returned, since Toyota should only be searchable from 1-DEC-2013?

Re: Doing time sensitive search in solr

Posted by Darniz <rn...@edmunds.com>.
Thanks Erick.

OK, suppose we go with the proposal of copying all date fields into one
bag_of_dates field.

We would then have fields that look something like this:
<arr name="bag_of_dates">
      <str>2013-09-01T00:00:00Z</str>
      <str>2013-12-01T00:00:00Z</str>
</arr>
<arr name="text">
      <str>Sept content : Honda is releasing the car this month</str>
      <str>Dec content : Toyota is releasing the car this month </str>
</arr>
I also agree that we can now make a range query like bag_of_dates:[* TO NOW]
AND text:Toyota, but how are we going to make sure the document is
not returned, since Toyota should only be searchable from 1-DEC-2013?

I hope I am explaining it properly.

On our website, when we render data we don't show the line "Dec content :
Toyota is releasing the car this month" on the page, since today's date is not
yet 1-DEC-2013. Hence we don't want this doc to be shown in search results
either when we query Solr.
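
To make the concern concrete, here is a small Python emulation (not Solr code) of that query against a single document holding both multi-valued fields. Each clause is checked against any value of its own field, with no pairing between a date and a text entry; substring matching and string comparison of ISO timestamps are simplifications.

```python
def matches(doc, term, now):
    """Emulate bag_of_dates:[* TO NOW] AND text:<term> on one document."""
    # The range clause is satisfied if ANY date value is in range.
    date_ok = any(d <= now for d in doc["bag_of_dates"])
    # The text clause is satisfied if ANY text value contains the term.
    text_ok = any(term.lower() in s.lower() for s in doc["text"])
    return date_ok and text_ok


doc = {
    "bag_of_dates": ["2013-09-01T00:00:00Z", "2013-12-01T00:00:00Z"],
    "text": [
        "Sept content : Honda is releasing the car this month",
        "Dec content : Toyota is releasing the car this month",
    ],
}
now = "2013-10-07T00:00:00Z"
toyota_matches = matches(doc, "Toyota", now)
print(toyota_matches)  # prints True
```

The September date satisfies the range clause and the December text satisfies the term clause, so the document is returned even though the Toyota entry is not live yet.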




Re: Doing time sensitive search in solr

Posted by Erick Erickson <er...@gmail.com>.
Wait, are you saying you have fields like
2013-12-01T00:00:00Z_entryDate? So
you have some wildcard definition in your
schema like
*_entryDate type="tdate"?
If so, I think your model is just wrong and you should
have some field(s) that you store dates in.

That aside, and assuming you have wildcards like
I'm guessing, you could have a copyField
like
source="*_entryDate" dest="bag_of_dates"
and do your ranges on "bag_of_dates".

Which would be the same as putting your dates
in a single field with a fixed name in the first place.
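
As a sketch of the effect of such a copyField, the Python below emulates copying every field whose name matches a glob into one destination field. The apply_copy_field helper is hypothetical and only mimics the outcome; in Solr the copying is declared in the schema and happens at index time.

```python
from fnmatch import fnmatchcase


def apply_copy_field(doc, source_glob, dest):
    """Return a copy of doc with all values of matching fields gathered in dest."""
    copied = [value for name, value in doc.items()
              if fnmatchcase(name, source_glob)]
    return {**doc, dest: copied}


doc = {
    "2013-09-01T00:00:00Z_entryDate": "2013-09-01T00:00:00Z",
    "2013-12-01T00:00:00Z_entryDate": "2013-12-01T00:00:00Z",
}
result = apply_copy_field(doc, "*_entryDate", "bag_of_dates")
print(result["bag_of_dates"])
```

The destination ends up holding exactly the date values, which is why a single fixed-name date field gives the same result with less machinery.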

Best,
Erick

On Sun, Oct 6, 2013 at 4:34 PM, Darniz <rn...@edmunds.com> wrote:
> <date name="2013-09-01T00:00:00Z_entryDate">2013-09-01T00:00:00Z</date>
> <str name="2013-09-01T00:00:00Z_entryText">Sept content : Honda is releasing the car this month</str>
>
> <date name="2013-12-01T00:00:00Z_entryDate">2013-12-01T00:00:00Z</date>
> <str name="2013-12-01T00:00:00Z_entryText">Dec content : Toyota is releasing the car this month</str>

Re: Doing time sensitive search in solr

Posted by Darniz <rn...@edmunds.com>.
Thanks Erick.

I hope I understood correctly, but my main concern is that I have to tie specific
indexed content to a specific time range, and make that document come up in
search results only for that time. As I mentioned in my previous
example, we have multiple date-string structures, which makes it a bit more
complicated; on top of that, I don't know what the exact dates will be. Hence
if someone searches for Toyota and today is 6-OCT-2013, this doc should
not come up in search results, since the keyword Toyota should be searchable only
after 1-DEC-2013.

<date name="2013-09-01T00:00:00Z_entryDate">2013-09-01T00:00:00Z</date>
<str name="2013-09-01T00:00:00Z_entryText">Sept content : Honda is releasing the car this month</str>

<date name="2013-12-01T00:00:00Z_entryDate">2013-12-01T00:00:00Z</date>
<str name="2013-12-01T00:00:00Z_entryText">Dec content : Toyota is releasing the car this month</str>

I don't know whether using a copy field would solve this; correct me if I am wrong.

Maybe we are pursuing something which Solr is not meant for.

Thanks
Rashid





Re: Doing time sensitive search in solr

Posted by Erick Erickson <er...@gmail.com>.
The copyField directive accepts glob patterns as the source; did you try that?

Best,
Erick

On Thu, Oct 3, 2013 at 10:49 PM, Darniz <rn...@edmunds.com> wrote:
> I am assuming there is no solution, or that I have to handle it at index time.

Re: Doing time sensitive search in solr

Posted by Darniz <rn...@edmunds.com>.
I am assuming there is no solution, or that I have to handle it at index time.

Any Solr experts, please?




Re: Doing time sensitive search in solr

Posted by Darniz <rn...@edmunds.com>.
Thanks Erick.
When I worked with Solr in 2010 this was not possible, and I thought it might
have evolved by now to allow querying with a wildcard in the field name, but it
looks like I have to provide a concrete dynamic field name to query.

Anyway, I will look into catch-all fields.

Do you have any examples of how a catch-all field would help with this, what
my doc would look like, and how I can query it?

darniz




Re: Doing time sensitive search in solr

Posted by Erick Erickson <er...@gmail.com>.
Try it and see :).

Dynamic fields are just like regular fields once you index a document
that uses one. After that, they should behave just like regular fields.

If you're asking if you can create a query like *_txt:text meaning
search all the fields that end with _txt for the word "text", I don't
think so. An alternative is to copy all the fields into a catch-all
field...

Best,
Erick

On Mon, Sep 30, 2013 at 3:41 PM, Darniz <rn...@edmunds.com> wrote:
> Hello,
> I just wanted to make sure: can we query dynamic fields using a wildcard?
> If not, then I don't think this solution will work, since I don't know the
> exact concrete name of the field.

Re: Doing time sensitive search in solr

Posted by Darniz <rn...@edmunds.com>.
Hello,
I just wanted to make sure: can we query dynamic fields using a wildcard?
If not, then I don't think this solution will work, since I don't know the
exact concrete name of the field.






Re: Doing time sensitive search in solr

Posted by Darniz <rn...@edmunds.com>.
Thanks for the quick answers.
I have gone through the presentation, and using dynamic fields is what I was
leaning towards. I just want to run through an example so that it's clear
how to approach this issue.
<entry start-date="1-sept-2013">
    Sept content : Honda is releasing the car this month
</entry>
<entry start-date="1-dec-2013">
    Dec content : Toyota is releasing the car this month
</entry>
After adding dynamic fields like *_entryDate and *_entryText, my Solr doc
will look something like this.

<date name="2013-09-01T00:00:00Z_entryDate">2013-09-01T00:00:00Z</date>
<str name="2013-09-01T00:00:00Z_entryText">Sept content : Honda is releasing the car this month</str>

<date name="2013-12-01T00:00:00Z_entryDate">2013-12-01T00:00:00Z</date>
<str name="2013-12-01T00:00:00Z_entryText">Dec content : Toyota is releasing the car this month</str>

If someone searches with a query something like
*_entryDate:[* TO NOW] AND *_entryText:Toyota, the Toyota entry won't show up
in the search results.

The only disadvantage of this approach is that we might end up with a
lot of runtime fields, since we have thousands of entries in our CMS which
might be time bound.
I might also investigate whether we can handle this at index time, with a
scheduler or something similar that indexes data as its time comes, because
the above approach might solve the issue but may make the queries very slow.


Thanks




Re: Doing time sensitive search in solr

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
If your different strings have different semantics (date, etc.), you may
need to split your entries based on those semantics.

Either have each 'entity' represent one 'string-date' structure, or have an
additional field that represents the content searchable during that specific
period and keep only one field with all the strings as stored (if you
absolutely need it).

Search for Gilt's presentation on Solr; they deal with some similar
issues (flash sales).

Regards,
   Alex.



On Fri, Sep 27, 2013 at 6:52 AM, Darniz <rn...@edmunds.com> wrote:

> On the Solr side, until now we were indexing all entry elements into a
> single text field. Now that we have introduced time-sensitive information
> in our CMS, if someone queries for the word "Toyota" it should NOT come up
> in my search results, since that content is going live in December.