Posted to solr-user@lucene.apache.org by ramzesua <mi...@gmail.com> on 2011/01/27 11:15:45 UTC

query range in multivalued date field

Hi all. My range query on a multivalued date field works incorrectly.
Here is my schema. The field "requestDate" has the multiValued attribute:
<fields>
   <field name="id" type="string" indexed="true" stored="true" required="true" />
   <field name="keyword" type="text" indexed="true" stored="true" />
   <field name="count" type="float" indexed="true" stored="true" />
   <field name="isResult" type="int" indexed="true" stored="true" default="0" multiValued="true" />
   <field name="requestDate" type="date" indexed="true" stored="true" multiValued="true" />
</fields>

Some data from the index:

<doc> 
  <float name="count">2.0</float> 
  <str name="id">sale</str> 
  <arr name="isResult"><int>1</int><int>1</int></arr> 
  <str name="keyword">sale</str> 
  <arr name="requestDate"><date>2011-01-26T08:18:35Z</date><date>2011-01-27T01:31:28Z</date></arr>
 </doc> 
 <doc> 
  <float name="count">3.0</float> 
  <str name="id">coldpop</str> 
  <arr name="isResult"><int>1</int><int>1</int><int>1</int></arr> 
  <str name="keyword">cold pop</str> 
  <arr name="requestDate"><date>2011-01-27T01:30:01Z</date><date>2011-01-27T01:32:01Z</date><date>2011-01-27T01:32:18Z</date></arr>
 </doc> 

I am trying to find docs whose date falls within some range, for example:
http://localhost:8983/request/select?q=requestDate:[NOW/HOUR-1HOUR TO NOW/HOUR]
There are no results. After some analysis, I saw that this range works only
for the first item in the requestDate field and does not filter on the other
items. Where is my mistake? Or can't SOLR filter on multivalued date fields?
Thanks
-- 
View this message in context: http://lucene.472066.n3.nabble.com/query-range-in-multivalued-date-field-tp2361292p2361292.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: query range in multivalued date field

Posted by Erick Erickson <er...@gmail.com>.
Range queries work on multivalued fields. I suspect the date math
conversion is fooling you. For instance, NOW/HOUR-1HOUR first rounds down to
the current hour, *then* subtracts one hour.

If you attach &debugQuery=on (or check the debug checkbox
on the admin full search page), you'll see the exact results of
the conversion, which may help.
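
To make the date math concrete, here is a minimal Python sketch that emulates the rounding described above and checks the two sample documents against the resulting range. It is not Solr code: the helper names are made up for illustration, and the value used for NOW is an assumption (the UTC posting time of the original message). Under that assumption the range is 10:00 to 11:00 UTC, and none of the stored dates fall inside it, so an empty result is what one would expect.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(text):
    """Parse a Solr-style UTC timestamp such as 2011-01-27T01:31:28Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def hour_range(now):
    """Emulate [NOW/HOUR-1HOUR TO NOW/HOUR]: round down first, then subtract."""
    upper = now.replace(minute=0, second=0, microsecond=0)  # NOW/HOUR
    lower = upper - timedelta(hours=1)                      # NOW/HOUR-1HOUR
    return lower, upper


def matching_ids(docs, now):
    """A multivalued field matches when ANY of its values is inside the range."""
    lower, upper = hour_range(now)
    return [
        doc_id
        for doc_id, dates in docs.items()
        if any(lower <= parse_utc(d) <= upper for d in dates)
    ]


# Sample data copied from the original post.
DOCS = {
    "sale": ["2011-01-26T08:18:35Z", "2011-01-27T01:31:28Z"],
    "coldpop": ["2011-01-27T01:30:01Z", "2011-01-27T01:32:01Z", "2011-01-27T01:32:18Z"],
}

# Assumption: NOW is roughly the time the question was posted (UTC).
assumed_now = parse_utc("2011-01-27T11:15:45Z")
print(hour_range(assumed_now))
print(matching_ids(DOCS, assumed_now))
```

Had the query run between 02:00 and 03:00 UTC that day, the same range expression would have covered 01:00 to 02:00 and both documents would match, including via values that are not the first item in the field.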

Best
Erick


Re: Import Handler for tokenizing facet string into multi-valued solr.StrField..

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: Import Handler for tokenizing facet string into multi-valued
:     solr.StrField.. 
: In-Reply-To: <12...@n3.nabble.com>
: References: <12...@n3.nabble.com>


-Hoss

Re: Import Handler for tokenizing facet string into multi-valued solr.StrField..

Posted by Erick Erickson <er...@gmail.com>.
Tokenization is fine with facets; that caution is about, say, faceting
on the tokenized body of a document, where you potentially have
a huge number of unique tokens.

But if there is a controlled number of distinct values, you shouldn't have
to do anything except index to a tokenized field. I'd remove stemming,
WordDelimiterFilterFactory, etc., though; in fact, I'd probably just go with
WhitespaceTokenizer and, maybe, LowerCaseFilter.

But if you have a huge number of unique values, it doesn't matter whether
they are tokenized or strings; it'll still be a problem.

One note: when faceting for the first time on a newly started Solr instance,
the caches are being filled, so the *first* query will be slower. Measure
subsequent queries instead.

Best
Erick

On Thu, Jan 27, 2011 at 9:09 AM, Dennis Schafroth <de...@indexdata.com> wrote:

> Hi,
>
> I'm pretty new to SOLR coding, but I'm looking for hints about how (if it is
> not already done) to implement a PatternTokenizer that would index this into
> multivalued fields of type solr.StrField for faceting. Example:
>
> Water -- Irrigation ; Water -- Sewage
>
> should be tokenized into
>
> Water
> Irrigation
> Sewage
>
> in multi-valued non-tokenized fields, for performance reasons. I could do it
> from the outside, but I would like to use this as an opportunity to learn
> about SOLR.
>
> It "works" as I want with the PatternTokenizerFactory when I am using
> solr.TextField, but not when I am using the non-tokenized solr.StrField. But
> according to what I have read, facet performance is better on non-tokenized
> fields. We need better performance on our faceted searches on these
> multi-valued fields (25 million documents, three multi-valued facets).
>
> I would also need a filter that removes identical values, as the
> feeds have redundant data, as shown above.
>
> Can anyone point me in the right direction?
>
> cheers,
> :-Dennis

Re: Import Handler for tokenizing facet string into multi-valued solr.StrField..

Posted by Dennis Schafroth <de...@indexdata.com>.
Thanks for the hints! 

Sorry about hijacking the thread "query range in multivalued date field". I mistakenly responded to it.

cheers,
:-Dennis 



Re: Import Handler for tokenizing facet string into multi-valued solr.StrField..

Posted by Erik Hatcher <er...@gmail.com>.
Beyond what Erick said, I'll add that it is often better to "do this from the outside" and send in multiple actual end-user displayable facet values.  When you send in a field like "Water -- Irrigation ; Water -- Sewage", that is what will get stored (if you have it set to stored), but what you might rather want is each individual value stored, which can only be done by the indexer sending in multiple values, not through just tokenization.

	Erik
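
As an illustration of the "do this from the outside" suggestion, here is a minimal Python sketch that splits the raw feed string into individual values and drops duplicates before anything is sent to the indexer. The function name, the choice of separators (";" and "--"), and the field name "subject_facet" are assumptions made for this example; they do not come from the thread or from Solr itself.

```python
import re


def split_facet_values(raw):
    """Split on ';' and '--', trim whitespace, and drop empty or repeated values.

    First-seen order is preserved so the output is stable.
    """
    values = []
    for part in re.split(r";|--", raw):
        value = part.strip()
        if value and value not in values:
            values.append(value)
    return values


raw = "Water -- Irrigation ; Water -- Sewage"
values = split_facet_values(raw)
print(values)  # ['Water', 'Irrigation', 'Sewage']

# Hypothetical document: each list entry would be sent as a separate value
# of a multivalued string field.
doc = {"id": "example-1", "subject_facet": values}
print(doc)
```

Because each value arrives as its own entry, a multivalued solr.StrField would store and facet on "Water", "Irrigation", and "Sewage" individually, with no tokenizer involved.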



Import Handler for tokenizing facet string into multi-valued solr.StrField..

Posted by Dennis Schafroth <de...@indexdata.com>.
Hi, 

I'm pretty new to SOLR coding, but I'm looking for hints about how (if it is not already done) to implement a PatternTokenizer that would index this into multivalued fields of type solr.StrField for faceting. Example:

Water -- Irrigation ; Water -- Sewage

should be tokenized into

Water
Irrigation
Sewage

in multi-valued non-tokenized fields, for performance reasons. I could do it from the outside, but I would like to use this as an opportunity to learn about SOLR.

It "works" as I want with the PatternTokenizerFactory when I am using solr.TextField, but not when I am using the non-tokenized solr.StrField. But according to what I have read, facet performance is better on non-tokenized fields. We need better performance on our faceted searches on these multi-valued fields (25 million documents, three multi-valued facets).

I would also need a filter that removes identical values, as the feeds have redundant data, as shown above.

Can anyone point me in the right direction?

cheers, 
:-Dennis