Posted to solr-user@lucene.apache.org by Matt Mitchell <go...@gmail.com> on 2010/10/08 04:53:16 UTC

case-insensitive phrase query for string fields

What's the recommended approach for handling case-insensitive phrase
queries? I've got this setup, but no luck:

<fieldType name="ci_string" class="solr.StrField">
      <analyzer>
         <filter class="solr.LowerCaseFilterFactory"/>
         <tokenizer class="solr.KeywordTokenizerFactory"/>
      </analyzer>
</fieldType>

So if I index a doc with a title of "Golden Master", then I'd expect a
query of q=title:"golden master" to work, but no go...

I know I must be missing something super obvious!

Matt

Re: case-insensitive phrase query for string fields

Posted by Matt Mitchell <go...@gmail.com>.
Hey thanks guys! This all makes sense now. I'm using a text field and
it's giving good results of course.

Matt

On Fri, Oct 8, 2010 at 6:08 AM, Erik Hatcher <er...@gmail.com> wrote:
> Matt - <https://issues.apache.org/jira/browse/SOLR-2145>
>
>        Erik

Re: case-insensitive phrase query for string fields

Posted by Erik Hatcher <er...@gmail.com>.
Matt - <https://issues.apache.org/jira/browse/SOLR-2145>

	Erik


On Oct 7, 2010, at 23:38 , Jonathan Rochkind wrote:

> If you are going to put explicit phrase quotes in the query string like that, an ordinary text field will match fine, on phrase searches or other searches. That is a solr.TextField, not a solr.StrField as you're using. And then you can put a LowerCaseFilter on it of course. And use an ordinary tokenizer, whitespace or worddelimiter or what have you, not the non-tokenizing keywordtokenizer. Just an ordinary solr.TextField.
> 
> I've never been entirely sure what an indexed solr.StrField is good for exactly. Oh, facets, right. But it's not generally good for matching in an actual 'q', because it's not a tokenized field. Not sure what happens telling a StrField that isn't ever tokenized to use a KeywordTokenizerFactory, maybe it just ignores it, or maybe that's part of the problem. 
> 
> If you mean you only want it to match on _exact_ matches (rather than phrase matches), I haven't quite figured out how to do that, in a dismax query where you only want one field of many to behave that way.  But for a single field query (in an fq, or as the only field in a standard query parser q), the "field" defType will do it. Although now I'm wondering if there is a way to trick a StrField into doing that. 


RE: case-insensitive phrase query for string fields

Posted by Jonathan Rochkind <ro...@jhu.edu>.
If you are going to put explicit phrase quotes in the query string like that, an ordinary text field will match fine, on phrase searches or other searches. That is a solr.TextField, not a solr.StrField as you're using. And then you can put a LowerCaseFilter on it of course. And use an ordinary tokenizer, whitespace or worddelimiter or what have you, not the non-tokenizing keywordtokenizer. Just an ordinary solr.TextField.
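
A minimal sketch of such a field type, assuming a hypothetical type name of "text_ci" and a "title" field that uses it (neither name comes from the original setup). The tokenizer is listed first and the lowercase filter second, so "Golden Master" would be indexed as the two tokens "golden" and "master":

<fieldType name="text_ci" class="solr.TextField">
      <analyzer>
         <!-- split the value on whitespace, then lowercase each token -->
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
</fieldType>

<!-- hypothetical field definition using the type above -->
<field name="title" type="text_ci" indexed="true" stored="true"/>

Because the same analyzer runs at index and query time, q=title:"golden master" should then match as a phrase query once the documents have been reindexed.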

I've never been entirely sure what an indexed solr.StrField is good for exactly. Oh, facets, right. But it's not generally good for matching in an actual 'q', because it's not a tokenized field. Not sure what happens telling a StrField that isn't ever tokenized to use a KeywordTokenizerFactory, maybe it just ignores it, or maybe that's part of the problem. 

If you mean you only want it to match on _exact_ matches (rather than phrase matches), I haven't quite figured out how to do that, in a dismax query where you only want one field of many to behave that way.  But for a single field query (in an fq, or as the only field in a standard query parser q), the "field" defType will do it. Although now I'm wondering if there is a way to trick a StrField into doing that. 

Re: case-insensitive phrase query for string fields

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Matt,

Tokenizer first.
Filter(s) second.
:)
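
Applied to the original fieldType, that reordering might look like the sketch below. It also switches the class to solr.TextField, on the assumption that a solr.StrField is indexed verbatim and does not apply an analyzer chain:

<fieldType name="ci_string" class="solr.TextField">
      <analyzer>
         <!-- tokenizer first: keep the whole value as a single token -->
         <tokenizer class="solr.KeywordTokenizerFactory"/>
         <!-- filter(s) second: lowercase that token -->
         <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
</fieldType>

With this type, "Golden Master" would be indexed as the single term "golden master", so q=title:"golden master" would match only when the whole value matches, ignoring case. Existing documents need to be reindexed after the change.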

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


