Posted to solr-user@lucene.apache.org by "Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]" <ti...@nasa.gov> on 2010/01/13 20:18:37 UTC

case-insensitive string type

Hi, I have a field:

<field name="srcANYSTRStrCI" type="string_ci" indexed="true" stored="true" multiValued="true" />

With type definition:
		<!-- A Case insensitive version of string type  -->
		<fieldType name="string_ci" class="solr.StrField"
			sortMissingLast="true" omitNorms="true">
			<analyzer type="index">
				<tokenizer class="solr.KeywordTokenizerFactory"/>			
				<filter class="solr.LowerCaseFilterFactory" />
			</analyzer>
			<analyzer type="query">
				<tokenizer class="solr.KeywordTokenizerFactory"/>
				<filter class="solr.LowerCaseFilterFactory" />
			</analyzer>
		</fieldType>

When searching that field I can't get a case-insensitive match. It behaves like a regular string: a prefix query works as long as the prefix matches the case of the stored value, but fails if I change the case of the prefix.

Essentially I am trying to get case-insensitive matching that supports wildcards.

Tim Harsch
Sr. Software Engineer
Dell Perot Systems
(650) 604-0374


Re: case-insensitive string type

Posted by Erick Erickson <er...@gmail.com>.
What do you get when you add "&debugQuery=on" to your lower-case query?

And does Luke show you what you expect in the index?



RE: case-insensitive string type

Posted by "Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]" <ti...@nasa.gov>.
Thanks. I know I read that some time back, but I assumed it only applied because no <analyzer> tags were defined on the string field in the example schema. I'm still kind of a noob, so I didn't take it to mean that StrField could not be given analyzers at all. A subtle but important distinction, I guess.

My concern now is that my use case needs a field that behaves like string (case-sensitive) plus a case-insensitive version of the same field. Is it the case that solr.StrField and a solr.TextField with KeywordTokenizerFactory and LowerCaseFilterFactory differ only in their treatment of character case?
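
A minimal sketch of one way to keep both behaviours side by side, assuming the case-sensitive value lives in a field named srcANYSTRStr (a hypothetical name, not from this thread) and that string is the stock solr.StrField type from the example schema. copyField hands the raw input value to the destination field, so string_ci has to be an analyzed type (solr.TextField) for the lowercasing to take effect:

```xml
<!-- Case-sensitive original; "srcANYSTRStr" is a hypothetical field name -->
<field name="srcANYSTRStr" type="string" indexed="true" stored="true" multiValued="true" />

<!-- Case-insensitive twin; assumes string_ci is declared with class="solr.TextField" -->
<field name="srcANYSTRStrCI" type="string_ci" indexed="true" stored="false" multiValued="true" />

<!-- Feed the same raw input value into both fields at index time -->
<copyField source="srcANYSTRStr" dest="srcANYSTRStrCI" />
```

The stored value is always the original input, so storing it once is enough in this sketch. Queries that need exact case would go to srcANYSTRStr, the rest to srcANYSTRStrCI. Whether the two types differ in anything beyond case handling is not settled in this thread.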


RE: case-insensitive string type

Posted by Ahmet Arslan <io...@yahoo.com>.
> That seems to work.
> 
> But why?  Does string type not support
> LowerCaseFilterFactory?  Or KeywordTokenizerFactory?

From apache-solr-1.4.0\example\solr\conf\schema.xml:

"The StrField type is not analyzed, but indexed/stored verbatim." 

"solr.TextField allows the specification of custom text analyzers specified as a tokenizer and a list of token filters."
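
For contrast, a sketch of how the two kinds of type are declared. The first declaration is roughly how the stock string type appears in the Solr 1.4 example schema; the second type is named string_lower purely for illustration. Since StrField values are indexed verbatim, <analyzer> elements nested under a StrField type would not be expected to have any effect, which is consistent with Luke showing the case-preserved value for the original string_ci definition.

```xml
<!-- Not analyzed: the whole value is indexed as-is, case included -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- Analyzed: one token per value (KeywordTokenizer), lowercased at index and query time.
     "string_lower" is an illustrative name, not taken from the example schema. -->
<fieldType name="string_lower" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```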




      

RE: case-insensitive string type

Posted by "Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]" <ti...@nasa.gov>.
That seems to work.

But why?  Does string type not support LowerCaseFilterFactory?  Or KeywordTokenizerFactory?


RE: case-insensitive string type

Posted by "Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]" <ti...@nasa.gov>.
I created a document with a string field and a case-insensitive string field that uses my string_ci type. Both were sent the same value at document creation time: "miXCAse or LowER".

I have attached two debug query results, one against the string type and one against mine. The queries differ only in the field being queried.

Against the string field there are results; against mine there are none. Judging by the debug info, querying my type does seem to lowercase the query value. Does this mean the index-time analyzer is failing? Would the fact that Luke shows the value as case-preserved in both the string field and the string_ci field support this?


RE: case-insensitive string type

Posted by Ahmet Arslan <io...@yahoo.com>.
> The value in the srcANYSTRStrCI field
> is "miXCAse or LowER" according to Luke.

Can you try this fieldType declaration (which uses class="solr.TextField"), then restart Tomcat and re-index:

<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>








      

RE: case-insensitive string type

Posted by "Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]" <ti...@nasa.gov>.
The value in the srcANYSTRStrCI field is "miXCAse or LowER" according to Luke.


RE: case-insensitive string type

Posted by "Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]" <ti...@nasa.gov>.
From the query
http://localhost:8080/solr/select?q=idxPartition%3ASOMEPART%20AND%20srcANYSTRStrCI:%22mixcase%20or%20lower%22&debugQuery=on

Debug info attached



RE: case-insensitive string type

Posted by "Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]" <ti...@nasa.gov>.
I considered that, but I also can't get a case-insensitive exact match.


Re: case-insensitive string type

Posted by Rob Casson <ro...@gmail.com>.
from http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

    "On wildcard and fuzzy searches, no text analysis is performed on the search word."

i'd just lowercase the wildcard-ed search term in your client code, before you send it to solr.

hth,
rob
