Posted to solr-user@lucene.apache.org by "Strokin, Eugene " <eu...@citi.com> on 2011/10/04 21:22:33 UTC

Case Insensitive String

Hello, I know that this topic has already been discussed, but I want to make sure I understood it correctly.
I need a field for a user's email. I should be able to find documents by this field, and the match should be exact and case-insensitive.
Based on what I've found in previous discussions, I can't use the solr.StrField class, but should use the solr.TextField class instead.
Also, I strongly suspect that the requirements could change later, and I would then need to store not an email as the identifier, but some free text, potentially with spaces and other whitespace, and still be able to do an exact case-insensitive match.
So I came up with this type:

    <fieldType name="string_ci" class="solr.TextField"
        sortMissingLast="true" omitNorms="true">
        <analyzer>
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType>

So, if I have a document with a field of this type, and it contains a value like this:
"ABC 123 xyz"
then an "xyz" query shouldn't return the document, nor should an "xyz 123 ABC" query, but an "abc 123 XYZ" query should.
Am I correct in my assumption, or am I missing something?

Any comments are appreciated,
Thank you,
Eugene S.

Re: Case Insensitive String

Posted by Ahmet Arslan <io...@yahoo.com>.

> Hello, I know that this topic was
> already discussed, but I want to make sure I understood it
> right.
> I need a field for a user's email. I should be
> able to find documents by this field, and the match should be
> exact and case-insensitive.
> Based on what I've found in previous discussions, I
> can't use the solr.StrField class, but should use the
> solr.TextField class instead.
> Also, I strongly suspect that the requirements could
> change later, and I would then need to store not an email as
> the identifier, but some free text, potentially with spaces and
> other whitespace, and still be able to do an
> exact case-insensitive match.
> So I came up with this type:
> 
>     <fieldType name="string_ci" class="solr.TextField"
>         sortMissingLast="true" omitNorms="true">
>         <analyzer>
>             <filter class="solr.LowerCaseFilterFactory" />
>         </analyzer>
>     </fieldType>
> 
> So, if I have a document with a field of such type, and it
> would contain value like this:
> "ABC 123 xyz"
> A "xyz" query shouldn't return the document, nor "xyz 123
> ABC" query, but "abc 123 XYZ" should.
> Am I correct in my assumption, or am I missing something?

Yep, that is all correct. However, no tokenizer is defined in your analyzer. In your case, KeywordTokenizerFactory should be used, since it keeps the entire field value as a single token.
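
For reference, here is a minimal sketch of how the type might look with the tokenizer added. It assumes the rest of the definition stays exactly as posted above; only the tokenizer line and the comments are new.

    <fieldType name="string_ci" class="solr.TextField"
        sortMissingLast="true" omitNorms="true">
        <analyzer>
            <!-- Emits the whole field value as one token, spaces included -->
            <tokenizer class="solr.KeywordTokenizerFactory" />
            <!-- Lowercases that single token so matching ignores case -->
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType>

With this chain, "ABC 123 xyz" should be indexed as the single token "abc 123 xyz". Depending on the query parser in use, a query value containing spaces may need to be quoted (or have its spaces escaped) so that it reaches the analyzer as one piece rather than being split on whitespace first.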