Posted to solr-user@lucene.apache.org by "Olson, Ron" <RO...@lbpc.com> on 2011/08/16 17:20:09 UTC

Exact matching on names?

Hi all-

I'm missing something fundamental, and I've been unable to find a definitive answer for exact name matching. I'm indexing names using the standard "text" field type, and my search is for the name "clarke". My results include "clark", which is incorrect; the search needs to match "clarke" exactly (case-insensitive).

I tried textType but that doesn't work because I believe it needs to be *really* exact, whereas I'm looking for things like "clark oil", "bob, frank, and clark", etc.

Thanks for any help,

Ron

DISCLAIMER: This electronic message, including any attachments, files or documents, is intended only for the addressee and may contain CONFIDENTIAL, PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended recipient, you are hereby notified that any use, disclosure, copying or distribution of this message or any of the information included in or with it is  unauthorized and strictly prohibited.  If you have received this message in error, please notify the sender immediately by reply e-mail and permanently delete and destroy this message and its attachments, along with any copies thereof. This message does not create any contractual obligation on behalf of the sender or Law Bulletin Publishing Company.
Thank you.

Re: Exact matching on names?

Posted by Rob Casson <ro...@gmail.com>.
"exact" can mean a lot of things (do diacritics count?, etc), but in
this case, it sounds like you just need to turn off the stemmer you
have on this fieldtype (or create a new one that doesn't include the
stemmer).
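
For readers unsure which line "the stemmer" is, here is a rough sketch of where it usually sits. It assumes the field type follows the stock example "text" definition from a Solr 3.x schema.xml, where the stemmer is solr.PorterStemFilterFactory; that is an assumption, so check your own schema, since older or customized schemas may use a different stemming filter.

```xml
<!-- Assumed excerpt of the tail of the stock "text" analyzer chain; your schema.xml may differ. -->
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<!-- The stemmer: this is the filter that reduces "clarke" to "clark".
     Leave it out of both the index and query analyzers of the new field type. -->
<filter class="solr.PorterStemFilterFactory"/>
```

If only a handful of names need protecting, listing them in protwords.txt should also keep the stemmer from touching them, since KeywordMarkerFilterFactory marks those tokens as not to be stemmed.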

hth,
rob


RE: Exact matching on names?

Posted by "Olson, Ron" <RO...@lbpc.com>.
Thank you Sujit and Rob for your help; I took the "easy" way and created a new field type that is identical to text, but with the stemmer removed. This seems, so far, to work exactly as needed.

To help anyone else who comes across this issue, this is the field type I used:

<fieldType name="textNoStem" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
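
A field type does nothing until a field uses it, so a field declaration along these lines would be needed as well. This is only a sketch: the field name "name" and the indexed/stored values are assumptions for illustration, not taken from the actual schema.

```xml
<!-- Hypothetical field declaration; the name and attribute values are assumptions. -->
<field name="name" type="textNoStem" indexed="true" stored="true"/>
```

Because the index-time analysis changes, documents indexed under the old stemmed type would need to be reindexed before searches behave consistently.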


-----Original Message-----
From: Sujit Pal [mailto:sujit.pal@comcast.net]
Sent: Tuesday, August 16, 2011 12:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Exact matching on names?

Hi Ron,

There was a discussion about this some time back, which I implemented
(with great success btw) in my own code...basically you store both the
analyzed and non-analyzed versions (use string type) in the index, then
send in a query like this:

+name:clarke name_s:"clarke"^100

The name field is text so it will analyze down "clarke" to "clark" but
it will match both "clark" and "clarke" and the second clause would
boost the entry with "clarke" up to the top, which you then select with
rows=1.

-sujit





Re: Exact matching on names?

Posted by Sujit Pal <su...@comcast.net>.
Hi Ron,

There was a discussion about this some time back, which I implemented
(with great success btw) in my own code...basically you store both the
analyzed and non-analyzed versions (use string type) in the index, then
send in a query like this:

+name:clarke name_s:"clarke"^100

The name field is text so it will analyze down "clarke" to "clark" but
it will match both "clark" and "clarke" and the second clause would
boost the entry with "clarke" up to the top, which you then select with
rows=1.
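
A minimal schema sketch of the two-field setup described above, assuming the field names from the query (name and name_s) and the stock "text" and "string" field types. The indexed/stored values and the use of copyField are assumptions for illustration, not necessarily how the original implementation was wired up.

```xml
<!-- Sketch only: attribute values are assumptions, not the poster's actual schema. -->
<!-- Analyzed (stemmed) version, matched by the required +name:clarke clause. -->
<field name="name" type="text" indexed="true" stored="true"/>
<!-- Non-analyzed version, matched by the boosted name_s:"clarke" clause. -->
<field name="name_s" type="string" indexed="true" stored="false"/>
<!-- Copy the raw input of name into name_s at index time. -->
<copyField source="name" dest="name_s"/>
```

Note that a string field is not analyzed at all, so the boost clause only fires when the entire field value is exactly "clarke", including case. A value such as "Clarke Oil" would still match through the name clause but would not receive the boost.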

-sujit
