Posted to solr-user@lucene.apache.org by "Vauthrin, Laurent" <La...@disney.com> on 2009/03/19 23:12:51 UTC

Exact Match

Hello again,

I believe that this question has been posed before but I just wanted to
make sure I understood my options.  Here's the situation:

We have a few fields that are specified as 'text' and a few fields that
are specified as 'string'.  As far as I understand, 'string' will do
exact matches whereas 'text' will do tokenized/contains matches.
However, we have a need to do exact matches on the 'text' fields as well.

I believe I've seen two approaches for this problem:

1. Use a copyField configuration to copy the 'text' field to a
   'string' field, then use the string field when exact matches are
   needed.

2. Append something like '_start_' and '_end_' to the field at
   index and search time for exact matches.

Are there any solutions to this problem that don't require creating
another field or modifying the data?  (i.e. some sort of query filter?)

Thanks,
Laurent


RE: Exact Match

Posted by Chris Hostetter <ho...@fucit.org>.
: Depending on your needs, you might want to do some sort of minimal
: analysis on the field (ignore punctuation, lowercase,...) Here's the
: text_exact field that I use:

Dean's reply is a great example of why "exact" is a vague term.

with a TextField you can get an "exact" match using a simple phrase query
(ie: putting quotes around the input) assuming your meaning of "exact" is
that all the tokens appear together in sequence, and assuming your
analyzer doesn't change things in a way that makes a phrase search match
in a way that you don't consider "exact enough"

if you want to ensure that the document contains exactly what the user
queried for, no more and no less, then using a copyField into StrField is 
really the best way to do that.
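
A minimal schema.xml sketch of that copyField approach, for illustration
only. The field names 'title' and 'title_exact' are hypothetical, and the
'text' and 'string' types are assumed to be defined as in the example
schema that ships with Solr ('string' being a solr.StrField):

```xml
<!-- Hypothetical field names; adjust to your own schema. -->
<!-- Tokenized field used for normal full-text searches. -->
<field name="title" type="text" indexed="true" stored="true"/>

<!-- Untokenized copy: the whole value is indexed as a single term. -->
<field name="title_exact" type="string" indexed="true" stored="false"/>

<!-- Copies the raw input of 'title' into 'title_exact' at index time. -->
<copyField source="title" dest="title_exact"/>
```

With something like this in place, a query such as
title_exact:"Some Title" should only match documents whose entire field
value is that string (case and punctuation included), while
title:"some title" stays an ordinary phrase query against the tokenized
field.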




-Hoss


RE: Exact Match

Posted by "Dean Missikowski (Consultant), CLSA" <de...@clsa.com>.
Hi Laurent,

I use the copy field approach and copy the text fields to a custom type
"text_exact" that I define in my schema.xml. This allows searching for
"exact matches" anywhere within the text field, which doesn't use tokens
injected by stemming, synonyms or other index-time filters. 

In my application code, I detect when users are performing an exact
match and set up the underlying solr query to use the "text_exact"
fields by selecting an exact match request handler (a modified
definition of the standard dismax request handler in solrconfig.xml).
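
As a rough sketch of what such a handler could look like (this is not
the actual definition from my solrconfig.xml): the handler name
'dismax_exact' and the field names 'title_exact' and 'body_exact' are
made up for illustration, and it assumes a Solr 1.3-style setup where
dismax is selected through defType on solr.SearchHandler.

```xml
<!-- Hypothetical handler name and field names, for illustration only. -->
<requestHandler name="dismax_exact" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Use the dismax query parser for the q parameter. -->
    <str name="defType">dismax</str>
    <!-- Search only the "exact" copies of the text fields. -->
    <str name="qf">title_exact^2.0 body_exact</str>
  </lst>
</requestHandler>
```

The application would then pick this handler for exact-match searches,
for example by sending qt=dismax_exact with the request, and use the
regular dismax handler otherwise.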

Depending on your needs, you might want to do some sort of minimal
analysis on the field (ignore punctuation, lowercase,...) Here's the
text_exact field that I use:

    <fieldtype name="text_exact" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="0" generateNumberParts="0" catenateWords="0"
                catenateNumbers="0" catenateAll="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>
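
To show how a type like this might be wired up, here is a small sketch
of the matching field and copyField declarations. The field names 'body'
and 'body_exact' are hypothetical, and the 'text' type is assumed to be
the usual stemmed text type from the example schema:

```xml
<!-- Hypothetical field names; adjust to your own schema. -->
<!-- Regular analyzed field (stemming, synonyms, etc.). -->
<field name="body" type="text" indexed="true" stored="true"/>

<!-- Lightly analyzed copy that uses the text_exact type defined above. -->
<field name="body_exact" type="text_exact" indexed="true" stored="false"/>

<!-- Feeds the same input into both fields at index time. -->
<copyField source="body" dest="body_exact"/>
```

Since copyField copies the original input rather than the analyzed
tokens, 'body_exact' is analyzed independently by the text_exact chain
and does not pick up tokens injected by the filters on 'body'.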

-- Dean


