Posted to solr-user@lucene.apache.org by Christopher Gross <co...@gmail.com> on 2012/11/07 16:15:31 UTC

Matching an exact phrase in a text field

I have this as my "text" field:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I have quite a few data points, but I'll just stick to one for now. I
store it twice: once as text (field name: region) and once as a string
(regionF, used for making facet fields).
I let the user search on a facet from a query box in a web interface
(so they can enter region:Africa), which works fine. The problem is
that a search for region:"North America" finds nothing, while regionF:"North
America" works. Is there something wrong with how I've set up the
text field? Shouldn't they be able to do an exact phrase match on it?

Let me know if I'm just missing something due to lack of sleep.

Thanks!

-- Chris

Re: Matching an exact phrase in a text field

Posted by Christopher Gross <co...@gmail.com>.

I do have the omit positions turned on...d'oh.

So if I cut those out, then the region (and other similar fields) should
work correctly?

Thanks Jack!

-- Chris


On Wed, Nov 7, 2012 at 10:31 AM, Jack Krupansky <ja...@basetechnology.com> wrote:

> Try the Solr Admin Analysis page to see how your phrase and input text are
> actually being analyzed. Also add &debugQuery=true to your query and see
> what the "parsed" query looks like. And check to see that "North" and
> "America" are not mentioned in either your stop words or synonyms.
>
> Also, check/show us the two field definitions. I mean, is it possible that
> you might have "omit positions" on the region field?
>
> -- Jack Krupansky

Re: Matching an exact phrase in a text field

Posted by Jack Krupansky <ja...@basetechnology.com>.

Try the Solr Admin Analysis page to see how your phrase and input text are 
actually being analyzed. Also add &debugQuery=true to your query and see 
what the "parsed" query looks like. And check to see that "North" and 
"America" are not mentioned in either your stop words or synonyms.

Also, check/show us the two field definitions. I mean, is it possible that 
you might have "omit positions" on the region field?

-- Jack Krupansky
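
For anyone landing on this thread later, here is a rough sketch of what the two field definitions might look like once positions are kept on the text field. Only the field names (region, regionF) and the "text" type come from the thread; the "string" type name and every attribute value shown are assumptions, since the actual field definitions were never posted.

```xml
<!-- Sketch only: field names are from the thread, all attribute values are assumed. -->

<!-- Tokenized copy used for searching. A phrase query such as
     region:"North America" needs term positions in the index, so neither
     omitTermFreqAndPositions nor omitPositions should be "true" here. -->
<field name="region" type="text" indexed="true" stored="true"
       omitTermFreqAndPositions="false"/>

<!-- Untokenized copy used for faceting. The whole value is indexed as a
     single term, so regionF:"North America" matches without positions. -->
<field name="regionF" type="string" indexed="true" stored="true"/>
```

Positions are written at index time, so after changing the schema the existing documents have to be re-indexed before phrase queries on region will start matching.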
