
Posted to solr-user@lucene.apache.org by Kevin Osborn <os...@yahoo.com> on 2010/02/13 02:28:08 UTC

cannot match on phrase queries

I am seeing this in several of my fields. I have values like "Samsung X150" or "Nokia BH-212", and my queries will not match on X150 or BH-212.

So, my query is something like +model:(Samsung X150). Through debugQuery, I see that this gets converted to +(model:samsung model:"x 150"). It matches on Samsung, but not on X150. A simple query like model:BH-212 simply fails, and model:BH212 also fails. The only query that seems to work is model:(BH 212).

Here is the schema for that field:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="query_synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>

<field name="model" type="text" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true" />

Any ideas? According to the analyzer, I would expect the phrase "BH-212" to match on "bh" and "212". Or am I missing something?

Also, is there any way to tell the parser not to convert "X150" into a phrase query? I have some cases where it would be more useful to turn it into +(X 150).

Re: cannot match on phrase queries

Posted by Kevin Osborn <os...@yahoo.com>.

It definitely had something to do with omitTermFreqAndPositions. As soon as I disabled the option and re-indexed, my queries started working as expected. I suspect it has something to do with terms occupying the same position and that information being lost when omitTermFreqAndPositions is used, but I am not entirely sure.

So, I just disabled the option and made my custom similarity always use tf=1.0. For my use cases, that should work fine.
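
For reference, the change described above would look roughly like the following in schema.xml. This is only a sketch based on the field definition from the original post: the one difference is the omitTermFreqAndPositions attribute, which is set to false here (simply dropping the attribute is assumed to have the same effect for a text field). A full re-index is needed afterwards, since the option controls what is written to the index.

```xml
<!-- Sketch only: the same "model" field as in the original post, but with
     term frequencies and positions kept, so that the phrase query
     model:"x 150" has position data to match against. -->
<field name="model" type="text" indexed="true" stored="true"
       omitNorms="true" omitTermFreqAndPositions="false" />
```

The custom similarity that forces tf=1.0 is not shown, since its code was not posted in this thread.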




________________________________
From: Erick Erickson <er...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Sat, February 13, 2010 2:56:44 PM
Subject: Re: cannot match on phrase queries

It's really hard to help unless you include the analysis and query
schema for the field in question since so much of how things work
is dependent upon those choices. Also include the query you fire
at SOLR....

I suspect that omitTermFreqAndPositions is irrelevant....

Erick


Re: cannot match on phrase queries

Posted by Erick Erickson <er...@gmail.com>.

It's really hard to help unless you include the analysis and query
schema for the field in question since so much of how things work
is dependent upon those choices. Also include the query you fire
at SOLR....

I suspect that omitTermFreqAndPositions is irrelevant....

Erick
