Posted to solr-user@lucene.apache.org by Ethan Collins <co...@gmail.com> on 2010/07/12 15:22:45 UTC

ShingleFilter failing with more words than indexed phrase

I am using Solr 1.4.1 (Lucene 2.9.3) on Windows and am trying to
understand ShingleFilter. With the configuration below, I find that if
the query contains more words than the phrase indexed in the field,
the search on that field fails (no score from it with debugQuery=true).

Here is an example to reproduce, with field names:
Id: 1
title_1: Nina Simone
title_2: I put a spell on you

Queries (dismax):
“Nina Simone I put” <- Fails, i.e. debugQuery shows no score from the
title_1 search
“Nina Simone” <- Succeeds

I checked the index with Luke and the indexed terms look correct. I
also ran “Nina Simone I put” through Solr’s Field Analysis page for the
‘shingle’ field type, and it shows a match, which is the behavior I
would expect. It is only at query time that no score is produced. I
also checked ‘parsedquery’, and it shows a DisjunctionMaxQuery issuing
the string “Nina_Simone Simone_I I_put” to the title_1 field.

title_1 and title_2 fields are of type ‘shingle’, defined as:

    <fieldType name="shingle" class="solr.TextField"
               positionIncrementGap="100" indexed="true" stored="true">
        <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.ShingleFilterFactory"
                    maxShingleSize="2" outputUnigrams="false"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.ShingleFilterFactory"
                    maxShingleSize="2" outputUnigrams="false"/>
        </analyzer>
    </fieldType>
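
For reference, fields using this type would be declared in schema.xml
roughly as follows. This is only a sketch: the field names come from
the example above, but the indexed/stored attributes on each field are
an assumption (they could equally be inherited from the fieldType).

    <!-- Sketch only: attribute values are assumed, not copied from the real schema -->
    <field name="title_1" type="shingle" indexed="true" stored="true"/>
    <field name="title_2" type="shingle" indexed="true" stored="true"/>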

Note that I also have a catchall field of type text. I have qf set
to 'id catchall' and pf set to 'title_1 title_2'.
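
A minimal sketch of how such settings are typically declared in
solrconfig.xml is shown here. The handler name "dismax" and the choice
to put the values under "defaults" are assumptions for illustration;
the same qf and pf values could instead be passed as request
parameters.

    <!-- Sketch only: handler name and use of defaults are assumed -->
    <requestHandler name="dismax" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">dismax</str>
        <!-- fields searched term by term -->
        <str name="qf">id catchall</str>
        <!-- fields searched with the whole query as a phrase -->
        <str name="pf">title_1 title_2</str>
      </lst>
    </requestHandler>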

Is my expectation wrong here, or is there a bug somewhere?

-Ethan