Posted to solr-user@lucene.apache.org by Carrie Coy <cc...@ssww.com> on 2012/08/20 22:24:23 UTC

Shingle and PositionFilterFactory question

I am trying to use shingles and a position filter so that a query for
"foot print", for example, matches either "foot print" or "footprint".
From the docs: using the PositionFilter
<http://wiki.apache.org/solr/PositionFilter> in combination makes it
possible to make all shingles synonyms of each other.

I've configured my analyzer like this:
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.ShingleFilterFactory" minShingleSize="2"
          maxShingleSize="2" outputUnigrams="true"
          outputUnigramsIfNoShingles="false" tokenSeparator=""/>
  <filter class="solr.PositionFilterFactory"/>
</analyzer>
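
To make the expectation concrete, here is a minimal sketch in plain Python (not Solr or Lucene code) of the token stream this chain is expected to produce for "foot print". It assumes the shingle filter emits each unigram followed by the shingle starting at the same position, and that the position filter leaves the first token's position alone and gives every later token a position increment of 0. The function names `shingle_filter` and `position_filter` are hypothetical and exist only for this illustration.

```python
def shingle_filter(tokens, size=2, separator="", output_unigrams=True):
    """Toy model of a shingle filter.

    Returns (text, position, type) tuples with 1-based positions.
    A shingle takes the position of its first word.
    """
    out = []
    for i in range(len(tokens)):
        pos = i + 1
        if output_unigrams:
            out.append((tokens[i], pos, "word"))
        # Emit a shingle only if enough tokens remain to fill it.
        if i + size <= len(tokens):
            out.append((separator.join(tokens[i:i + size]), pos, "shingle"))
    return out


def position_filter(stream):
    """Toy model of a position filter (assumed behaviour).

    Every token after the first gets a position increment of 0,
    so all tokens end up stacked on the first token's position.
    """
    if not stream:
        return []
    first_pos = stream[0][1]
    return [(text, first_pos, typ) for text, _, typ in stream]


tokens = "foot print".split()          # whitespace tokenization
shingled = shingle_filter(tokens)      # minShingleSize = maxShingleSize = 2
flattened = position_filter(shingled)

for text, pos, typ in flattened:
    print(text, pos, typ)
```

Under these assumptions "foot", "footprint", and "print" all share one position after the position filter, which is the "synonyms of each other" behaviour described in the docs. The sketch only models the analysis chain; it says nothing about how the query parser turns those tokens into a query.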

user query:  "foot print"

Without PositionFilterFactory, parsed query:
+(((title:foot) (title:print))~2) (title:"(foot footprint) print")

With PositionFilterFactory, parsed query:
+(((title:foot) (title:print))~2) ()

Why, when I add PositionFilterFactory into the mix, is the "footprint"
shingle omitted?

Output of analysis:

WT
text   raw_bytes         start  end  position  type
foot   [66 6f 6f 74]     0      4    1         word
print  [70 72 69 6e 74]  5      10   2         word

SF
text       raw_bytes                     start  end  positionLength  type     position
foot       [66 6f 6f 74]                 0      4    1               word     1
footprint  [66 6f 6f 74 70 72 69 6e 74]  0      10   2               shingle  1
print      [70 72 69 6e 74]              5      10   1               word     2


Thanks,
Carrie Coy