Posted to solr-user@lucene.apache.org by Robert Petersen <ro...@buy.com> on 2011/04/22 02:06:56 UTC

term position question from analyzer stack for WordDelimiterFilterFactory

If I don't set preserveOriginal=1 in my WordDelimiterFilterFactory settings, I cannot get a match between AppleTV on the indexing side and appletv on the search side. Without that setting, the all-lowercase version of AppleTV ends up in term position 2 because of the catenateWords=1 or catenateAll=1 settings. This surprises me. How does term position affect searching? Here is my analysis with preserveOriginal=1, which makes the lowercase term occur in both term positions 1 and 2:

Index Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position 	1
term text 	AppleTV
term type 	word
source start,end 	0,7
payload 	
org.apache.solr.analysis.SynonymFilterFactory {synonyms=index_synonyms.txt, expand=true, ignoreCase=true}
term position 	1
term text 	AppleTV
term type 	word
source start,end 	0,7
payload 	
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
term position 	1
term text 	AppleTV
term type 	word
source start,end 	0,7
payload 	
org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=1, generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=1, catenateNumbers=1}
term position 	1	1	2	2
term text 	AppleTV	Apple	TV	AppleTV
term type 	word	word	word	word
source start,end 	0,7	0,5	5,7	0,7
payload 	
org.apache.solr.analysis.LowerCaseFilterFactory {}
term position 	1	1	2	2
term text 	appletv	apple	tv	appletv
term type 	word	word	word	word
source start,end 	0,7	0,5	5,7	0,7
payload 	
com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory {protected=protwords.txt}
term position 	1	1	2	2
term text 	appletv	apple	tv	appletv
term type 	word	word	word	word
source start,end 	0,7	0,5	5,7	0,7
payload 	
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
term position 	1	1	2	2
term text 	appletv	apple	tv	appletv
term type 	word	word	word	word
source start,end 	0,7	0,5	5,7	0,7
payload 	
Query Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position 	1
term text 	appletv
term type 	word
source start,end 	0,7
payload 	
org.apache.solr.analysis.SynonymFilterFactory {synonyms=query_synonyms.txt, expand=true, ignoreCase=true}
term position 	1
term text 	appletv
term type 	word
source start,end 	0,7
payload 	
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
term position 	1
term text 	appletv
term type 	word
source start,end 	0,7
payload 	
org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=1, generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=1, catenateNumbers=1}
term position 	1
term text 	appletv
term type 	word
source start,end 	0,7
payload 	
org.apache.solr.analysis.LowerCaseFilterFactory {}
term position 	1
term text 	appletv
term type 	word
source start,end 	0,7
payload 	
com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory {protected=protwords.txt}
term position 	1
term text 	appletv
term type 	word
source start,end 	0,7
payload 	
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
term position 	1
term text 	appletv
term type 	word
source start,end 	0,7
payload
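
To make the term positions in the analysis above concrete, here is a small Python sketch of a toy positional index. It is not Solr or Lucene code and does not reproduce Solr's query parsing. The token stream with preserveOriginal=1 is copied from the final index-side stage above. The stream without preserveOriginal is an assumption based only on the description that the lowercase appletv then sits in position 2. The function names are made up for illustration.

```python
from collections import defaultdict

# Index-side tokens after the last filter, as (position, term) pairs,
# copied from the analysis output with preserveOriginal=1.
WITH_PRESERVE = [(1, "appletv"), (1, "apple"), (2, "tv"), (2, "appletv")]

# ASSUMPTION: the stream without preserveOriginal, inferred from the post's
# description (catenated "appletv" only at position 2). Not taken from Solr output.
WITHOUT_PRESERVE = [(1, "apple"), (2, "tv"), (2, "appletv")]


def positions_by_term(tokens):
    """Build a toy positional index: term -> list of positions."""
    index = defaultdict(list)
    for position, term in tokens:
        index[term].append(position)
    return dict(index)


def term_positions(tokens, term):
    """Return the sorted positions at which a single term occurs."""
    return sorted(positions_by_term(tokens).get(term, []))


def phrase_matches(tokens, terms):
    """Return True if the terms occur at consecutive positions, in order."""
    index = positions_by_term(tokens)
    for start in index.get(terms[0], []):
        if all(start + i in index.get(t, []) for i, t in enumerate(terms)):
            return True
    return False


if __name__ == "__main__":
    print(term_positions(WITH_PRESERVE, "appletv"))
    print(term_positions(WITHOUT_PRESERVE, "appletv"))
    print(phrase_matches(WITHOUT_PRESERVE, ["apple", "tv"]))
```

In this toy model a single-term lookup for appletv finds the token in both streams, and only its position differs. Position starts to matter once terms are required to be adjacent, as in the phrase check.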