Posted to solr-user@lucene.apache.org by vod <vo...@gmail.com> on 2012/08/16 22:21:40 UTC

splitOnCaseChange not working?

Hi,

I'm getting unexpected results and hoping someone can help. My schema.xml
has splitOnCaseChange="1" for the field I'm searching on (both index &
query), and the default search behavior is "OR".

I have a field with the word "Airline" indexed. When I search for "Airline"
I get the match. When I search for "Airline Alias", I get the match (as
expected). However, when I search for "AirlineAlias", I am not getting a
match. I was expecting the splitOnCaseChange property to separate out the
term AirlineAlias into the 2 base words. However, if that was happening,
then it should be finding the match to "Airline" (i.e. it should be the
exact same query as "Airline Alias").

Is my understanding correct? If so, any ideas on why I would not be getting
the correct search results?

I have copied the relevant sections from the schema.xml file below.

Thanks in advance for the help.

<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1"
                catenateAll="0" splitOnCaseChange="1" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
        <filter class="solr.PorterStemFilterFactory" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true" />
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="0" catenateNumbers="0"
                catenateAll="0" splitOnCaseChange="1" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
        <filter class="solr.PorterStemFilterFactory" />
    </analyzer>
</fieldType>

<fields>
    ...
    <field name="value" type="text_en_splitting" indexed="true" stored="true"
           multiValued="true" omitNorms="true" />
</fields>

<solrQueryParser defaultOperator="OR" />




--
View this message in context: http://lucene.472066.n3.nabble.com/splitOnCaseChange-not-working-tp4001708.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: splitOnCaseChange not working?

Posted by vod <vo...@gmail.com>.

Tested the change and it is indeed working.
Thank you for the quick response.




Re: splitOnCaseChange not working?

Posted by Jack Krupansky <ja...@basetechnology.com>.

Just set autoGeneratePhraseQueries="false" on the "text_en_splitting" field
type.

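With the current setting (autoGeneratePhraseQueries="true"), AirlineAlias is
split on the case change and then treated as the quoted phrase "Airline
Alias", so a field value containing only "Airline" does not match.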

-- Jack Krupansky
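
For anyone finding this thread later, here is a minimal sketch of the suggested change, based on the fieldType from the original post. Only the autoGeneratePhraseQueries attribute differs; the analyzer chains are assumed to stay exactly as posted and are elided here.

```xml
<!-- Sketch only: same field type as in the original post, with
     autoGeneratePhraseQueries switched from "true" to "false". -->
<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="false">
    <!-- index and query analyzers unchanged from the original post -->
    ...
</fieldType>
```

With this setting, the parts that WordDelimiterFilterFactory produces from a single query token such as AirlineAlias should be searched as separate terms rather than as an adjacent phrase, which is the behavior the original poster expected. The attribute affects how queries are parsed, so the change should take effect after reloading the schema; whether a reindex is also needed in a given setup is an assumption to verify.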
