Posted to solr-user@lucene.apache.org by Scott Yeadon <sc...@anu.edu.au> on 2010/11/18 04:21:15 UTC

case insensitive sort and LowerCaseFilterFactory

Hi,

I'm running solr-tomcat 1.4.0 on Ubuntu and have an issue with the
sorting of results. According to this page
http://web.archiveorange.com/archive/v/AAfXfzy5Tm1uDy5mYW3B I should be
able to configure the LowerCaseFilterFactory so that results are
indexed and returned in case-insensitive order, but this does not
appear to be working for me. Is someone able to check my field config to
confirm it is OK? Any advice on making this work would be appreciated.
My issue is the same as the one in the linked post: upper case and lower
case values are being ordered separately instead of being interspersed.
The sort field I'm using is of type text as defined below.

The text field type is configured as follows:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>

When I sort on the primaryName field (which is a "text" field as defined
above), for example, I get records listed out of order:
- Withers, Alfred Robert (1863–1956)
- Young, Charles (1838–1916)
- de Little, Ernest (1868–1926)
- de Pledge, Thomas Frederick (1867–1954)
- von Bibra, William (1876–1926)

I imagine I'm missing something obvious. The obvious workaround is a
namesort field; however, from the above post it looks like this can be
avoided.
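
For reference, the namesort workaround I have in mind would look roughly
like the sketch below. It is untested, and the names lowercaseSort and
primaryNameSort are hypothetical, not taken from my schema. The idea is to
copy primaryName into a separate field that keeps the whole value as a
single lowercased token, and to sort on that copy rather than on the
tokenized text field.

<!-- Sketch only: "lowercaseSort" and "primaryNameSort" are hypothetical names. -->
<fieldType name="lowercaseSort" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer emits the entire field value as one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- lowercase that single token so "de Little" and "Withers" interleave -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The sort field only needs to be indexed, not stored -->
<field name="primaryNameSort" type="lowercaseSort" indexed="true" stored="false"/>

<!-- Populate the sort field from the existing primaryName field at index time -->
<copyField source="primaryName" dest="primaryNameSort"/>

Queries would then use sort=primaryNameSort asc instead of sorting on
primaryName directly, and a reindex would be needed for the new field to
be populated.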

Scott.