Posted to solr-user@lucene.apache.org by Doris Peter <Do...@bsb-muenchen.de> on 2019/07/18 08:48:05 UTC
Problems with StemFilter and Wildcards
Hi, we have some problems with stemming on our OCR texts.
We use the following configuration for our full-text OCR field:
<fieldtype name="text_ocr" class="solr.TextField" termPositions="true" termVectors="true" termPayloads="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GermanStemFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory" delimiter="⚑"
            encoder="org.mdz.search.solrocr.lucene.byteoffset.ByteOffsetEncoder" />
    <filter class="solr.WordDelimiterGraphFilterFactory" protected="protectedword.txt"
            preserveOriginal="0" splitOnNumerics="1" splitOnCaseChange="0"
            catenateWords="1" catenateNumbers="1" catenateAll="1"
            generateWordParts="1" generateNumberParts="1" stemEnglishPossessive="1"
            types="wdfftypes.txt" />
  </analyzer>
</fieldtype>
It seems that the stem filter and wildcard queries don't work together.
When I search for
  Weltkriegs
I get 6 documents. But when I search for
  Weltkrie?s
I get only 1 document. The same happens for
  wel?kriegs
which also returns only 1 document.
This happens only with terms that are changed by the stemming filter. Is there a way to fix this?
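The behaviour would be consistent with wildcard terms not being stemmed at query time while the indexed terms are. The following Python sketch only illustrates that assumption; it is not Solr code. `toy_stem` is a hypothetical stand-in for the German stemmer that merely strips a trailing "s", and the two sample texts are made up.

```python
from fnmatch import fnmatchcase


def toy_stem(token):
    """Hypothetical stand-in for the German stemmer: drop a trailing 's'."""
    if len(token) > 4 and token.endswith("s"):
        return token[:-1]
    return token


def index_terms(text):
    """Whitespace-tokenize, lowercase and stem, roughly like the index chain."""
    return {toy_stem(tok.lower()) for tok in text.split()}


def term_query(indexed, query):
    """A plain term query goes through the same chain, stemmer included."""
    return toy_stem(query.lower()) in indexed


def wildcard_query(indexed, pattern):
    """Assumption: the wildcard pattern is lowercased but NOT stemmed."""
    return any(fnmatchcase(term, pattern.lower()) for term in indexed)


# Made-up sample text; the index holds "weltkrieg", not "weltkriegs".
indexed = index_terms("Ende des Weltkriegs")

plain_hit = term_query(indexed, "Weltkriegs")          # stemmed to "weltkrieg"
wildcard_hit_1 = wildcard_query(indexed, "Weltkrie?s")  # compared unstemmed
wildcard_hit_2 = wildcard_query(indexed, "wel?kriegs")  # compared unstemmed
stem_shaped_hit = wildcard_query(indexed, "Weltkrie?")  # pattern fits the stem

print(plain_hit, wildcard_hit_1, wildcard_hit_2, stem_shaped_hit)
```

Under these assumptions the plain query finds the document, while both wildcard patterns miss it because they are compared against the stemmed term. Only a pattern shaped like the stem itself matches.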
Thanks a lot, Doris