Posted to solr-user@lucene.apache.org by Simon Martinelli <si...@gmail.com> on 2015/03/31 17:03:11 UTC
solr.DictionaryCompoundWordTokenFilterFactory extracts words in string
Hi,
I configured solr.DictionaryCompoundWordTokenFilterFactory using a
dictionary with the following content:
- lindor
- schlitten
- dorsch
- filet
I want to index these compound words:
- dorschfilet
- lindorschlitten
dorschfilet is split as expected:
dorsch filet
But lindorschlitten is a compound of
lindor and schlitten,
yet I get:
lindor dorsch schlitten
So the filter extracts dorsch even though the fragments left over before it
(lin) and after it (litten) are not valid word parts.
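To illustrate the difference, here is a minimal Python sketch. It is not the actual Lucene code: both function names are made up for this example, the 2 and 15 length limits are assumed to match the filter's default subword sizes, and the original token that the filter also keeps is left out. The first function emits every dictionary entry found as a substring, which reproduces the output above. The second only accepts splits where the parts cover the whole word, which is the behaviour I am looking for.

```python
def substring_matches(word, dictionary, min_sub=2, max_sub=15):
    """Return every dictionary entry that occurs as a substring of word,
    ordered by start offset. Assumed to approximate the filter's behaviour."""
    found = []
    for start in range(len(word) - min_sub + 1):
        for length in range(min_sub, max_sub + 1):
            if start + length > len(word):
                break
            piece = word[start:start + length]
            if piece in dictionary:
                found.append(piece)
    return found


def full_cover_splits(word, dictionary):
    """Return all ways to split word into dictionary entries such that
    the entries cover the word completely, with no leftover fragments."""
    if not word:
        return [[]]
    splits = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in dictionary:
            for rest in full_cover_splits(word[end:], dictionary):
                splits.append([head] + rest)
    return splits


dictionary = {"lindor", "schlitten", "dorsch", "filet"}

for word in ("dorschfilet", "lindorschlitten"):
    print(word, substring_matches(word, dictionary),
          full_cover_splits(word, dictionary))
```

With the dictionary above, the substring approach yields lindor, dorsch and schlitten for lindorschlitten, while the full-cover approach yields only lindor + schlitten, because lin and litten are not in the dictionary.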
Is there any better compound word filter for German?
Thanks, Simon