Posted to solr-user@lucene.apache.org by Simon Martinelli <si...@gmail.com> on 2015/03/31 17:03:11 UTC

solr.DictionaryCompoundWordTokenFilterFactory extracts words in string

Hi,

I configured solr.DictionaryCompoundWordTokenFilterFactory using a
dictionary with the following content:

- lindor
- schlitten
- dorsch
- filet

I want to index these compound words:

- dorschfilet
- lindorschlitten

dorschfilet is decompounded as expected:

dorsch filet

But lindorschlitten, which is a compound of

lindor and schlitten

comes out as

lindor dorsch schlitten

So the filter also extracts dorsch, even though the fragments left over
before it (lin) and after it (litten) are not valid words.
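
To illustrate what I think is happening, here is a minimal Python sketch.
It is not the Lucene code: the function names are made up, and the
subword length bounds of 2 and 15 are my assumption about the filter's
defaults. The first function reports every dictionary entry found
anywhere in the word, which reproduces the output I get. The second one
only accepts splits whose parts cover the whole word, which is the
behaviour I was hoping for.

```python
DICTIONARY = {"lindor", "schlitten", "dorsch", "filet"}


def substring_matches(word, dictionary, min_subword=2, max_subword=15):
    """Return every dictionary entry that occurs anywhere inside word.

    Hypothetical model of the observed behaviour: each start offset is
    tried independently, so overlapping entries are all reported.
    """
    matches = []
    for start in range(len(word)):
        for length in range(min_subword, max_subword + 1):
            if start + length > len(word):
                break
            part = word[start:start + length]
            if part in dictionary:
                matches.append(part)
    return matches


def full_decompositions(word, dictionary):
    """Return only the splits whose parts cover the whole word."""
    if not word:
        return [[]]  # the empty remainder has exactly one (empty) split
    results = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in dictionary:
            for tail in full_decompositions(word[end:], dictionary):
                results.append([head] + tail)
    return results


if __name__ == "__main__":
    for compound in ("dorschfilet", "lindorschlitten"):
        print(compound)
        print("  substring matches: ", substring_matches(compound, DICTIONARY))
        print("  full decompositions:", full_decompositions(compound, DICTIONARY))
```

With the second approach dorsch is rejected for lindorschlitten, because
neither lin nor litten is in the dictionary.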

Is there a better compound word filter for German?

Thanks, Simon