Posted to solr-user@lucene.apache.org by Alexander Rosemann <al...@gmail.com> on 2014/03/04 21:48:37 UTC
Stemming Croatian, Macedonian, Serbian and Slovenian content
Hi,
I need to index and stem Croatian, Macedonian, Serbian and Slovenian
content. I started by creating a collection _hr_ for the Croatian
content and configured the HunspellStemFilterFactory using the .dic
and .aff files provided by OpenOffice. While testing my configuration
I noticed that only very simple forms such as
hrvatski -> hrvatska,
algoritamskom -> algoritamska
get "stemmed". I was wondering whether there are better approaches for
Croatian content. I haven't tested the .dic and .aff files for the other
languages yet, but I would expect similar results.
I am using Solr 4.1.
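For readers unfamiliar with this setup, the following is a minimal sketch of a Solr 4.x field type wired to Hunspell stemming. It assumes the OpenOffice Croatian files were copied into the collection's conf directory as hr_HR.dic and hr_HR.aff. The field type name text_hr, the file names, and the tokenizer/lowercase choices are hypothetical illustrations, not the exact configuration used above.

```xml
<!-- Hypothetical field type for Croatian text (name and file names are assumptions) -->
<fieldType name="text_hr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split text into word tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase tokens before the dictionary lookup -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Reduce tokens to the forms derivable from the OpenOffice .dic/.aff pair -->
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="hr_HR.dic"
            affix="hr_HR.aff"
            ignoreCase="true"/>
  </analyzer>
</fieldType>
```

The Analysis screen of the Solr admin UI shows which form each input token is reduced to by a field type like this, which is a convenient way to compare dictionaries or stemmers.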
Any pointers to better stemmers, open source or commercial, are much
appreciated.
Many thanks,
Alex