Posted to solr-user@lucene.apache.org by Agnieszka Kukałowicz <ag...@usable.pl> on 2012/03/05 12:10:48 UTC

Polish language in Solr

Hi,

I have a question about Polish language support in Solr.

There are two options: StempelPolishStemFilterFactory, or
HunspellStemFilterFactory with a Polish dictionary. I have run some tests,
but the results are not satisfying. StempelPolishStemFilterFactory is very
fast during indexing, but the search quality is not quite what I expect.
HunspellStemFilterFactory, in turn, gives better search results, but
indexing Polish text with it is very slow.

For example, indexing 100k documents with StempelPolishStemFilterFactory
takes only 10 min (150 docs/sec), while with HunspellStemFilterFactory it
takes 1 h 20 min, which is only 18-20 docs/sec (server with 8 cores, 24 GB
RAM, index on an SSD).
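
As a rough cross-check of these figures, here is a minimal Python sketch.
The helper name docs_per_second is made up for illustration, and it assumes
the quoted times cover the whole 100k-document run:

```python
def docs_per_second(doc_count: int, minutes: float) -> float:
    """Average indexing throughput over a whole run (hypothetical helper)."""
    return doc_count / (minutes * 60)


# Totals quoted in the message: 100k documents in 10 min vs. 1 h 20 min.
stempel_rate = docs_per_second(100_000, 10)
hunspell_rate = docs_per_second(100_000, 80)

# Ratio of the two average rates, i.e. how much slower Hunspell indexing is.
slowdown = stempel_rate / hunspell_rate

print(f"Stempel:  {stempel_rate:.1f} docs/sec")
print(f"Hunspell: {hunspell_rate:.1f} docs/sec")
print(f"Hunspell is about {slowdown:.0f}x slower")
```

The totals imply about 167 and 21 docs/sec, a little above the rounded
rates quoted above, so Hunspell indexing is roughly 8 times slower here.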

Is it possible to speed up indexing with Hunspell? What should I optimize?

Does anyone have experience with Hunspell?

I use Solr 4.0.

Best regards
Agnieszka