Posted to solr-user@lucene.apache.org by "Philtjens, Raf" <ra...@hp.com> on 2013/04/03 13:26:01 UTC

Words being duplicated with highlighting & DictionaryCompoundWordTokenFilterFactory

I'm having issues with highlighting & DictionaryCompoundWordTokenFilterFactory in Solr 3.6.1/3.6.2.

It's duplicating/adding words in the highlighted snippet. For example, my (Dutch) dictionary contains the following words: premie, beter, ring.
If I search for 'verbetering', results containing 'verbeteringspremie' are found correctly, but are highlighted as follows: Ver<highlight>beter</highlight><highlight>Verbetering</highlight>spremie.
Words from the DictionaryCompoundWordTokenFilterFactory dictionary are added to the highlighted snippet, resulting in all kinds of gibberish.
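
A minimal sketch of where the overlap comes from, assuming the filter simply emits every dictionary entry it finds inside a token together with that entry's character range. This is not Lucene's code: decompose and overlapping are hypothetical helpers, and the min_subword/max_subword parameters and their values are illustrative only.

```python
def decompose(word, dictionary, min_subword=2, max_subword=15):
    """Return (subword, start, end) for every dictionary entry found inside word.

    Brute-force scan over all substrings; end offsets are exclusive.
    """
    lower = word.lower()
    hits = []
    for start in range(len(lower)):
        for length in range(min_subword, max_subword + 1):
            end = start + length
            if end > len(lower):
                break
            if lower[start:end] in dictionary:
                hits.append((lower[start:end], start, end))
    return hits


def overlapping(hits):
    """Return pairs of hits whose character ranges intersect."""
    return [
        (a, b)
        for i, a in enumerate(hits)
        for b in hits[i + 1:]
        if a[1] < b[2] and b[1] < a[2]
    ]


dictionary = {"premie", "beter", "ring"}
hits = decompose("Verbeteringspremie", dictionary)
for subword, start, end in hits:
    print(subword, start, end)
print(overlapping(hits))
```

In this sketch 'beter' covers characters 3 to 8 and 'ring' covers 7 to 11, so both ranges claim the same 'r'. Whether the highlighter in 3.6.x actually goes wrong because of overlapping ranges like these is a guess on my part, not something I have confirmed.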

schema.xml > http://pastebin.com/SxGAg52N (problem is happening for fields of type 'text')
solrconfig.xml > http://pastebin.com/MUTkgZJq

The only solution I can come up with at the moment is removing those words (beter, ring) from the dictionary, which disables compound-word searching on those words and is not what I want.

Any idea what could be causing this? I found someone else facing the exact same problem: http://stackoverflow.com/questions/13879349/solr-duplicating-words-in-highlighted-results - unfortunately, no workable solution was given there.