Posted to solr-user@lucene.apache.org by hossmaa <an...@gmail.com> on 2015/05/28 15:54:23 UTC

solr and uima dictionary annotator

Hi everyone

I am using the UIMA DictionaryAnnotator to tag Solr documents. It seems to
be working (I do get tags), but I get some strange behavior:

1. I am using the White Space Tokenizer both for the indexed text and for
creating the dictionary. Most entries in my dictionary consist of multiple
words. According to the documentation, with the default settings a
document must contain all words of an entry in order to match it.
However, this is not the case in practice. I'm seeing documents being
tagged, seemingly at random, with single words, although my dictionary does
not contain an entry for those single words (they only appear as part of
multi-word entries). This would be fine (even preferable) if it were
consistent, but it is not: the tagging happens only for a subset of single
words, not for all. What am I doing wrong?
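
To make the expectation concrete, here is a minimal plain-Java sketch of how I
understand the documented default matching. It is not the DictionaryAnnotator
code or API; the class and method names are hypothetical, and I am assuming
that the words of an entry have to appear contiguously and in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; this is NOT the UIMA DictionaryAnnotator API.
public class MultiWordMatchSketch {

    // Split on runs of whitespace, roughly what a whitespace tokenizer does.
    static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Assumption: an entry matches only if ALL of its tokens occur
    // contiguously and in order in the document.
    static boolean matches(String document, String entry) {
        List<String> docTokens = tokenize(document);
        List<String> entryTokens = tokenize(entry);
        if (entryTokens.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(docTokens, entryTokens) >= 0;
    }

    public static void main(String[] args) {
        String entry = "apache solr"; // made-up multi-word dictionary entry
        System.out.println(matches("we index with apache solr daily", entry)); // true
        System.out.println(matches("we index with solr daily", entry));        // false
    }
}
```

Under this reading, a document containing only "solr" should not be tagged at
all, which is why the partial single-word tags surprise me.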

2. If a dictionary word appears multiple times in the analyzed field, it is
also added just as many times to the mapped field (i.e. my tags). Is there a
way to control/disable this?
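
What I would like to end up with in the mapped field is one value per distinct
dictionary entry. A minimal plain-Java sketch of that result, again with
hypothetical names and not a Solr or UIMA API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration only; not part of Solr or UIMA.
public class UniqueTagsSketch {

    // Drop repeated tags while keeping the order of first occurrence.
    static List<String> uniqueTags(List<String> tags) {
        return new ArrayList<>(new LinkedHashSet<>(tags));
    }

    public static void main(String[] args) {
        // Made-up tag values, as they might arrive when a term occurs several times.
        List<String> tags = Arrays.asList("solr", "uima", "solr", "solr");
        System.out.println(uniqueTags(tags)); // [solr, uima]
    }
}
```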

Thanks!
Regards
Andreea


