Posted to user@uima.apache.org by Tommaso Teofili <to...@gmail.com> on 2011/04/19 10:34:02 UTC

Possible bug with DictionaryAnnotator and escaped characters

Hi all,

I've just noticed an unexpected behavior in DictionaryAnnotator: if you
create a dictionary with the DictionaryCreator and your input file (a text
file with one entry per line) contains characters like & or ', they get
converted to their escaped forms &amp; or &apos;, as is correct in XML
syntax. The problem is that such entries then no longer match the original
entry string.
For example, a line like me&co is written as <entry><key>me&amp;co</key></entry>
in dictionary.xml, but neither the string "me&co" nor "me&amp;co"
generates a match (and thus no DictionaryAnnotation is created).
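For reference, here is a minimal sketch of the round trip one would expect, using only the JDK's standard XML APIs. It is not DictionaryCreator or DictionaryAnnotator code (their internals aren't shown here); the class and the `escape` and `parseKey` helpers are hypothetical. It shows that escaping me&co gives me&amp;co, and that a conformant XML parser decodes the key back to me&co. So if the dictionary file were read through such a parser, the original string should be recovered. Assuming that holds, a mismatch would suggest the key is being compared in its escaped form somewhere along the way, which would need checking against the actual UIMA sources.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of UIMA.
public class XmlEscapeDemo {

    // Escape the five characters that are special in XML.
    // '&' is replaced first so the '&' introduced by later
    // replacements is not escaped a second time.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Parse an <entry><key>...</key></entry> fragment with a standard DOM
    // parser and return the text of the first <key>; the parser decodes
    // entity references such as &amp; and &apos; automatically.
    static String parseKey(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("key").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String original = "me&co";
        String xml = "<entry><key>" + escape(original) + "</key></entry>";
        System.out.println(xml);           // <entry><key>me&amp;co</key></entry>
        System.out.println(parseKey(xml)); // me&co
    }
}
```

The same holds for ': escape("O'Brien") yields O&apos;Brien, which the parser turns back into O'Brien.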

Am I missing something?
Regards,
Tommaso