
Posted to user@uima.apache.org by Brian & Fran <bn...@gmail.com> on 2017/12/21 14:39:00 UTC

Help with Alphanumeric Tokens

Good day, Peter,

We are learning UIMA Ruta and are having some problems with it. As I posted on Stack Overflow, much of the data in our documents does not fit the traditional natural-language mold: it is alphanumeric, such as file hashes, email addresses, and domain names. We tried reworking the JFlex lexer and rebuilding ruta-core, but we are now struggling to get the rebuilt version working in the Ruta Workbench. Is there a better way to parse and annotate such data? A file containing sentences or tabular data with MD5 hashes would be a great example.

Thank you,
Fran
