Posted to dev@uima.apache.org by Rukku <ru...@gmail.com> on 2013/06/18 14:05:01 UTC

UIMA for travel emails

We are new to the UIMA framework.

We are studying UIMA to see whether we can use it to parse and extract information
from travel-related emails (confirmations, cancellations). The information can be
passenger names, itinerary, flight details, etc., and the output should be XML.

We tried UIMA and ended up using only the Regex components, which we
think we could have replaced with plain Java libraries to achieve the same result.
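
For reference, the plain-Java route could look roughly like the sketch below. It is only an illustration: the "Passenger:" and "Flight:" line formats, the flight-number shape, and the XML element names are assumptions made up for this example, not taken from real airline emails, and the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TravelEmailRegex {
    // Assumed line formats (hypothetical): "Passenger: <name>" and "Flight: <2 letters><1-4 digits>".
    private static final Pattern PASSENGER =
        Pattern.compile("(?m)^Passenger:[ \\t]*(.+?)[ \\t]*$");
    private static final Pattern FLIGHT =
        Pattern.compile("(?m)^Flight:[ \\t]*([A-Z]{2}\\d{1,4})[ \\t]*$");

    // Escapes the characters that are not allowed in XML element text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Returns the first capture group of the first match, or "" if nothing matches.
    static String firstGroup(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : "";
    }

    // Builds a small XML fragment from the extracted fields.
    public static String toXml(String email) {
        return "<booking>"
            + "<passenger>" + escape(firstGroup(PASSENGER, email)) + "</passenger>"
            + "<flight>" + escape(firstGroup(FLIGHT, email)) + "</flight>"
            + "</booking>";
    }

    public static void main(String[] args) {
        String email = "Booking confirmation\nPassenger: Jane Doe\nFlight: LH1234\n";
        System.out.println(toXml(email));
    }
}
```

A sketch like this works only as long as every sender uses the same layout, which is rarely true for travel emails.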

Any help in giving us some direction will be greatly appreciated.

Regards,



Re: UIMA for travel emails

Posted by Dingcheng Li <di...@gmail.com>.
UIMA is just a pipeline framework. It does not perform NLP by itself.
Instead, it is used to build NLP or machine learning systems whose
components run in sequence, in parallel, or in any other arrangement. Good
examples of systems built on UIMA include cTAKES and IBM's Watson.
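
To make the "pipeline framework" idea concrete, here is a minimal plain-Java sketch of the pattern: independent steps that each add typed annotations to a shared document. All class and method names are hypothetical and this is not the UIMA API; in UIMA the shared structure is the CAS and the steps are analysis engines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiniPipeline {
    /** A typed span over the document text (hypothetical stand-in for an annotation). */
    static final class Annotation {
        final String type;
        final int begin;
        final int end;
        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Shared state passed from step to step (hypothetical stand-in for the CAS). */
    static final class Document {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();
        Document(String text) { this.text = text; }
        String covered(Annotation a) { return text.substring(a.begin, a.end); }
    }

    /** One processing step; it reads the document and adds annotations. */
    interface Annotator {
        void process(Document doc);
    }

    /** Builds a step that marks every match of the regex with the given type. */
    static Annotator regexAnnotator(String type, String regex) {
        Pattern p = Pattern.compile(regex);
        return doc -> {
            Matcher m = p.matcher(doc.text);
            while (m.find()) {
                doc.annotations.add(new Annotation(type, m.start(), m.end()));
            }
        };
    }

    /** Runs the steps in order over a fresh document. */
    static Document run(String text, List<Annotator> steps) {
        Document doc = new Document(text);
        for (Annotator step : steps) {
            step.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Document doc = run("Flight LH1234 departs on 2013-06-18.", Arrays.asList(
            regexAnnotator("FlightNumber", "\\b[A-Z]{2}\\d{1,4}\\b"),
            regexAnnotator("Date", "\\b\\d{4}-\\d{2}-\\d{2}\\b")));
        for (Annotation a : doc.annotations) {
            System.out.println(a.type + ": " + doc.covered(a));
        }
    }
}
```

The point of the pattern is that a regex step can later be swapped for a trained model without touching the other steps, because they only communicate through the annotations.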





Re: UIMA for travel emails

Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
On 18.06.2013 14:05, Rukku wrote:
> We are new to the UIMA framework.
>
> We are studying UIMA to see whether we can use it to parse and extract information
> from travel-related emails (confirmations, cancellations). The information can be
> passenger names, itinerary, flight details, etc., and the output should be XML.
>
> We tried UIMA and ended up using only the Regex components, which we
> think we could have replaced with plain Java libraries to achieve the same result.
>
> Any help in giving us some direction will be greatly appreciated.

A solution for this task depends (in my opinion) mainly on the
properties of the input and on whether labeled data is available. It is
not really a question of architecture.

Some (incomplete) thoughts about UIMA-based approaches:
- You could train a CRF or something similar with ClearTK [1] if you 
have enough labeled data.
- For simple NER, there are some models provided by DKPro [2].
- If you want to define some rules or patterns, then there is UIMA Ruta 
(Rule-based Text Annotation) [3].

Best,

Peter

[1] https://code.google.com/p/cleartk/
[2] https://docs.google.com/spreadsheet/pub?key=0ApGcdapz0xSYdGh2azY2ODMtZDRNczUySEZJUFpXM2c&single=true&gid=0&output=html
[3] http://uima.apache.org/ruta.html

