
Posted to user@uima.apache.org by Mehdi Alaoui Belghiti <al...@gmail.com> on 2013/03/04 11:26:36 UTC

UIMA [new user]

Hi,
I was looking for a platform that would let me process files written in
different formats (XML, OWL, RDF, ...) and extract relevant information, and
I found UIMA.
However, I found only examples of processing natural language.
Is UIMA limited to this, or can it also be used, for example, to extract
classes or attributes from an Ecore file?

Thank you for your help! I would be happy to find examples of processing more
complex data.

Re: UIMA [new user]

Posted by Marshall Schor <ms...@schor.com>.

On 3/4/2013 5:30 AM, Diman Karagiozov wrote:
> Hi there,
>
> UIMA does not do out-of-the-box text extraction from various document formats.
> For this task you can use TIKA ( http://tika.apache.org/).
There is also a UIMA add-on annotator which wraps TIKA and enables it to run
inside a UIMA pipeline - perhaps useful if you're going to be combining other
analytics with this.

http://uima.apache.org/sandbox.html#tika.annotator

-Marshall

Re: UIMA [new user]

Posted by Diman Karagiozov <di...@tetracom.com>.

Hi there,

UIMA does not extract text from various document formats out of the box.
For this task you can use Apache Tika (http://tika.apache.org/).

In our project (ATLAS - http://www.atlasproject.eu/) we've developed a
text extraction framework that runs before the UIMA-wrapped NLP tools for
different languages. Do not hesitate to contact me if you need more
information on this.

greetings
Diman

Re: UIMA [new user]

Posted by Richard Eckart de Castilho <ec...@ukp.informatik.tu-darmstadt.de>.

Hi,

On 04.03.2013 at 11:26, Mehdi Alaoui Belghiti <al...@gmail.com> wrote:

> I was looking for a platform that can make me processing files written in
> different formats (xml, owl, rdf,...) and extract relevant information. So
> i found UIMA.
> However, I found only examples for processing natural language.
> Is UIMA limited to this, or it can allow me for example extracting classes
> or attributes from an a Ecore file?

UIMA is the Unstructured Information Management Architecture. It is meant to be
used to analyze unstructured information, such as text, and, as a result of the
analysis, to superimpose structured information on top of it, such as linguistic
or semantic categories.
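
To make "superimposing structured information" concrete, here is a minimal
sketch in plain Python. It is not UIMA's API: the Annotation class, the
find_names function, and the toy "capitalized word" rule are all made up for
illustration. The only idea taken from UIMA is that the text stays unchanged
and each result is a typed span with begin/end offsets into it.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A typed span over the text (hypothetical class, not UIMA's)."""
    begin: int
    end: int
    type: str


def find_names(text):
    """Toy analysis step: mark every capitalized word as a 'Name'."""
    return [Annotation(m.start(), m.end(), "Name")
            for m in re.finditer(r"\b[A-Z][a-z]+\b", text)]


def covered_text(text, annotation):
    """Return the piece of the original text that an annotation spans."""
    return text[annotation.begin:annotation.end]


text = "Alice met Bob in Darmstadt."
annotations = find_names(text)
for a in annotations:
    print(a.type, a.begin, a.end, covered_text(text, a))
```

The text itself is never modified; the analysis only adds records that point
into it, which is why this approach suits free text better than files that
already have a structure of their own.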

If you need a library to access Ecore files, you should probably look at the
Eclipse EMF framework. While UIMA can store analysis results in XMI files, it is
not a library for accessing and processing XMI files in general. The same goes
for XML and RDF.
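
For the concrete question of listing classes and attributes, a rough sketch
using only Python's standard library might look like the following. It does
not use EMF (which is a Java framework and the proper tool for real models);
it simply treats an .ecore file as ordinary XML. The sample model, the function
name, and the assumption that the file uses the conventional "ecore:" prefix in
its xsi:type values are all illustrative.

```python
import xml.etree.ElementTree as ET

# xsi:type is a namespaced attribute; ElementTree spells it in {uri}name form.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# A tiny made-up Ecore model, inlined so the sketch is self-contained.
SAMPLE_ECORE = """<ecore:EPackage xmi:version="2.0"
    xmlns:xmi="http://www.omg.org/XMI"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ecore="http://www.eclipse.org/emf/2002/Ecore"
    name="library" nsURI="http://example.org/library" nsPrefix="lib">
  <eClassifiers xsi:type="ecore:EClass" name="Book">
    <eStructuralFeatures xsi:type="ecore:EAttribute" name="title"/>
    <eStructuralFeatures xsi:type="ecore:EReference" name="author"
        eType="#//Author"/>
  </eClassifiers>
  <eClassifiers xsi:type="ecore:EClass" name="Author">
    <eStructuralFeatures xsi:type="ecore:EAttribute" name="name"/>
  </eClassifiers>
  <eClassifiers xsi:type="ecore:EEnum" name="Genre"/>
</ecore:EPackage>"""


def classes_and_attributes(ecore_xml):
    """Map each EClass name to the names of its EAttributes."""
    root = ET.fromstring(ecore_xml)
    result = {}
    # iter() also descends into nested subpackages.
    for classifier in root.iter("eClassifiers"):
        if classifier.get(XSI_TYPE) != "ecore:EClass":
            continue  # skip enums and data types
        result[classifier.get("name")] = [
            feature.get("name")
            for feature in classifier.findall("eStructuralFeatures")
            if feature.get(XSI_TYPE) == "ecore:EAttribute"
        ]
    return result


print(classes_and_attributes(SAMPLE_ECORE))
```

This ignores inheritance, references, and cross-file links, all of which EMF
resolves for you, so it is only a starting point for simple models.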

Cheers,

-- Richard
