Posted to dev@uima.apache.org by Srinivas Yerram <sr...@motivitylabs.com> on 2014/11/22 18:33:46 UTC

UIMA framework annotators multiple languages support clarifications

Dear Sir / Madam,

My core use case is parsing email data that comes in different templates and in different languages. I need to extract useful information from it through UIMA annotators or other plug-in components. Scalability and clustering are high priorities in my use case.

I would like clarification on the following points about the Apache UIMA framework:

Do UIMA framework annotators or any plug-in components support multiple languages (such as English, French, Arabic, Chinese, etc.) for parsing email content?


Can the Stanford NLP libraries be integrated as a plug-in for Apache UIMA framework components?


I would appreciate a quick response on this. Thanks.

Regards,
Srinivas Yerram

Re: UIMA framework annotators multiple languages support clarifications

Posted by Richard Eckart de Castilho <re...@apache.org>.

On 23.11.2014, at 14:27, Srinivas Yerram <sr...@motivitylabs.com> wrote:

> Thank you so much Richard for your response.
> 
> I have a couple of questions on Apache UIMA features:
> 
> 1. Does Apache UIMA support text analysis in multiple languages (non-English, such as Chinese, French, Arabic, etc.)?

Apache UIMA is language agnostic. Whether and how multi-language support is realized depends on the individual components.

E.g. the DKPro Core components look at what language a document is in and then try to automatically
load a suitable model for processing the document.
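
To illustrate the idea of choosing a model by document language, here is a minimal sketch in plain Java. It is not DKPro Core code and does not use the UIMA API: the class name, the method name and the model identifiers are all made up for illustration. In UIMA itself, the document language is stored in the CAS (see setDocumentLanguage / getDocumentLanguage), and a component could feed that value into a lookup of roughly this shape.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these names come from UIMA or DKPro Core.
final class LanguageModelSelector {
    // Assumed mapping from a two-letter language code to a model identifier.
    private static final Map<String, String> MODELS = Map.of(
        "en", "tagger-en",
        "fr", "tagger-fr",
        "ar", "tagger-ar",
        "zh", "tagger-zh");

    /** Returns the model registered for the document language, if any. */
    static Optional<String> modelFor(String documentLanguage) {
        if (documentLanguage == null) {
            return Optional.empty();
        }
        // Reduce values such as "FR" or "fr-CA" to the bare language code "fr".
        String code = documentLanguage.toLowerCase(Locale.ROOT).split("[-_]", 2)[0];
        return Optional.ofNullable(MODELS.get(code));
    }

    public static void main(String[] args) {
        System.out.println(modelFor("fr-CA").orElse("no model")); // prints "tagger-fr"
        System.out.println(modelFor("de").orElse("no model"));    // prints "no model"
    }
}
```

Returning an Optional leaves it to the caller to decide what happens for a language without a model (skip the document, fall back to a default, or fail).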

> 2. Is there any official documentation on integrating Stanford NLP with the Apache UIMA framework?

If non-Apache documentation also counts as "official", then maybe this one:

https://code.google.com/p/dkpro-core-asl/wiki/StanfordCoreComponents

> 3. Is there any official documentation on using OpenNLP with the Apache UIMA framework?

http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima

Cheers,

-- Richard

RE: UIMA framework annotators multiple languages support clarifications

Posted by Srinivas Yerram <sr...@motivitylabs.com>.

Thank you so much Richard for your response.

I have a couple of questions on Apache UIMA features:

1. Does Apache UIMA support text analysis in multiple languages (non-English, such as Chinese, French, Arabic, etc.)?

2. Is there any official documentation on integrating Stanford NLP with the Apache UIMA framework?

3. Is there any official documentation on using OpenNLP with the Apache UIMA framework?

I would appreciate any response to the questions above.

Thanks & Regards,
Srinivas Yerram

Re: UIMA framework annotators multiple languages support clarifications

Posted by Richard Eckart de Castilho <re...@apache.org>.

Hi,

UIMA is a framework that enables unstructured content analysis; in general, it does not provide the analysis itself.

UIMA component collections provide analysis components. Some do include 
wrappers for third-party NLP tools such as the Stanford NLP tools.
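
As a rough illustration of what "wrapper" means here, the following plain-Java sketch shows the pattern. It is deliberately not the UIMA API: Document, Annotator, ThirdPartyTokenizer and TokenizerWrapper are hypothetical stand-ins. In UIMA, the shared container is the CAS and the component contract is an analysis engine; a real wrapper would call the actual third-party library instead of the toy tokenizer below.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the wrapper idea; this is NOT the UIMA API.
final class WrapperSketch {
    /** Stand-in for a shared result container (the CAS plays this role in UIMA). */
    static final class Document {
        final String text;
        final List<int[]> tokens = new ArrayList<>(); // {begin, end} character offsets
        Document(String text) { this.text = text; }
    }

    /** Stand-in for the framework's component contract. */
    interface Annotator {
        void process(Document doc);
    }

    /** Stand-in for a third-party tool that knows nothing about the framework. */
    static final class ThirdPartyTokenizer {
        String[] split(String text) { return text.trim().split("\\s+"); }
    }

    /** The wrapper: calls the tool and records its output as offset annotations. */
    static final class TokenizerWrapper implements Annotator {
        private final ThirdPartyTokenizer tool = new ThirdPartyTokenizer();

        @Override
        public void process(Document doc) {
            int from = 0;
            for (String tok : tool.split(doc.text)) {
                if (tok.isEmpty()) {
                    continue; // empty or whitespace-only input yields one empty token
                }
                int begin = doc.text.indexOf(tok, from);
                int end = begin + tok.length();
                doc.tokens.add(new int[] {begin, end});
                from = end;
            }
        }
    }

    public static void main(String[] args) {
        Document doc = new Document("Bonjour le monde");
        new TokenizerWrapper().process(doc);
        for (int[] t : doc.tokens) {
            System.out.println(t[0] + "-" + t[1] + " " + doc.text.substring(t[0], t[1]));
        }
    }
}
```

The point of the pattern is that the rest of the pipeline only sees the Annotator contract and the annotations in the shared container, so the tool behind the wrapper can be swapped without touching other components.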

These are some collections I know (not exhaustive):

ClearTK                 - http://cleartk.googlecode.com
cTAKES                  - http://ctakes.apache.org/
DKPro Core              - http://code.google.com/p/dkpro-core-asl/
OpenNLP UIMA components - http://opennlp.apache.org
UIMA Addons & Sandbox   - http://uima.apache.org/sandbox.html

I should note that I'm involved with the DKPro Core collection.

Cheers,

-- Richard