Posted to dev@uima.apache.org by aliff faisal <ta...@yahoo.co.uk> on 2016/03/09 14:28:45 UTC

Using Apache UIMA for processing Malay texts

Hello! Sorry for my English; it is not very good.
I would like to use Apache UIMA annotators and other UIMA tools to process Malay-language texts. The goal is to find statistical terms, dates, and regions in text documents.

So I would like to ask: which annotators support the Malay language?
Or could you point me to documentation or a user guide for developing an annotator that can process texts in languages other than English? Thank you.
Yours faithfully,
Aliff

Re: Using Apache UIMA for processing Malay texts

Posted by Richard Eckart de Castilho <re...@apache.org>.
The UIMA project itself offers only a handful of annotators, and a number of them are language-agnostic, e.g. TikaAnnotator or ConceptMapper. UIMA Ruta is a rule-based processing engine that should allow you to write rules to extract information from Malay text. Some annotators, such as the service-based ones (Alchemy, Calais), should support whatever languages the respective services support. The HMM Tagger comes with documentation on how to train it on your own data [1].
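
To give a rough idea of what a rule for dates might boil down to, here is a minimal sketch in plain Java. It does not use the UIMA or Ruta APIs; the class name MalayDateSketch and the method findDates are made up for illustration, and the pattern only covers dates written as "day month-name year" with Malay month names. Inside a real annotator you would take the match offsets and create annotations from them instead of collecting strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: not part of UIMA or UIMA Ruta.
public class MalayDateSketch {

    // Malay month names; assumes dates of the form "9 Mac 2016".
    private static final String MONTHS =
        "Januari|Februari|Mac|April|Mei|Jun|Julai|Ogos|"
        + "September|Oktober|November|Disember";

    private static final Pattern DATE = Pattern.compile(
        "\\b\\d{1,2}\\s+(?:" + MONTHS + ")\\s+\\d{4}\\b");

    // Returns every substring of the text that looks like a Malay date.
    public static List<String> findDates(String text) {
        List<String> dates = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            // m.start() and m.end() are the character offsets an
            // annotator would use as the begin/end of an annotation.
            dates.add(m.group());
        }
        return dates;
    }

    public static void main(String[] args) {
        String text = "Mesyuarat itu diadakan pada 9 Mac 2016 "
            + "dan 12 Disember 2016.";
        for (String date : findDates(text)) {
            System.out.println(date);
        }
    }
}
```

The same idea carries over to regions or domain terms by swapping the month list for a word list of your own, although for anything beyond simple patterns a rule engine or a trained model is likely the better fit.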

There are various third-party component collections for UIMA: ClearTK, DKPro Core, U-Compare, JCore. I am not aware that any of these has explicit support for Malay. But if you have trained your own models for Malay, e.g. for OpenNLP or Stanford CoreNLP, or if you can find such models on the internet, you should be able to use them with the respective wrappers in the component collections mentioned above. Again, I am not aware of any freely available pre-trained models for Malay, but I have never searched for them explicitly.

Best,

-- Richard

[1] https://uima.apache.org/d/uima-addons-current/Tagger/hmmTaggerUsersGuide.html
