Posted to user@uima.apache.org by Torsten Zesch <ze...@ukp.informatik.tu-darmstadt.de> on 2012/07/12 09:20:50 UTC

RE: Using Apache UIMA for processing russian texts

Redirected the request to UIMA userlist ...

Hi Alexander,

In addition to what you have already found, the DKPro Core Framework
http://code.google.com/p/dkpro-core-asl/
has a POS Tagger (TreeTagger) that comes with a Russian model.

I am not aware of Russian components for detecting dates, regions etc.

-Torsten

> -----Original Message-----
> From: Александр Крылов [mailto:qblook@gmail.com]
> Sent: Wednesday, July 11, 2012 11:17 AM
> To: dev@uima.apache.org
> Subject: Using Apache UIMA for processing russian texts
> 
> Hello!
> 
> Sorry for my English, it is bad.
> I would like to use Apache UIMA annotators and other UIMA tools for
> processing Russian-language texts: searching for statistical terms, dates,
> and regions in text documents.
> In the examples I found only English (and some other languages), but no Russian.
> But the Apache UIMA web site says that the Snowball Annotator supports the
> Russian language.
> So I would like to ask: which annotators support Russian? Can I use
> external Russian morphology systems in annotators created with Apache
> UIMA?
> 
> Thank you
> Yours faithfully,
> Alexander

RE: Using Apache UIMA for processing russian texts

Posted by Thomas Plagwitz <th...@hotmail.com>.
Hi, 

Freeling seems to have some of the features that you are looking for in Russian:

Tokenization
Sentence splitting
Number detection
Date detection
Morphological dictionary
Basic named entity detection
Quantity detection
PoS tagging

Thanks, 
Thomas 
---------------------------------------------------------------------------
Dr. Thomas Plagwitz |Director, Language Resource Center 
UNC Charlotte | Dept. of Languages and Culture Studies
9201 University City Blvd. | Charlotte, NC 28223
Phone: 704-687-8762 | Fax: 704-687-3496  
http://plagwitz.org | http://lrc.uncc.edu
---------------------------------------------------------------------------

-----Original Message-----
From: Александр Крылов [mailto:qblook@gmail.com] 
Sent: Friday, July 13, 2012 5:53 AM
To: Torsten Zesch
Cc: user@uima.apache.org
Subject: Re: Using Apache UIMA for processing russian texts

OK, thank you for your answer!

So, I will look at the DKPro Core Framework today.

I would also like to ask: can I use external resources/libraries/APIs in my annotators? (These may be keyword and entity extractors, filters, rubricators, Russian morphology, detectors,
etc.) I have such libraries (for example aot.ru, Alexey Sokirko's morphology project, the best Russian morphology). But the top level of the project will be Apache UIMA, with all my logic encapsulated in annotators written by me. Is that possible?

Yours faithfully, Alexander




Re: Using Apache UIMA for processing russian texts

Posted by Barbara Plank <bp...@gmail.com>.

On Jul 13, 2012, at 9:29 PM, Marshall Schor <ms...@schor.com> wrote:

> Yes, this is commonly done.
> 
> External resources can be loaded once and shared across multiple annotators,
> for instance. You can read more about this here:
> 
> http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html#ugr.tug.aae.accessing_external_resource_files
> 
> 

Re: Using Apache UIMA for processing russian texts

Posted by Marshall Schor <ms...@schor.com>.
Yes, this is commonly done.

External resources can be loaded once and shared across multiple annotators,
for instance. You can read more about this here:

http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html#ugr.tug.aae.accessing_external_resource_files
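
To make the "loaded once and shared" idea concrete, here is a minimal sketch in plain Java. It deliberately does not use the UIMA API, so it runs with the JDK alone; in UIMA itself the sharing is set up through the external-resource mechanism described at the link above. All names here (SharedResourceSketch, Morphology, DictionaryMorphology, LemmaAnnotator, KeywordCounter) are hypothetical, and the toy dictionary only stands in for a real external morphology library such as the one mentioned in the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-JDK illustration; every name below is hypothetical and none is UIMA or aot.ru API.
final class SharedResourceSketch {

    /** Stand-in for an external morphology library. */
    interface Morphology {
        String lemma(String wordForm);
    }

    /** Toy dictionary; a real wrapper would delegate to the external library. */
    static final class DictionaryMorphology implements Morphology {
        static int loadCount = 0; // how many times the "expensive" load ran
        private final Map<String, String> lemmas = new HashMap<>();

        DictionaryMorphology() {
            loadCount++;
            // Transliterated toy entries, so the sample stays ASCII-only.
            lemmas.put("teksty", "tekst");
            lemmas.put("tekstov", "tekst");
            lemmas.put("daty", "data");
        }

        @Override
        public String lemma(String wordForm) {
            String key = wordForm.toLowerCase(Locale.ROOT);
            return lemmas.getOrDefault(key, key); // unknown words map to themselves
        }
    }

    /** Annotator-like component: replaces each token by its lemma. */
    static final class LemmaAnnotator {
        private final Morphology morphology;

        LemmaAnnotator(Morphology morphology) {
            this.morphology = morphology;
        }

        List<String> process(String text) {
            List<String> result = new ArrayList<>();
            String trimmed = text.trim();
            if (trimmed.isEmpty()) {
                return result;
            }
            for (String token : trimmed.split("\\s+")) {
                result.add(morphology.lemma(token));
            }
            return result;
        }
    }

    /** Second component reusing the same resource: counts tokens with a given lemma. */
    static final class KeywordCounter {
        private final Morphology morphology;

        KeywordCounter(Morphology morphology) {
            this.morphology = morphology;
        }

        int count(String text, String keywordLemma) {
            int hits = 0;
            String trimmed = text.trim();
            if (trimmed.isEmpty()) {
                return hits;
            }
            for (String token : trimmed.split("\\s+")) {
                if (morphology.lemma(token).equals(keywordLemma)) {
                    hits++;
                }
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        Morphology shared = new DictionaryMorphology(); // loaded once
        LemmaAnnotator lemmatizer = new LemmaAnnotator(shared);
        KeywordCounter counter = new KeywordCounter(shared);
        System.out.println(lemmatizer.process("Teksty tekstov"));
        System.out.println(counter.count("Teksty tekstov daty", "tekst"));
    }
}
```

The point of the design is that both components receive the same Morphology instance instead of each constructing its own, so the costly load happens once no matter how many annotators use it.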





Re: Using Apache UIMA for processing russian texts

Posted by Александр Крылов <qb...@gmail.com>.
OK, thank you for your answer!

So, I will look at the DKPro Core Framework today.

I would also like to ask: can I use external
resources/libraries/APIs in my annotators? (These may be keyword and
entity extractors, filters, rubricators, Russian morphology, detectors,
etc.) I have such libraries (for example aot.ru, Alexey Sokirko's
morphology project, the best Russian morphology).
But the top level of the project will be Apache UIMA, with all my logic
encapsulated in annotators written by me. Is that possible?

Yours faithfully, Alexander

