Posted to user@uima.apache.org by Anuj Kumar Gupta <vi...@gmail.com> on 2009/02/02 10:08:37 UTC

Pass Multiple Docs.

How can we pass multiple docs as an input in UIMA?

Re: Pass Multiple Docs.

Posted by Thilo Goetz <tw...@gmx.de>.

Sharma, Kishor wrote:
> Hi Anuj,
> I was also trying to develop a text mining system using UIMA but couldn't succeed; I used the vector space model of data mining for my project.
> If you manage to do it, please let me know your approach.

Check out ClearTK: http://code.google.com/p/cleartk/

--Thilo

RE: Pass Multiple Docs.

Posted by "Sharma, Kishor" <Ki...@deshaw.com>.

Hi Anuj,
I was also trying to develop a text mining system using UIMA but couldn't succeed; I used the vector space model of data mining for my project.
If you manage to do it, please let me know your approach.

Thanks,
Kishor 

-----Original Message-----
From: Anuj Kumar Gupta [mailto:virgoanuj@gmail.com] 
Sent: Monday, February 02, 2009 2:56 PM
To: uima-user@incubator.apache.org
Subject: Re: Pass Multiple Docs.

Thilo-

I am working on a text mining project in which I need to create components
such as a classifier, POS tagging, coreference resolution, sentiment
analysis, negation handling, and aggregation handling.

Classifier -> Classify the input data according to some given words.
POS tagging -> Add POS tags to that data.
Coreference -> Suppose there is a text like "Arnold is a good person. He is
an actor." In the 2nd sentence, "He" would corefer to Arnold.
Sentiment analysis -> There would be a given list of words with points for
each word, like 'good .5, bad .3, ass .6'. Sentiment analysis would find out
whether any words in the text match that list, show results accordingly, and
give a score for the sentence. For the above sentence, the score would be .5.

Negation handling -> "Arnold is a not good person. He is an actor." Then the
score would be -.5.

Aggregation handling -> Show the aggregate score.

Something like this.

I want to use UIMA and GATE for this project, and I am at a very early
stage, so please help me as much as possible.

If anyone has something similar to these components, please share it with me.

Thanks
Anuj.


Re: Pass Multiple Docs.

Posted by Matthias Wendt <ma...@neofonie.de>.

I guess you're starting the CPE from bash and expecting the results
to be printed on stdout? That is not the case. You have to implement a
CAS consumer: a component that consumes the results and stores them,
writes them to stdout, or does whatever else you want with them.

Have a look at the UIMA Tutorials and Developer's Guide
<http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.cpe.cas_consumer.developing>
for hints on developing and integrating a CAS consumer.



Anuj Kumar Gupta wrote:
> No.
> How can I use a CAS consumer?


-- 
--------------------------------
Matthias Wendt
Junior Software Developer
R&D

neofonie
Technologieentwicklung und
Informationsmanagement GmbH
Robert-Koch-Platz 4
10115 Berlin
phone: +49.30 24627 529
fax: +49.30 24627 120
matthias.wendt@neofonie.de
http://www.neofonie.de

Commercial register
Berlin-Charlottenburg: HRB 67460

Management
Helmut Hoffer von Ankershoffen
Nurhan Yildirim
Uwe-Gernot Fasold
--------------------------------

Re: Pass Multiple Docs.

Posted by Anuj Kumar Gupta <vi...@gmail.com>.

No.
How can I use a CAS consumer?

On Mon, Feb 2, 2009 at 3:29 PM, Matthias Wendt <ma...@neofonie.de> wrote:

> Hi,
>
> does your pipeline contain a cas consumer?
>
> Regards
> Matthias

Re: Pass Multiple Docs.

Posted by Matthias Wendt <ma...@neofonie.de>.

Hi,

does your pipeline contain a cas consumer?

Regards
Matthias


Anuj Kumar Gupta wrote:
> I am also trying these two, but I am not getting any result.
> Please help me further.



Re: Pass Multiple Docs.

Posted by Tommaso Teofili <to...@gmail.com>.

You simply have to pipeline the Whitespace Tokenizer and the Tagger
annotators in an Aggregate AE. You will get the POS tag as a property of
the TokenAnnotation and not as an annotation of its own.
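
To illustrate the point about where the tag ends up, here is a minimal sketch in plain Python. It is not UIMA code and uses none of its API: the `Token` class, the `pos_tag` attribute name, and the `TOY_TAGS` lookup (a stand-in for a trained HMM tagger) are all hypothetical. It only shows the shape of the result, where the tokenizer creates the tokens and the tagger then fills in a property on each existing token instead of adding a second kind of annotation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    begin: int                     # character offset where the token starts
    end: int                       # character offset just past the token
    text: str
    pos_tag: Optional[str] = None  # empty until the tagger has run


def whitespace_tokenize(text: str) -> List[Token]:
    """First pipeline step: split on whitespace and record offsets."""
    tokens, offset = [], 0
    for piece in text.split():
        begin = text.index(piece, offset)
        end = begin + len(piece)
        tokens.append(Token(begin, end, piece))
        offset = end
    return tokens


# Toy lookup standing in for a real tagger model (hypothetical tags).
TOY_TAGS = {"arnold": "NNP", "is": "VBZ", "an": "DT", "actor": "NN"}


def tag(tokens: List[Token]) -> List[Token]:
    """Second pipeline step: set the tag on each existing token."""
    for token in tokens:
        token.pos_tag = TOY_TAGS.get(token.text.lower(), "UNK")
    return tokens


def run_pipeline(text: str) -> List[Token]:
    """Run the two steps in the order given in the thread."""
    return tag(whitespace_tokenize(text))
```

After `run_pipeline("Arnold is an actor")`, the tags are read from the tokens themselves, for example `tokens[0].pos_tag`. A consumer that looks for separate POS annotations would find nothing, which may explain "not getting the result".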

Tommaso

2009/2/2 Anuj Kumar Gupta <vi...@gmail.com>

> I am also trying these two, but I am not getting any result.
> Please help me further.

Re: Pass Multiple Docs.

Posted by Anuj Kumar Gupta <vi...@gmail.com>.

I am also trying these two, but I am not getting any result.
Please help me further.

On Mon, Feb 2, 2009 at 3:20 PM, Tommaso Teofili <to...@gmail.com> wrote:

> I succeeded in POS tagging in UIMA via an aggregate AE of (in this order)
> Whitespace Tokenizer and (HMM) Tagger.
>
> Tommaso

Re: Pass Multiple Docs.

Posted by Tommaso Teofili <to...@gmail.com>.

I succeeded in POS tagging in UIMA via an aggregate AE of (in this order)
Whitespace Tokenizer and (HMM) Tagger.

Tommaso

2009/2/2 Anuj Kumar Gupta <vi...@gmail.com>

> I have already checked out the UIMA Sandbox annotators,
> but I am not able to run POS tagging.
>
> Can you please let me know the process?

Re: Pass Multiple Docs.

Posted by Anuj Kumar Gupta <vi...@gmail.com>.

I have already checked out the UIMA Sandbox annotators,
but I am not able to run POS tagging.

Can you please let me know the process?

On Mon, Feb 2, 2009 at 3:09 PM, Thilo Goetz <tw...@gmx.de> wrote:

> Anuj Kumar Gupta wrote:
> > Thilo-
> >
> > I am working on a text Mining Project in which I need to create some
> > component like Classifier, POS tagging , Co referencing, Sentiment Analysis,
> > Negation Handling, Aggregation Handling.
> [...]
>
> Yes, I think you have your hands full.  That's
> a lot of work.  I don't know what you need this
> for, but there are companies out there making
> money with that kind of analysis.
>
> You'll find POS tagging in the UIMA sandbox,
> and there are any number of open source classifiers
> out there.  Coreference resolution and sentiment
> analysis is another matter.  I don't know of any
> open source components for those, but maybe
> someone else does.
>
> --Thilo
>
>

Re: Pass Multiple Docs.

Posted by Jörn Kottmann <ko...@gmail.com>.

> You'll find POS tagging in the UIMA sandbox,
> and there are any number of open source classifiers
> out there.  Coreference resolution and sentiment
> analysis is another matter.  I don't know of any
> open source components for those, but maybe
> someone else does.


The OpenNLP tools provide most of what you need: a sentence detector,
tokenizer, POS tagger, chunker, parser, and coreference resolution.

OpenNLP is currently licensed under the LGPL, but this will change to the
ASL for the next major release to ease integration with Apache projects.

You can find more information about it at the project website:

http://opennlp.sourceforge.net/

Jörn

Re: Pass Multiple Docs.

Posted by Thilo Goetz <tw...@gmx.de>.

Anuj Kumar Gupta wrote:
> Thilo-
> 
> I am working on a text Mining Project in which I need to create some
> component like Classifier, POS tagging , Co referencing, Sentiment Analysis,
> Negation Handling, Aggregation Handling.
[...]

Yes, I think you have your hands full.  That's
a lot of work.  I don't know what you need this
for, but there are companies out there making
money with that kind of analysis.

You'll find POS tagging in the UIMA sandbox,
and there are any number of open source classifiers
out there.  Coreference resolution and sentiment
analysis is another matter.  I don't know of any
open source components for those, but maybe
someone else does.

--Thilo

Re: Pass Multiple Docs.

Posted by Anuj Kumar Gupta <vi...@gmail.com>.

Thilo-

I am working on a text mining project in which I need to create components
such as a classifier, POS tagging, coreference resolution, sentiment
analysis, negation handling, and aggregation handling.

Classifier -> Classify the input data according to some given words.
POS tagging -> Add POS tags to that data.
Coreference -> Suppose there is a text like "Arnold is a good person. He is
an actor." In the 2nd sentence, "He" would corefer to Arnold.
Sentiment analysis -> There would be a given list of words with points for
each word, like 'good .5, bad .3, ass .6'. Sentiment analysis would find out
whether any words in the text match that list, show results accordingly, and
give a score for the sentence. For the above sentence, the score would be .5.

Negation handling -> "Arnold is a not good person. He is an actor." Then the
score would be -.5.

Aggregation handling -> Show the aggregate score.

Something like this.
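
The scoring scheme described above could be sketched roughly as follows, in plain Python without UIMA or GATE. Several things here are assumptions rather than anything stated in the thread: the negation cue list contains only "not", a negation flips only the word directly after it, and the aggregate score is taken to be the sum of the sentence scores. The lexicon values are used as written in the message, which does not say whether "bad" should count as negative.

```python
import re

# Word weights as written in the message, used unchanged.
LEXICON = {"good": 0.5, "bad": 0.3}

# Assumed negation cues; a real system would need a longer list.
NEGATIONS = {"not"}


def score_sentence(sentence):
    """Sum lexicon weights, flipping the sign of a word right after a negation."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0.0
    for i, token in enumerate(tokens):
        if token in LEXICON:
            negated = i > 0 and tokens[i - 1] in NEGATIONS
            score += -LEXICON[token] if negated else LEXICON[token]
    return score


def score_text(text):
    """Return the per-sentence scores and their sum (assumed aggregation)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scores = [score_sentence(s) for s in sentences]
    return scores, sum(scores)
```

With these assumptions, `score_text("Arnold is a good person. He is a Actor.")` gives an aggregate of 0.5, and the "not good" variant gives -0.5, matching the two examples in the message.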

I want to use UIMA and GATE for this project, and I am at a very early
stage, so please help me as much as possible.

If anyone has something similar to these components, please share it with me.

Thanks
Anuj.

On Mon, Feb 2, 2009 at 2:51 PM, Thilo Goetz <tw...@gmx.de> wrote:

> Anuj Kumar Gupta wrote:
> > How can we pass multiple docs as an input in UIMA?
>
> Anuj, I think you would really benefit from working
> through some of the UIMA documentation.  Take a week
> or so, it'll be worth it in the end.
>
> I believe the question you really want to ask is:
> how can I create and drive a UIMA app programmatically?
> That's described here:  http://tinyurl.com/aysawn
> If that was not your question, you may want to give
> more details.
>
> --Thilo
>
>
>

Re: Pass Multiple Docs.

Posted by Thilo Goetz <tw...@gmx.de>.

Anuj Kumar Gupta wrote:
> How can we pass multiple docs as an input in UIMA?

Anuj, I think you would really benefit from working
through some of the UIMA documentation.  Take a week
or so, it'll be worth it in the end.

I believe the question you really want to ask is:
how can I create and drive a UIMA app programmatically?
That's described here:  http://tinyurl.com/aysawn
If that was not your question, you may want to give
more details.

--Thilo

Re: Pass Multiple Docs.

Posted by Jörn Kottmann <ko...@gmail.com>.

> How can we pass multiple docs as an input in UIMA?


UIMA obtains documents, e.g. from a database, through a Collection Reader.
In your case this means that you have to implement your own Collection
Reader which can access your specific database. The UIMA examples contain a
sample which shows how to implement a Collection Reader that reads documents
from the file system. Have a look at the code.
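
To show the idea without UIMA itself, here is a small plain-Python sketch of the reader pattern: one object knows where the documents live and hands them out one at a time, and a driver loop passes each document to the analysis step. The class name, the `has_next`/`get_next` method names, and the `analyze` callback are illustrative assumptions, not UIMA's API. The sample reader in the UIMA examples is the reference to follow for a real implementation.

```python
import os


class FileSystemReader:
    """Hands out one document at a time from a directory of text files."""

    def __init__(self, directory, encoding="utf-8"):
        # Collect regular files only, in a stable (sorted) order.
        self._paths = sorted(
            os.path.join(directory, name)
            for name in os.listdir(directory)
            if os.path.isfile(os.path.join(directory, name))
        )
        self._encoding = encoding
        self._index = 0

    def has_next(self):
        """True while there are documents that have not been handed out."""
        return self._index < len(self._paths)

    def get_next(self):
        """Return (path, text) for the next document and advance."""
        path = self._paths[self._index]
        self._index += 1
        with open(path, encoding=self._encoding) as handle:
            return path, handle.read()


def process_collection(reader, analyze):
    """Drive the reader: run analyze(text) on every document in turn."""
    results = {}
    while reader.has_next():
        path, text = reader.get_next()
        results[os.path.basename(path)] = analyze(text)
    return results
```

A reader for a database would keep the same two methods and replace the directory listing with a query, so the driver loop would not need to change.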

You can find more about Collection Reader in the UIMA documentation:
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.cpe.collection_reader.developing

Hope this helps,
Jörn