Posted to common-user@hadoop.apache.org by Jagat <ja...@gmail.com> on 2012/04/25 18:47:08 UTC

Re: Text Analysis

There are APIs you can use, though of course they are third-party.

-----------
Sent from mobile, short and crisp.
On 25-Apr-2012 8:57 PM, "Robert Evans" <ev...@yahoo-inc.com> wrote:

> Hadoop itself provides the core MapReduce and HDFS functionality. Higher-level
> algorithms such as sentiment analysis are often provided by other projects.
> Cloudera has a video from Hadoop World 2010 about it:
>
>
> http://www.cloudera.com/resource/hw10_video_sentiment_analysis_powered_by_hadoop/
>
> There are also likely to be other tools, such as R, that can help with it.
> I am not sure whether Mahout offers sentiment analysis, but you might want
> to look there too: http://mahout.apache.org/
>
> --Bobby Evans
>
>
> On 4/25/12 7:50 AM, "karanveer.singh@barclays.com" <
> karanveer.singh@barclays.com> wrote:
>
> Hi,
>
> I wanted to know whether there are any existing APIs within Hadoop for
> text analysis, such as sentiment analysis, or whether we have to rely on
> tools like R for this.
>
>
> Regards,
> Karanveer
>
>
>
>
>
> This e-mail and any attachments are confidential and intended
> solely for the addressee and may also be privileged or exempt from
> disclosure under applicable law. If you are not the addressee, or
> have received this e-mail in error, please notify the sender
> immediately, delete it from your system and do not copy, disclose
> or otherwise act upon any part of this e-mail or its attachments.
>
> Internet communications are not guaranteed to be secure or
> virus-free.
> The Barclays Group does not accept responsibility for any loss
> arising from unauthorised access to, or interference with, any
> Internet communications by any third party, or from the
> transmission of any viruses. Replies to this e-mail may be
> monitored by the Barclays Group for operational or business
> reasons.
>
> Any opinion or other information in this e-mail or its attachments
> that does not relate to the business of the Barclays Group is
> personal to the sender and is not given or endorsed by the Barclays
> Group.
>
> Barclays Bank PLC. Registered in England and Wales (registered no.
> 1026167).
> Registered Office: 1 Churchill Place, London, E14 5HP, United
> Kingdom.
>
> Barclays Bank PLC is authorised and regulated by the Financial
> Services Authority.
>
>
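
Bobby's point that Hadoop supplies only the MapReduce and HDFS layer, with analysis built on top, can be made concrete. The sketch below is a minimal, hypothetical lexicon-based sentiment scorer written as a Hadoop Streaming-style mapper and reducer in Python; it is not taken from any tool mentioned in this thread. It assumes each input line is "<doc_id><TAB><text>", and the tiny POSITIVE/NEGATIVE word lists are placeholders for a real sentiment lexicon. The run_locally helper imitates Hadoop's sort-by-key step with sorted() so the logic can be tried without a cluster.

```python
"""Minimal lexicon-based sentiment sketch in Hadoop Streaming style.

Assumptions (hypothetical, for illustration only): each input line is
"<doc_id>\t<text>", and the small word lists below stand in for a real
sentiment lexicon.
"""
import re
from itertools import groupby

# Hypothetical toy lexicon; a real job would load a proper word list.
POSITIVE = {"good", "great", "excellent", "happy", "love"}
NEGATIVE = {"bad", "poor", "terrible", "sad", "hate"}

TOKEN_RE = re.compile(r"[a-z']+")


def mapper(lines):
    """Emit one 'doc_id<TAB>score' record (+1 or -1) per lexicon hit."""
    for line in lines:
        doc_id, _, text = line.rstrip("\n").partition("\t")
        for token in TOKEN_RE.findall(text.lower()):
            if token in POSITIVE:
                yield f"{doc_id}\t1"
            elif token in NEGATIVE:
                yield f"{doc_id}\t-1"


def reducer(sorted_records):
    """Sum the per-word scores for each doc_id (records must be sorted by key)."""
    parsed = (rec.split("\t", 1) for rec in sorted_records)
    for doc_id, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{doc_id}\t{sum(int(score) for _, score in group)}"


def run_locally(lines):
    """Simulate map -> shuffle/sort -> reduce on a single machine."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    docs = [
        "d1\tGreat product, I love it",
        "d2\tTerrible support and poor docs",
        "d3\tIt works",
    ]
    for record in run_locally(docs):
        print(record)
```

Documents with no lexicon hits produce no output record. On a real cluster the mapper and reducer would read sys.stdin and print their records, and the script would be handed to the Hadoop Streaming jar through its -mapper and -reducer options. Plain word counting ignores negation ("not good") and context, which is one reason dedicated sentiment tools exist.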

Re: Text Analysis

Posted by praveenesh kumar <pr...@gmail.com>.
RHive uses the Hive Thrift server to connect to Hive. You can execute Hive
queries, get the results back as R data frames, and then work with them
using R libraries. It's a pretty interesting project, provided you have
Hive set up on top of Hadoop.

Regards,
Praveenesh

On Thu, Apr 26, 2012 at 1:32 PM, Guillaume Polaert <gp...@cyres.fr> wrote:

> Hello,
>
> Yesterday I discovered the RHive project. It uses an R server on each DataNode.
> Has anybody tried it?
>
> Guillaume

RE: Text Analysis

Posted by Guillaume Polaert <gp...@cyres.fr>.
Hello,

Yesterday I discovered the RHive project. It uses an R server on each DataNode.
Has anybody tried it?

Guillaume 

Re: Text Analysis

Posted by Devi Kumarappan <kp...@att.net>.
The RHadoop packages allow you to do statistical analysis. We were able to build a word
cloud from text files using the rmr and rhdfs packages.

Installation details for these packages are available at the following link:

https://github.com/RevolutionAnalytics/RHadoop/wiki/rmr

Devi
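
A word cloud like Devi's comes down to a word-frequency count, the classic MapReduce example. The Python sketch below is not rmr or rhdfs code; it only illustrates the same map and reduce steps on one machine. Each input split is counted separately (map, with the counts combined inside the mapper), the partial counts are merged (reduce), and the most frequent words are mapped linearly onto font sizes. The STOP_WORDS list, the top_n default and the 10 to 40 pixel size range are hypothetical choices.

```python
"""Word-frequency counting behind a word cloud, written as map and reduce steps.

Illustrative sketch only (not rmr/rhdfs code); STOP_WORDS and the font-size
range are hypothetical choices.
"""
import re
from collections import Counter

# Hypothetical stop-word list; a real job would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "on"}
WORD_RE = re.compile(r"[a-z']+")


def map_split(lines):
    """Map step: count non-stop-words within one input split."""
    counts = Counter()
    for line in lines:
        counts.update(w for w in WORD_RE.findall(line.lower()) if w not in STOP_WORDS)
    return counts


def reduce_counts(partials):
    """Reduce step: merge the partial counts from every split."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def font_sizes(counts, top_n=3, min_px=10, max_px=40):
    """Scale the top_n counts linearly onto [min_px, max_px]."""
    top = counts.most_common(top_n)
    if not top:
        return {}
    high, low = top[0][1], top[-1][1]
    span = (high - low) or 1  # avoid dividing by zero when all counts are equal
    return {w: round(min_px + (c - low) * (max_px - min_px) / span) for w, c in top}


if __name__ == "__main__":
    splits = [
        ["Hadoop stores data in HDFS", "Hadoop runs MapReduce"],
        ["R analyses data", "Hadoop and R"],
    ]
    totals = reduce_counts(map_split(s) for s in splits)
    print(font_sizes(totals))
```

Counting inside each split before the reduce step keeps the amount of intermediate data small; the rendering of the cloud itself is left to whatever plotting tool is at hand.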



Re: Text Analysis

Posted by Charles Earl <ch...@gmail.com>.
If you've got existing R code, you might want to look at this Quora posting, also by Cloudera: http://www.quora.com/How-can-R-and-Hadoop-be-used-together, or at the RHIPE R/Hadoop package: https://github.com/saptarshiguha/RHIPE/wiki
Mahout and Lucene/Solr offer some level of text analysis, although I would not call them complete text analysis packages.
What I've found are specific algorithms rather than a complete package. For example, for LDA topic discovery, both Mahout and Yahoo Research (https://github.com/shravanmn/Yahoo_LDA) have Hadoop-based implementations; in the case of Yahoo_LDA the data is stored in HDFS, while the computation is essentially MPI-based. Whether an algorithm that reads its data from HDFS but uses an approach other than MapReduce counts as a Hadoop implementation is another question.
C
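
To show what LDA topic discovery actually computes, here is a toy, single-machine collapsed Gibbs sampler for LDA in Python. It is a sketch of the general technique, not the Mahout or Yahoo_LDA code, and alpha, beta, the iteration count and the sample corpus are arbitrary assumptions. Every token carries a topic assignment; the sampler repeatedly removes one token from the count tables and resamples its topic k with probability proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta), where n_dk, n_kw and n_k are the document-topic, topic-word and topic-total counts and V is the vocabulary size.

```python
"""Toy collapsed Gibbs sampler for LDA (single machine, pure Python).

Illustrative only: this is not Mahout's or Yahoo_LDA's implementation, and
the hyperparameters (alpha, beta, iterations) are arbitrary assumptions.
"""
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """docs: list of token lists. Returns (n_dk, n_kw, vocab) count tables."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # tokens of word w assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Take the token out of the counts before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][wid] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wid] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wid] += 1
                n_k[k] += 1
    return n_dk, n_kw, vocab


def top_words(n_kw, vocab, n=3):
    """Most heavily assigned words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in n_kw
    ]


if __name__ == "__main__":
    corpus = [
        "hdfs namenode datanode hdfs block".split(),
        "sentiment text review text opinion".split(),
        "hdfs block datanode replication".split(),
        "opinion review sentiment polarity".split(),
    ]
    doc_topic, topic_word, vocab = lda_gibbs(corpus, num_topics=2)
    for k, words in enumerate(top_words(topic_word, vocab)):
        print(f"topic {k}: {words}")
```

The count tables are the state a distributed implementation has to share between machines; whether that sharing is done with MapReduce passes or, as noted above for Yahoo_LDA, with MPI is one of the main design choices.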
