Posted to user@madlib.apache.org by Dmitry Dorofeev <di...@luxmsbi.com> on 2017/05/13 20:13:35 UTC

Sentiment Analysis

Hi all,

We are BI developers preparing a demo for PGDay'17 Russia. Our demo is based on the Enron email dataset and on financial data such as NYSE stock prices.
Some of the data is loaded in PostgreSQL and some in Greenplum, so we can use (and are already using) GPText and MADlib.

The most exciting part is sentiment analysis on the Enron emails. We want to start with email subjects only, which are similar to tweets, and we have found several OSS projects that can do that.

Can anybody advise on the best way to do sentiment analysis with GPText and MADlib, preferably running inside the database using MADlib?
Are there any articles or GitHub projects covering sentiment analysis with GPText/MADlib that you would recommend?
What about sentiment analysis of email bodies: is that easily doable, or do we need to write complex software to do it?

Thanks

-Dmitry Dorofeev

Re: Sentiment Analysis

Posted by Dmitry Dorofeev <di...@luxmsbi.com>.
We checked (1), Srivatsan's work, but it is almost impossible to reproduce.

(2) and (3) look interesting, thanks.

----- Original Message -----
From: "Frank McQuillan" <fm...@pivotal.io>
To: user@madlib.incubator.apache.org
Sent: Tuesday, May 16, 2017 7:52:24 PM
Subject: Re: Sentiment Analysis

Here are some links on sentiment analysis using MADlib and/or GPText that I
am aware of:

(1)
Slide deck on the topic from a Pivotal data scientist
https://www.slideshare.net/SrivatsanRamanujam/a-pipeline-for-distributed-topic-and-sentiment-analysis-of-tweets-on-pivotal-greenplum-database
Pipeline description starts on slide 18

GitHub repo for the above
https://github.com/pivotalsoftware/tasa


(2)
Blog on text analytics as a service
https://content.pivotal.io/blog/data-science-how-to-text-analytics-as-a-service

Sentiment classifier using PL/Python on PostgreSQL, Greenplum Database, or
Apache HAWQ, related to the blog above
https://github.com/crawles/gpdb_sentiment_analysis_twitter_model


(3)
Blog from zData using Greenplum, GPText and Alpine (which uses MADlib)
http://dewoods.com/blog/alpine-sentiment-analysis
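For a first experiment on subject lines, even a very simple lexicon-based scorer may be informative. The sketch below is not taken from any of the projects linked above: it assumes a tiny hypothetical word list (LEXICON), a hypothetical negation list (NEGATORS) and a hypothetical function name (subject_sentiment), and it only flips the sign of a word that directly follows a negator. Plain Python of this kind could, in principle, be used as the body of a PL/Python function in PostgreSQL or Greenplum and applied to an email subject column.

```python
import re

# Hypothetical toy lexicon, for illustration only. A real system would use a
# curated sentiment lexicon or a trained classifier instead.
LEXICON = {
    "good": 1.0, "great": 2.0, "thanks": 1.0, "congratulations": 2.0,
    "bad": -1.0, "problem": -1.0, "urgent": -0.5, "loss": -1.5,
}
# Hypothetical negation words: a negator flips the sign of the next word only.
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    """Lower-case the text and split it into runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def subject_sentiment(subject):
    """Score a short text (e.g. an email subject) in the open interval (-1, 1).

    Sums the lexicon weights of all tokens, negating a weight when the
    preceding token is a negator, then squashes the sum with x / (1 + |x|).
    Unknown words contribute 0.
    """
    tokens = tokenize(subject)
    total = 0.0
    for i, tok in enumerate(tokens):
        weight = LEXICON.get(tok, 0.0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight
        total += weight
    return total / (1.0 + abs(total))
```

For example, subject_sentiment("Great news, thanks!") returns 0.75 and subject_sentiment("Not good") returns -0.5. Email bodies are longer and noisier (quoted replies, signatures, disclaimers), so a trained classifier such as the PL/Python one in (2) may be a better fit there than a fixed word list.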


I hope these are useful.  Please let us know how your project progresses.

Frank

