Posted to user@hadoop.apache.org by qiaoresearcher <qi...@gmail.com> on 2014/04/24 19:58:12 UTC
hadoop+python+text mining
I have Hadoop and Python installed with NLTK. Now I have a large input
file which has three columns:
column 1 | column 2 | column 3
positive id1 some tweet message
negative id2 other tweet message
positive id3 tweet message
negative id4 tweet message
positive id5 tweet message
.... ... ....
I want to use text mining to construct TF-IDF vectors from the tweet
messages (with stop-word removal, stemming, etc.) and then use a classifier
to classify each tweet message as positive or negative. I know how to do this
using just Python and NLTK, but how do I do the same thing on Hadoop?
Thanks!
Re: hadoop+python+text mining
Posted by Peyman Mohajerian <mo...@gmail.com>.
At a high level, I think you have these choices, among others:
1) Hadoop Streaming: you can reuse some of your Python code, but not all of
it, because you have to restructure the work as map/reduce steps.
2) Use Mahout.
3) Use a distribution of R that works with Hadoop.
..
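To make option 1 a bit more concrete, here is a minimal sketch of one piece of a TF-IDF pipeline written for Hadoop Streaming: counting document frequency (the number of tweets each term appears in). Everything specific in it is an assumption for illustration, not something from the original question: the script name df_count.py, the function names, the tiny stop-word list, and the guess that each record is whitespace-separated as "label id message". It uses only the standard library; NLTK stop words and a stemmer could be swapped in if NLTK is installed on every worker node.

```python
#!/usr/bin/env python
"""Document-frequency count for Hadoop Streaming (illustrative sketch only)."""
import re
import sys
from itertools import groupby

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "and", "or", "to", "of"}
TOKEN_RE = re.compile(r"[a-z0-9']+")


def map_line(line):
    """Yield (term, 1) once per distinct non-stop-word term in one record."""
    # Assumed layout: label, id, then the rest of the line is the message.
    parts = line.strip().split(None, 2)
    if len(parts) < 3:
        return  # skip malformed or empty records
    terms = set(TOKEN_RE.findall(parts[2].lower())) - STOP_WORDS
    for term in sorted(terms):
        yield term, 1


def reduce_pairs(pairs):
    """Sum counts per term; pairs must arrive grouped (sorted) by term."""
    for term, group in groupby(pairs, key=lambda kv: kv[0]):
        yield term, sum(count for _, count in group)


def parse_pairs(lines):
    """Parse tab-separated 'term<TAB>count' lines emitted by the mapper."""
    for line in lines:
        term, _, count = line.rstrip("\n").partition("\t")
        if term and count:
            yield term, int(count)


def run_mapper():
    for line in sys.stdin:
        for term, count in map_line(line):
            sys.stdout.write("%s\t%d\n" % (term, count))


def run_reducer():
    for term, total in reduce_pairs(parse_pairs(sys.stdin)):
        sys.stdout.write("%s\t%d\n" % (term, total))


if __name__ == "__main__":
    # Local dry run: cat tweets.txt | df_count.py map | sort | df_count.py reduce
    mode = sys.argv[1] if len(sys.argv) > 1 else ""
    if mode == "map":
        run_mapper()
    elif mode == "reduce":
        run_reducer()
```

Hadoop Streaming sorts the mapper output by key before it reaches the reducer, which is what lets the reducer sum adjacent lines. The script would be passed to the streaming jar through its -mapper and -reducer options (the jar location depends on your installation). Term frequencies, the final TF-IDF weights, and the classifier would each need their own step, which is the restructuring cost mentioned in option 1.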
On Thu, Apr 24, 2014 at 1:58 PM, qiaoresearcher <qi...@gmail.com> wrote:
> I have Hadoop and python installed with nltk. Now I have an large input
> file which has three columns:
> column 1 | column 2 | column 3
> positive id1 some tweet message
> negative id2 other tweet message
> positive id3 tweet message
> negative id4 tweet message
> positive id5 tweet message
> .... ... ....
>
> I want to use text mining to construct TFIDF vectors from the tweet
> messages (also use stop words, stem, etc) and then use some classifier to
> classify tweet message as positive or negative. I know how to do it just
> using python and nltk. But how to do the same thing on hadoop?
>
> thanks!
>
>
>