Posted to user@mahout.apache.org by Siddharth Tiwari <si...@live.com> on 2012/09/01 11:54:19 UTC

Using mahout for classifying tweets

Hi Users,

I am a novice at using Mahout. Can anybody guide me on how to use Mahout for classifying text into different classes? In my case there are 5 classes and the text is tweets. Is there any tutorial on how to create a training model for Mahout, how to use it for training, how to prepare the dataset for classification (how to make it compatible with Mahout), and, after the classification, how to interpret the output?
I am sorry if my questions seem dumb, but it's only because I have very little knowledge of Mahout and I am trying to get a grip on it. Thank you so much.

*------------------------*

Cheers !!!

Siddharth Tiwari


Re: Using mahout for classifying tweets

Posted by Paritosh Ranjan <pr...@xebia.com>.
I am also a novice at Mahout classification, but I will give it a
shot in the hope that someone will correct me or improve the answer.

First, the text data (tweets) needs to be converted into Vectors. In
Mahout terms, this is known as vector encoding. It can be done in
three ways: one Vector cell per word, category, or continuous value;
representing Vectors implicitly as bags of words; or feature hashing.
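
To make the feature hashing idea concrete, here is a minimal plain-Java sketch. It is not Mahout's encoder API: the class name HashedTweetEncoder, the FNV-1a hash, and the tokenizing regex are all assumptions chosen for illustration. Each token is hashed to a bucket in a fixed-size array and the bucket count is incremented, so the vector length stays the same no matter how many distinct words appear in the tweets.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Illustrative only: a hand-rolled hashed bag-of-words encoder.
// This is NOT Mahout's FeatureVectorEncoder; all names here are hypothetical.
class HashedTweetEncoder {

    // Maps a token to a bucket in [0, numFeatures) using a 32-bit FNV-1a hash.
    static int bucket(String token, int numFeatures) {
        int hash = 0x811C9DC5;                    // FNV-1a offset basis
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;                   // FNV prime
        }
        return Math.floorMod(hash, numFeatures);  // never negative
    }

    // Lower-cases the tweet, splits it into tokens (keeping # and @ prefixes),
    // and counts how many tokens fall into each hashed bucket.
    static double[] encode(String tweet, int numFeatures) {
        double[] vector = new double[numFeatures];
        String[] tokens = tweet.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}#@']+");
        for (String token : tokens) {
            if (!token.isEmpty()) {
                vector[bucket(token, numFeatures)] += 1.0;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        double[] v = encode("Loving the new phone, loving it! #happy", 32);
        System.out.println(Arrays.toString(v));
    }
}
```

Different words can collide in the same bucket. That is the trade-off of hashing: a larger numFeatures reduces collisions at the cost of a longer vector.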

Look at the ContinuousValueEncoder, AdaptiveWordValueEncoder,
StaticWordValueEncoder and FeatureVectorEncoder classes, or the
seqdirectory and seq2encoded commands.

Then you can use the OnlineLogisticRegression, CrossFoldLearner and
AdaptiveLogisticRegression classes, or the trainnb, testnb,
trainlogistic, runlogistic, trainAdaptiveLogistic,
validateAdaptiveLogistic and runAdaptiveLogistic commands, to
configure and run the classification algorithms.
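
For a feel of what an online logistic regression learner does with such vectors, the following is a rough plain-Java sketch of multi-class (softmax) logistic regression trained one example at a time by stochastic gradient descent. It does not use Mahout and is not how OnlineLogisticRegression is implemented internally: the class TinySoftmaxClassifier, its method names, the learning rate, and the toy one-hot data are all assumptions for illustration. Five classes are used only because the question mentions five.

```java
// Illustrative only: softmax regression with plain SGD, no regularization.
// This is NOT Mahout's OnlineLogisticRegression; all names are hypothetical.
class TinySoftmaxClassifier {
    private final double[][] weights;   // one weight row per class
    private final double learningRate;

    TinySoftmaxClassifier(int numClasses, int numFeatures, double learningRate) {
        this.weights = new double[numClasses][numFeatures];
        this.learningRate = learningRate;
    }

    // Returns one probability per class; the values sum to 1.
    double[] probabilities(double[] x) {
        double[] scores = new double[weights.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < weights.length; c++) {
            for (int j = 0; j < x.length; j++) {
                scores[c] += weights[c][j] * x[j];
            }
            max = Math.max(max, scores[c]);
        }
        double sum = 0.0;
        for (int c = 0; c < scores.length; c++) {
            scores[c] = Math.exp(scores[c] - max);  // subtract max for numerical stability
            sum += scores[c];
        }
        for (int c = 0; c < scores.length; c++) {
            scores[c] /= sum;
        }
        return scores;
    }

    // One SGD step on a single labeled example.
    void train(int label, double[] x) {
        double[] p = probabilities(x);
        for (int c = 0; c < weights.length; c++) {
            double error = (c == label ? 1.0 : 0.0) - p[c];
            for (int j = 0; j < x.length; j++) {
                weights[c][j] += learningRate * error * x[j];
            }
        }
    }

    // Index of the most probable class.
    int classify(double[] x) {
        double[] p = probabilities(x);
        int best = 0;
        for (int c = 1; c < p.length; c++) {
            if (p[c] > p[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy data: feature i is "on" only for class i.
        TinySoftmaxClassifier clf = new TinySoftmaxClassifier(5, 5, 0.5);
        for (int epoch = 0; epoch < 50; epoch++) {
            for (int label = 0; label < 5; label++) {
                double[] x = new double[5];
                x[label] = 1.0;
                clf.train(label, x);
            }
        }
        double[] query = new double[5];
        query[3] = 1.0;
        System.out.println("predicted class: " + clf.classify(query));
    }
}
```

In real use the input rows would be the encoded tweet vectors and the labels would be the indices of the five tweet categories. The class with the highest probability is the predicted category, which is one way to interpret the classifier's output.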

HTH,
Paritosh




Re: Using mahout for classifying tweets

Posted by Salman Mahmood <sa...@influestor.com>.
Buy the book "Mahout in Action". It gives you in-depth knowledge of
how classification is done in Mahout. I don't know of any tutorial
links, but you can start by downloading the source code and examining
the example code for classification. I really recommend the book,
though, since you are a new user.
