
Posted to user@mahout.apache.org by Mahmood Naderan <nt...@yahoo.com> on 2014/03/28 20:34:44 UTC

Question about Mahout/Hadoop

Hi
I want to know: when I run a command like
    mahout trainnb -i .... -o ...

am I running Mahout code or Hadoop code?
In other words, which one is dominant?

 
Regards,
Mahmood

Re: Question about Mahout/Hadoop

Posted by Mahmood Naderan <nt...@yahoo.com>.

From the script, I see that mahout eventually runs bin/hadoop, and hadoop runs the java command. Basically, I want to know more about the data structures (tree, list, vector, array, ...) or the data flow.

Regards,
Mahmood


RE: Question about Mahout/Hadoop

Posted by Chandler Burgess <cb...@icontrolesi.com>.

Mahmood,

What are you trying to get at with your question? Does the answer affect something in your environment? 

If you have the environment variable MAHOUT_LOCAL set, trainnb still runs MapReduce jobs, but it basically runs Hadoop in memory (from what I can tell, anyway). If you don't have that variable set, the job gets submitted to your Hadoop environment (provided the Hadoop environment variables are properly configured).

If you are just getting started and playing around, I would recommend setting MAHOUT_LOCAL, e.g. export MAHOUT_LOCAL=1. I'm a beginner myself, but I have done a lot of playing around with naïve Bayes locally, testing on datasets of up to 400k documents with training sets of up to 30k documents, and it runs very fast.
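To make the two cases above concrete, here is a minimal sketch of a hypothetical helper, `mahout_mode`, which is not part of Mahout. It only mirrors the rule described above: a non-empty MAHOUT_LOCAL means local mode; otherwise the job goes to Hadoop. Treating a non-empty HADOOP_CONF_DIR as the sign of a "properly configured" Hadoop environment is an assumption made for illustration, not a claim about exactly what the mahout script checks.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Mahout): reports which mode a
# `mahout` command would likely use, following the rule described above.
mahout_mode() {
  if [ -n "${MAHOUT_LOCAL:-}" ]; then
    # In this sketch, any non-empty value selects local (in-memory) mode.
    echo "local"
  elif [ -n "${HADOOP_CONF_DIR:-}" ]; then
    # Assumption: a set HADOOP_CONF_DIR stands in for a configured
    # Hadoop environment, so jobs would be submitted to the cluster.
    echo "hadoop"
  else
    # Neither case applies: the Hadoop environment looks unconfigured.
    echo "unconfigured"
  fi
}

# Force local mode while experimenting, as recommended above.
export MAHOUT_LOCAL=1
mahout_mode    # prints: local
```

Unsetting MAHOUT_LOCAL (or setting it to an empty string) switches the helper back to reporting the Hadoop case.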


Re: Question about Mahout/Hadoop

Posted by Andrew Musselman <an...@gmail.com>.

You're running a bash script that lives at $MAHOUT_HOME/bin/mahout.

If you read through that script, you can start to follow what happens
when you run a command starting with `mahout`. Look at the bottom of
the script, where the `exec` commands are; that's where things actually
get executed.
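Below is a simplified, hypothetical sketch of what those final `exec` lines amount to; it prints the command instead of running it. The driver class name `org.apache.mahout.driver.MahoutDriver` and the `MAHOUT_JOB`, `CLASSPATH` and `HADOOP_HOME` variables are assumptions recalled from the script, not copied from it, so compare against your own copy of bin/mahout. It illustrates the chain described in this thread: the mahout script either starts Mahout's driver directly in a JVM, or hands Mahout's job jar to bin/hadoop, which then runs java.

```shell
#!/usr/bin/env bash
# Simplified, hypothetical sketch of the dispatch at the bottom of
# bin/mahout. It echoes the command instead of exec'ing it.
# Assumption: the class and variable names below are from memory.
CLASS=org.apache.mahout.driver.MahoutDriver

mahout_dispatch() {
  local mode=$1
  shift
  if [ "$mode" = "local" ]; then
    # Local mode: run Mahout's driver directly in a single JVM.
    echo java -classpath "${CLASSPATH:-<classpath>}" "$CLASS" "$@"
  else
    # Cluster mode: pass Mahout's job jar to bin/hadoop, which starts
    # java and submits the MapReduce jobs to the cluster.
    echo "${HADOOP_HOME:-<hadoop-home>}/bin/hadoop" jar "${MAHOUT_JOB:-<mahout-job.jar>}" "$CLASS" "$@"
  fi
}

# Show what `mahout trainnb -h` would turn into on a cluster.
mahout_dispatch hadoop trainnb -h
```

In both branches the same driver class receives the subcommand name (`trainnb` here) and its arguments; only the launcher in front of it differs.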


Re: Question about Mahout/Hadoop

Posted by venkata ramana <ve...@gmail.com>.

Hi Mahmood,

Mahout can be run in two modes: standalone (local) mode and Hadoop
cluster mode.

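The steps involved in a naive Bayes run with Mahout:

Step 1: Create sequence files from the input directory (seqdirectory)
Step 2: Convert the sequence files to sparse vectors (seq2sparse)
Step 3: Split the vectors into training and test data (split)
Step 4: Train naive Bayes (trainnb)
Step 5: Test naive Bayes (testnb)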
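The five steps can be sketched as a shell script. It is a dry run by default (each `mahout` command is printed, not executed); set DRY_RUN=0 to run them. The `$WORK` directory layout and the `run` helper are hypothetical, and the flags are recalled from the classify-20newsgroups.sh example that ships with Mahout, so confirm them against each command's `-h` output for your version.

```shell
#!/usr/bin/env bash
# Sketch of the five-step naive Bayes workflow listed above.
# Hypothetical: the $WORK layout and the run() helper.
# Assumption: flags recalled from classify-20newsgroups.sh; check with -h.
DRY_RUN=${DRY_RUN:-1}            # 1 = print commands only
WORK=${WORK:-/tmp/mahout-work}   # hypothetical working directory

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "mahout $*"
  else
    mahout "$@"
  fi
}

# Step 1: directory of raw text files -> sequence files
run seqdirectory -i "$WORK/input" -o "$WORK/seq"
# Step 2: sequence files -> sparse TF-IDF vectors
run seq2sparse -i "$WORK/seq" -o "$WORK/vectors" -wt tfidf
# Step 3: split the TF-IDF vectors into training and test sets
run split -i "$WORK/vectors/tfidf-vectors" --trainingOutput "$WORK/train" \
    --testOutput "$WORK/test" --randomSelectionPct 40 --sequenceFiles
# Step 4: train the model (-el extracts labels, -li writes the label index)
run trainnb -i "$WORK/train" -o "$WORK/model" -el -li "$WORK/labelindex"
# Step 5: evaluate the model on the held-out test set
run testnb -i "$WORK/test" -m "$WORK/model" -l "$WORK/labelindex" -o "$WORK/testing"
```

Each step reads the previous step's output directory, which is also a rough picture of the data flow: text files become sequence files, then vectors, then a model.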

Note: you can get help for each step with the following commands:

mahout seqdirectory -h
mahout seq2sparse -h
mahout split -h
mahout trainnb -h
mahout testnb -h


Please let me know if you need further assistance.

Thanks,
Venkat
8888812414





