Posted to mapreduce-user@hadoop.apache.org by sindhu hosamane <si...@gmail.com> on 2014/07/28 13:00:39 UTC

Performance on singlenode and multinode hadoop

Hello,

I set up 2 DataNodes on a single machine (an Ubuntu machine) as described
in the thread
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201009.mbox/%3CA3EF3F6AF24E204B812D1D24CCC8D71A03688F76@mse16be2.mse16.exchange.ms%3E

The Ubuntu machine has 2 processors and 8 cores. Assuming the machine is
powerful enough, I set up 2 DataNodes on that same machine.

Now when I run jps on that multinode Hadoop setup, I get:
Namenode
Datanode
Datanode
Jobtracker
Tasktracker
Secondary Namenode

The above output shows that 2 DataNodes are up and running.

I also have a single-node setup on the same Ubuntu machine.
When I compare performance on the single-node and multinode setups, both are
almost the same. So how do I make sure the load is distributed across both
DataNodes, or that each DataNode uses different cores of the Ubuntu machine?

(Note: I know running multiple DataNodes on the same machine is not that
advantageous, but since my machine is powerful, I set it up anyway.)

I would appreciate any advice on this.

Regards,
Sindhu
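
In MRv1 the TaskTracker, not the DataNode, launches the task JVMs, so a second DataNode adds HDFS storage but no extra compute slots. The jps listing above can be checked programmatically while a job is running. The Python sketch below is hypothetical (the function names and the sample PIDs are made up), and it assumes that each running map or reduce task shows up in jps as a separate JVM named Child, which is how Hadoop 1.x TaskTrackers launch tasks as far as we know.

```python
from collections import Counter


def summarize_jps(output: str) -> Counter:
    """Count JVMs by main-class name in jps output.

    Each jps line looks like "<pid> <MainClass>". Lines without a class name,
    "process information unavailable" entries, and jps itself are skipped.
    """
    counts = Counter()
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[1] in ("Jps", "--"):
            continue
        counts[parts[1]] += 1
    return counts


def running_task_jvms(counts: Counter) -> int:
    # Assumption: under MRv1 (Hadoop 1.x), every running task is its own JVM
    # that jps reports as "Child".
    return counts.get("Child", 0)


# Hypothetical jps output captured while a job is running.
sample = """\
4101 NameNode
4230 DataNode
4388 DataNode
4512 SecondaryNameNode
4650 JobTracker
4791 TaskTracker
5120 Child
5121 Child
5333 Jps
"""

counts = summarize_jps(sample)
print(counts["DataNode"], counts["TaskTracker"], running_task_jvms(counts))
# prints: 2 1 2
```

With only one TaskTracker in the list, the Child count, not the DataNode count, tells you how many tasks are really running in parallel.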

Re: Performance on singlenode and multinode hadoop

Posted by Sindhu Hosamane <si...@gmail.com>.

I am not quite sure about the answer to this.
I am running Cascalog queries on files that are only in the MB range.



On 31 Jul 2014, at 15:11, Nitin Pawar <ni...@gmail.com> wrote:

> What kind of jobs will your tasks be doing?
> Are they CPU-intensive or only memory-intensive?
> 
> -- 
> Nitin Pawar
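
If the Cascalog queries read only a few MB of input, the number of map tasks may be the real limit on parallelism: a file smaller than one HDFS block normally becomes a single input split, and therefore a single map task. The sketch below models the split rule used by Hadoop's FileInputFormat (keep cutting split-size chunks while the remainder exceeds 1.1 times the split size), assuming the input format does not combine small files. The 64 MB block size is the Hadoop 1.x default; the file sizes and function names are hypothetical.

```python
import math

SPLIT_SLOP = 1.1  # mirrors the slop factor used by Hadoop's FileInputFormat


def splits_for_file(size_bytes: int, split_size: int) -> int:
    """Approximate how many input splits one file yields.

    Keep cutting split_size chunks while the remainder is more than
    SPLIT_SLOP times the split size, then emit one final split for whatever
    is left. Empty files are counted as zero splits in this simplification.
    """
    remaining = size_bytes
    splits = 0
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1
    return splits


def map_waves(file_sizes, split_size, map_slots):
    """Return (number of map tasks, number of scheduling waves)."""
    maps = sum(splits_for_file(s, split_size) for s in file_sizes)
    waves = math.ceil(maps / map_slots) if maps else 0
    return maps, waves


MB = 1024 * 1024
block = 64 * MB                     # assumed Hadoop 1.x default dfs.block.size
inputs = [5 * MB, 12 * MB, 3 * MB]  # hypothetical input files of a few MB each

for slots in (2, 4, 8):
    print(slots, map_waves(inputs, block, slots))
# prints:
# 2 (3, 2)
# 4 (3, 1)
# 8 (3, 1)
```

In this hypothetical case there are only 3 map tasks, so raising the map slots beyond 3, or adding DataNodes, cannot make the map phase any more parallel.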

Re: Performance on singlenode and multinode hadoop

Posted by Nitin Pawar <ni...@gmail.com>.

What kind of jobs will your tasks be doing?
Are they CPU-intensive or only memory-intensive?


On Thu, Jul 31, 2014 at 6:28 PM, Sindhu Hosamane <si...@gmail.com> wrote:

> Hello,
>
> If I run my experiment on a server with 2 processors (4 cores each),
> i.e. 8 cores in total, what would be the ideal values for
> mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum to get maximum performance?
> Your help is very much appreciated.
>
>
> Regards,
> Sindhu


-- 
Nitin Pawar
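
The CPU-versus-memory question matters because slot counts are bounded by memory as well as cores: each task runs in its own JVM whose heap is set by mapred.child.java.opts, and every Hadoop daemon has its own heap too (HADOOP_HEAPSIZE, 1000 MB by default in hadoop-env.sh). The sketch below is a rough, hypothetical calculator. The RAM size, the 1 GB reserved for the OS, and the one-versus-two-tasks-per-core rule of thumb are assumptions, not values from this thread.

```python
def max_slots_by_memory(ram_mb, child_heap_mb, daemon_count,
                        daemon_heap_mb=1000, os_reserve_mb=1024):
    """Upper bound on concurrent task JVMs whose heaps fit in RAM.

    Only JVM heap sizes are counted, so real memory use will be somewhat higher.
    """
    available = ram_mb - os_reserve_mb - daemon_count * daemon_heap_mb
    return max(available // child_heap_mb, 0)


def suggested_total_slots(cores, ram_mb, child_heap_mb, daemon_count, cpu_bound=True):
    """Total (map + reduce) slots: the smaller of a CPU limit and a memory limit."""
    # Assumed rule of thumb: about one task per core for CPU-bound jobs, up to
    # two per core when tasks spend much of their time waiting on I/O.
    cpu_limit = cores if cpu_bound else 2 * cores
    return min(cpu_limit, max_slots_by_memory(ram_mb, child_heap_mb, daemon_count))


# 6 daemons as in the jps listing above; 16 GB of RAM is a hypothetical value.
print(suggested_total_slots(8, 16384, 1024, 6))                   # prints 8
print(suggested_total_slots(8, 16384, 1024, 6, cpu_bound=False))  # prints 9
print(suggested_total_slots(8, 16384, 2048, 6, cpu_bound=False))  # prints 4
```

Under these assumptions, CPU-bound jobs on the 8-core machine come out at 8 total slots, while memory-hungry tasks with 2 GB heaps would be capped at 4 by memory rather than by cores.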

Re: Performance on singlenode and multinode hadoop

Posted by Sindhu Hosamane <si...@gmail.com>.

Hello,

If I run my experiment on a server with 2 processors (4 cores each), i.e. 8 cores in total,
what would be the ideal values for mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum to get maximum performance?
Your help is very much appreciated.


Regards,
Sindhu


On 29 Jul 2014, at 18:56, Harsh J <ha...@cloudera.com> wrote:

> It isn't the DataNode that does the compute spawn/work, but the TaskTracker.
> 
> If you wanted to increase MR parallelism on a single machine, you do
> not need two DNs, nor two TTs, just higher slot capacities in your
> TT's mapred-site.xml via properties
> mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum.
> 
> -- 
> Harsh J
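
Following Harsh's suggestion, the two slot properties go into the TaskTracker's mapred-site.xml. As a hypothetical illustration, the Python sketch below splits a total slot budget (for example, one worked out from the cores and memory of the 8-core machine) into map and reduce slots and prints the matching property block. The roughly two-thirds share for map slots is an assumed starting point, not a recommendation from this thread, and the TaskTracker has to be restarted before new slot counts take effect.

```python
import xml.etree.ElementTree as ET


def split_slots(total, map_share=0.66):
    """Split a total slot budget into (map_slots, reduce_slots), at least 1 each."""
    maps = max(1, round(total * map_share))
    reduces = max(1, total - maps)
    return maps, reduces


def mapred_site_xml(map_slots, reduce_slots):
    """Render the two MRv1 slot properties as a mapred-site.xml configuration block."""
    conf = ET.Element("configuration")
    for name, value in (
        ("mapred.tasktracker.map.tasks.maximum", map_slots),
        ("mapred.tasktracker.reduce.tasks.maximum", reduce_slots),
    ):
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(conf)  # pretty-printing helper, available since Python 3.9
    return ET.tostring(conf, encoding="unicode")


maps, reduces = split_slots(8)  # hypothetical budget of 8 total slots
print(mapred_site_xml(maps, reduces))
```

For a budget of 8 this gives 5 map slots and 3 reduce slots; the two property elements would then be copied into the existing configuration element of mapred-site.xml.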

Re: Performance on singlenode and multinode hadoop

Posted by Sindhu Hosamane <si...@gmail.com>.

Thank you, Harsh.
I will have a look at this and get back to you.


On 29 Jul 2014, at 18:56, Harsh J <ha...@cloudera.com> wrote:

> It isn't the DataNode that does the compute spawn/work, but the TaskTracker.
> 
> If you wanted to increase MR parallelism on a single machine, you do
> not need two DNs, nor two TTs, just higher slot capacities in your
> TT's mapred-site.xml via properties
> mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum.
> 
> -- 
> Harsh J

Re: Performance on singlenode and multinode hadoop

Posted by Sindhu Hosamane <si...@gmail.com>.

Hello ,

If i am running my experiment on a server with 2 processors (4 cores each ) .
To say it has 2 processors and 8 cores .
What would be the ideal values for mapred.tasktracker.map.tasks.maximum  and mapred.tasktracker.reduce.tasks.maximum to get maximum performance.
Your help is very much appreciated.


Regards,
Sindhu


On 29 Jul 2014, at 18:56, Harsh J <ha...@cloudera.com> wrote:

> It isn't the DataNode that does the compute spawn/work, but the TaskTracker.
> 
> If you wanted to increase MR parallelism on a single machine, you do
> not need two DNs, nor two TTs, just higher slot capacities in your
> TT's mapred-site.xml via properties
> mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum.
> 
> On Mon, Jul 28, 2014 at 4:30 PM, sindhu hosamane <si...@gmail.com> wrote:
>> Hello ,
>> 
>> i set up 2 datanodes on a single machine(ubuntu machine)  accordingly
>> mentioned in the thread
>> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201009.mbox/%3CA3EF3F6AF24E204B812D1D24CCC8D71A03688F76@mse16be2.mse16.exchange.ms%3E
>> 
>> Ubuntu machine has 2 processors and 8 cores. Assuming that machine is
>> powerful , i Setup 2 datanodes on that same machine.
>> 
>> Now when i run jps on that multinode hadoop , i get
>> Namenode
>> Datanode
>> Datanode
>> Jobtracker
>> Tasktracker
>> Secondary Namenode
>> 
>> The above result Shows 2 datanodes are up and running
>> 
>> Also i have a single node on that ubuntu machine as well.
>> Now when i check Performance on singlenode and multinode , both are almost
>> same.So now ,
>> How do i make sure load is being distributed on both datanodes or each
>> datanode uses different cores of the ubuntu machine.
>> 
>> (Note: i know multiple datanodes on same machine is not that advantageous ,
>> but assuming my machine is powerful ..i set it up..)
>> 
>> would appreciate any advices on this.
>> 
>> Regards,
>> Sindhu
> 
> 
> 
> -- 
> Harsh J

Re: Performance on singlenode and multinode hadoop

Posted by Sindhu Hosamane <si...@gmail.com>.

Thank you Harsh ,
I have a look on this  and get back .


On 29 Jul 2014, at 18:56, Harsh J <ha...@cloudera.com> wrote:

> It isn't the DataNode that does the compute spawn/work, but the TaskTracker.
> 
> If you wanted to increase MR parallelism on a single machine, you do
> not need two DNs, nor two TTs, just higher slot capacities in your
> TT's mapred-site.xml via properties
> mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum.
> 
> On Mon, Jul 28, 2014 at 4:30 PM, sindhu hosamane <si...@gmail.com> wrote:
>> Hello ,
>> 
>> i set up 2 datanodes on a single machine(ubuntu machine)  accordingly
>> mentioned in the thread
>> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201009.mbox/%3CA3EF3F6AF24E204B812D1D24CCC8D71A03688F76@mse16be2.mse16.exchange.ms%3E
>> 
>> Ubuntu machine has 2 processors and 8 cores. Assuming that machine is
>> powerful , i Setup 2 datanodes on that same machine.
>> 
>> Now when i run jps on that multinode hadoop , i get
>> Namenode
>> Datanode
>> Datanode
>> Jobtracker
>> Tasktracker
>> Secondary Namenode
>> 
>> The above result Shows 2 datanodes are up and running
>> 
>> Also i have a single node on that ubuntu machine as well.
>> Now when i check Performance on singlenode and multinode , both are almost
>> same.So now ,
>> How do i make sure load is being distributed on both datanodes or each
>> datanode uses different cores of the ubuntu machine.
>> 
>> (Note: i know multiple datanodes on same machine is not that advantageous ,
>> but assuming my machine is powerful ..i set it up..)
>> 
>> would appreciate any advices on this.
>> 
>> Regards,
>> Sindhu
> 
> 
> 
> -- 
> Harsh J

Re: Performance on singlenode and multinode hadoop

Posted by Sindhu Hosamane <si...@gmail.com>.

Hello ,

If i am running my experiment on a server with 2 processors (4 cores each ) .
To say it has 2 processors and 8 cores .
What would be the ideal values for mapred.tasktracker.map.tasks.maximum  and mapred.tasktracker.reduce.tasks.maximum to get maximum performance.
Your help is very much appreciated.


Regards,
Sindhu


On 29 Jul 2014, at 18:56, Harsh J <ha...@cloudera.com> wrote:

> It isn't the DataNode that does the compute spawn/work, but the TaskTracker.
> 
> If you wanted to increase MR parallelism on a single machine, you do
> not need two DNs, nor two TTs, just higher slot capacities in your
> TT's mapred-site.xml via properties
> mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum.
> 
> On Mon, Jul 28, 2014 at 4:30 PM, sindhu hosamane <si...@gmail.com> wrote:
>> Hello ,
>> 
>> i set up 2 datanodes on a single machine(ubuntu machine)  accordingly
>> mentioned in the thread
>> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201009.mbox/%3CA3EF3F6AF24E204B812D1D24CCC8D71A03688F76@mse16be2.mse16.exchange.ms%3E
>> 
>> Ubuntu machine has 2 processors and 8 cores. Assuming that machine is
>> powerful , i Setup 2 datanodes on that same machine.
>> 
>> Now when i run jps on that multinode hadoop , i get
>> Namenode
>> Datanode
>> Datanode
>> Jobtracker
>> Tasktracker
>> Secondary Namenode
>> 
>> The above result Shows 2 datanodes are up and running
>> 
>> Also i have a single node on that ubuntu machine as well.
>> Now when i check Performance on singlenode and multinode , both are almost
>> same.So now ,
>> How do i make sure load is being distributed on both datanodes or each
>> datanode uses different cores of the ubuntu machine.
>> 
>> (Note: i know multiple datanodes on same machine is not that advantageous ,
>> but assuming my machine is powerful ..i set it up..)
>> 
>> would appreciate any advices on this.
>> 
>> Regards,
>> Sindhu
> 
> 
> 
> -- 
> Harsh J

Re: Performance on singlenode and multinode hadoop

Posted by Sindhu Hosamane <si...@gmail.com>.

Hello ,

If i am running my experiment on a server with 2 processors (4 cores each ) .
To say it has 2 processors and 8 cores .
What would be the ideal values for mapred.tasktracker.map.tasks.maximum  and mapred.tasktracker.reduce.tasks.maximum to get maximum performance.
Your help is very much appreciated.


Regards,
Sindhu


On 29 Jul 2014, at 18:56, Harsh J <ha...@cloudera.com> wrote:

> It isn't the DataNode that does the compute spawn/work, but the TaskTracker.
> 
> If you wanted to increase MR parallelism on a single machine, you do
> not need two DNs, nor two TTs, just higher slot capacities in your
> TT's mapred-site.xml via properties
> mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum.
> 
> On Mon, Jul 28, 2014 at 4:30 PM, sindhu hosamane <si...@gmail.com> wrote:
>> Hello,
>>
>> I set up 2 DataNodes on a single machine (an Ubuntu machine) as described
>> in the thread
>> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201009.mbox/%3CA3EF3F6AF24E204B812D1D24CCC8D71A03688F76@mse16be2.mse16.exchange.ms%3E
>>
>> The Ubuntu machine has 2 processors and 8 cores. Assuming the machine is
>> powerful enough, I set up 2 DataNodes on that same machine.
>>
>> Now when I run jps on that multi-node Hadoop setup, I get:
>> Namenode
>> Datanode
>> Datanode
>> Jobtracker
>> Tasktracker
>> Secondary Namenode
>>
>> The above output shows that 2 DataNodes are up and running.
>>
>> I also have a single-node setup on that Ubuntu machine.
>> When I compare performance between the single-node and multi-node setups,
>> both are almost the same. So how do I make sure the load is being
>> distributed across both DataNodes, or that each DataNode uses different
>> cores of the Ubuntu machine?
>>
>> (Note: I know running multiple DataNodes on the same machine is not very
>> advantageous, but assuming my machine is powerful, I set it up anyway.)
>>
>> I would appreciate any advice on this.
>> 
>> Regards,
>> Sindhu
> 
> 
> 
> -- 
> Harsh J

Re: Performance on singlenode and multinode hadoop

Posted by Sindhu Hosamane <si...@gmail.com>.

Thank you, Harsh.
I will have a look at this and get back to you.


On 29 Jul 2014, at 18:56, Harsh J <ha...@cloudera.com> wrote:

> It isn't the DataNode that does the compute spawn/work, but the TaskTracker.
> 
> If you wanted to increase MR parallelism on a single machine, you do
> not need two DNs, nor two TTs, just higher slot capacities in your
> TT's mapred-site.xml via properties
> mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum.
> 
> On Mon, Jul 28, 2014 at 4:30 PM, sindhu hosamane <si...@gmail.com> wrote:
>> Hello,
>>
>> I set up 2 DataNodes on a single machine (an Ubuntu machine) as described
>> in the thread
>> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201009.mbox/%3CA3EF3F6AF24E204B812D1D24CCC8D71A03688F76@mse16be2.mse16.exchange.ms%3E
>>
>> The Ubuntu machine has 2 processors and 8 cores. Assuming the machine is
>> powerful enough, I set up 2 DataNodes on that same machine.
>>
>> Now when I run jps on that multi-node Hadoop setup, I get:
>> Namenode
>> Datanode
>> Datanode
>> Jobtracker
>> Tasktracker
>> Secondary Namenode
>>
>> The above output shows that 2 DataNodes are up and running.
>>
>> I also have a single-node setup on that Ubuntu machine.
>> When I compare performance between the single-node and multi-node setups,
>> both are almost the same. So how do I make sure the load is being
>> distributed across both DataNodes, or that each DataNode uses different
>> cores of the Ubuntu machine?
>>
>> (Note: I know running multiple DataNodes on the same machine is not very
>> advantageous, but assuming my machine is powerful, I set it up anyway.)
>>
>> I would appreciate any advice on this.
>> 
>> Regards,
>> Sindhu
> 
> 
> 
> -- 
> Harsh J

Re: Performance on singlenode and multinode hadoop

Posted by Harsh J <ha...@cloudera.com>.

It isn't the DataNode that does the compute spawn/work, but the TaskTracker.

If you wanted to increase MR parallelism on a single machine, you do
not need two DNs, nor two TTs, just higher slot capacities in your
TT's mapred-site.xml via properties
mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum.
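
A minimal sketch of what that change looks like, assuming a Hadoop 1.x (MRv1) TaskTracker that reads its settings from mapred-site.xml (typically under the conf/ directory). The Python helper `slot_config` is hypothetical, not part of Hadoop; it just emits the two properties in Hadoop's `<configuration>`/`<property>` format. The values 5 and 2 are placeholders. In practice you would merge these entries into the existing `<configuration>` element rather than overwrite the file, and then restart the TaskTracker, since it reads its slot limits at startup.

```python
import xml.etree.ElementTree as ET


def slot_config(map_slots: int, reduce_slots: int) -> str:
    """Build a Hadoop-style <configuration> document that sets the MRv1
    per-TaskTracker slot limits named in this thread."""
    props = {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder slot counts; tune them for your own hardware and jobs.
    print(slot_config(5, 2))
```

With a single TaskTracker configured this way, one machine can run several map and reduce tasks in parallel, which is the effect a second DataNode alone cannot provide.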

On Mon, Jul 28, 2014 at 4:30 PM, sindhu hosamane <si...@gmail.com> wrote:
> Hello,
>
> I set up 2 DataNodes on a single machine (an Ubuntu machine) as described
> in the thread
> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201009.mbox/%3CA3EF3F6AF24E204B812D1D24CCC8D71A03688F76@mse16be2.mse16.exchange.ms%3E
>
> The Ubuntu machine has 2 processors and 8 cores. Assuming the machine is
> powerful enough, I set up 2 DataNodes on that same machine.
>
> Now when I run jps on that multi-node Hadoop setup, I get:
> Namenode
> Datanode
> Datanode
> Jobtracker
> Tasktracker
> Secondary Namenode
>
> The above output shows that 2 DataNodes are up and running.
>
> I also have a single-node setup on that Ubuntu machine.
> When I compare performance between the single-node and multi-node setups,
> both are almost the same. So how do I make sure the load is being
> distributed across both DataNodes, or that each DataNode uses different
> cores of the Ubuntu machine?
>
> (Note: I know running multiple DataNodes on the same machine is not very
> advantageous, but assuming my machine is powerful, I set it up anyway.)
>
> I would appreciate any advice on this.
>
> Regards,
> Sindhu



-- 
Harsh J
