Posted to mapreduce-user@hadoop.apache.org by Keith Wiley <kw...@keithwiley.com> on 2014/01/28 23:00:22 UTC

Force one mapper per machine (not core)?

I'm running a program which in the streaming layer automatically multithreads and does so by automatically detecting the number of cores on the machine.  I realize this model is somewhat in conflict with Hadoop, but nonetheless, that's what I'm doing.  Thus, for even resource utilization, it would be nice to not only assign one mapper per core, but only one mapper per machine.  I realize that if I saturate the cluster none of this really matters, but consider the following example for clarity: 4-core nodes, 10-node cluster, thus 40 slots, fully configured across mappers and reducers (40 slots of each).  Say I run this program with just two mappers.  It would run much more efficiently (in essentially half the time) if I could force the two mappers to go to slots on two separate machines instead of running the risk that Hadoop may assign them both to the same machine.

Can this be done?

Thanks.

________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"Yet mark his perfect self-contentment, and hence learn his lesson, that to be
self-contented is to be vile and ignorant, and that to aspire is better than to
be blindly and impotently happy."
                                           --  Edwin A. Abbott, Flatland
________________________________________________________________________________


Re: Force one mapper per machine (not core)?

Posted by Amr Shahin <am...@gmail.com>.
(In theory this should work.)
Find the part of the Hadoop code that calculates the number of cores and patch
it to always return one.


On Wed, Jan 29, 2014 at 3:41 AM, Keith Wiley <kw...@keithwiley.com> wrote:

> Yeah, it isn't, not even remotely, but thanks.
>
> On Jan 28, 2014, at 14:06 , Bryan Beaudreault wrote:
>
> > If this cluster is being used exclusively for this goal, you could just
> set the mapred.tasktracker.map.tasks.maximum to 1.
> >
> >
> > On Tue, Jan 28, 2014 at 5:00 PM, Keith Wiley <kw...@keithwiley.com>
> wrote:
> > I'm running a program which in the streaming layer automatically
> multithreads and does so by automatically detecting the number of cores on
> the machine.  I realize this model is somewhat in conflict with Hadoop, but
> nonetheless, that's what I'm doing.  Thus, for even resource utilization,
> it would be nice to not only assign one mapper per core, but only one
> mapper per machine.  I realize that if I saturate the cluster none of this
> really matters, but consider the following example for clarity: 4-core
> nodes, 10-node cluster, thus 40 slots, fully configured across mappers and
> reducers (40 slots of each).  Say I run this program with just two mappers.
>  It would run much more efficiently (in essentially half the time) if I
> could force the two mappers to go to slots on two separate machines instead
> of running the risk that Hadoop may assign them both to the same machine.
> >
> > Can this be done?
> >
> > Thanks.
>
>
> ________________________________________________________________________________
> Keith Wiley     kwiley@keithwiley.com     keithwiley.com
> music.keithwiley.com
>
> "Luminous beings are we, not this crude matter."
>                                            --  Yoda
>
> ________________________________________________________________________________
>
>

Re: Force one mapper per machine (not core)?

Posted by Keith Wiley <kw...@keithwiley.com>.
Hmmm, okay.  I thought that logic all acted at the level of "slots".  I didn't realize it could make "node" distinctions.  Thanks for the tip.

On Jan 29, 2014, at 05:18 , java8964 wrote:

> Or you can implement your own InputSplit and InputFormat, through which you can control which node each task is sent to, and how many run per node.
> 
> You can find detailed examples in Chapter 4 of the book "Professional Hadoop Solutions".
> 
> Yong


________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"The easy confidence with which I know another man's religion is folly teaches
me to suspect that my own is also."
                                           --  Mark Twain
________________________________________________________________________________


RE: Force one mapper per machine (not core)?

Posted by java8964 <ja...@hotmail.com>.
Or you can implement your own InputSplit and InputFormat, through which you can control which node each task is sent to, and how many run per node.
You can find detailed examples in Chapter 4 of the book "Professional Hadoop Solutions".
Yong

> Subject: Re: Force one mapper per machine (not core)?
> From: kwiley@keithwiley.com
> Date: Tue, 28 Jan 2014 15:41:22 -0800
> To: user@hadoop.apache.org
> 
> Yeah, it isn't, not even remotely, but thanks.
> 
> On Jan 28, 2014, at 14:06 , Bryan Beaudreault wrote:
> 
> > If this cluster is being used exclusively for this goal, you could just set the mapred.tasktracker.map.tasks.maximum to 1.
> > 
> > 
> > On Tue, Jan 28, 2014 at 5:00 PM, Keith Wiley <kw...@keithwiley.com> wrote:
> > I'm running a program which in the streaming layer automatically multithreads and does so by automatically detecting the number of cores on the machine.  I realize this model is somewhat in conflict with Hadoop, but nonetheless, that's what I'm doing.  Thus, for even resource utilization, it would be nice to not only assign one mapper per core, but only one mapper per machine.  I realize that if I saturate the cluster none of this really matters, but consider the following example for clarity: 4-core nodes, 10-node cluster, thus 40 slots, fully configured across mappers and reducers (40 slots of each).  Say I run this program with just two mappers.  It would run much more efficiently (in essentially half the time) if I could force the two mappers to go to slots on two separate machines instead of running the risk that Hadoop may assign them both to the same machine.
> > 
> > Can this be done?
> > 
> > Thanks.
> 
> ________________________________________________________________________________
> Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com
> 
> "Luminous beings are we, not this crude matter."
>                                            --  Yoda
> ________________________________________________________________________________
> 
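
A minimal sketch of Yong's idea, using the new mapreduce API: wrap
TextInputFormat and rebuild each split with a single preferred host, so the
scheduler tends to spread the map tasks across machines.  The host names here
are hypothetical placeholders, and split locations are only locality hints to
the scheduler, not strict placement guarantees.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class OneHostPerSplitInputFormat extends TextInputFormat {

    // Hypothetical worker host names; substitute the cluster's own.
    private static final String[] HOSTS = { "node01", "node02" };

    @Override
    public List<InputSplit> getSplits(JobContext job) throws IOException {
        List<InputSplit> pinned = new ArrayList<InputSplit>();
        int i = 0;
        for (InputSplit split : super.getSplits(job)) {
            FileSplit fs = (FileSplit) split;
            // Rebuild each split with exactly one preferred host so each
            // map task is steered toward a different machine.
            pinned.add(new FileSplit(fs.getPath(), fs.getStart(),
                    fs.getLength(),
                    new String[] { HOSTS[i++ % HOSTS.length] }));
        }
        return pinned;
    }
}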

Re: Force one mapper per machine (not core)?

Posted by Keith Wiley <kw...@keithwiley.com>.
Yeah, it isn't, not even remotely, but thanks.

On Jan 28, 2014, at 14:06 , Bryan Beaudreault wrote:

> If this cluster is being used exclusively for this goal, you could just set the mapred.tasktracker.map.tasks.maximum to 1.
> 
> 
> On Tue, Jan 28, 2014 at 5:00 PM, Keith Wiley <kw...@keithwiley.com> wrote:
> I'm running a program which in the streaming layer automatically multithreads and does so by automatically detecting the number of cores on the machine.  I realize this model is somewhat in conflict with Hadoop, but nonetheless, that's what I'm doing.  Thus, for even resource utilization, it would be nice to not only assign one mapper per core, but only one mapper per machine.  I realize that if I saturate the cluster none of this really matters, but consider the following example for clarity: 4-core nodes, 10-node cluster, thus 40 slots, fully configured across mappers and reducers (40 slots of each).  Say I run this program with just two mappers.  It would run much more efficiently (in essentially half the time) if I could force the two mappers to go to slots on two separate machines instead of running the risk that Hadoop may assign them both to the same machine.
> 
> Can this be done?
> 
> Thanks.

________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"Luminous beings are we, not this crude matter."
                                           --  Yoda
________________________________________________________________________________


Re: Force one mapper per machine (not core)?

Posted by Bryan Beaudreault <bb...@hubspot.com>.
If this cluster is being used exclusively for this goal, you could just set
the mapred.tasktracker.map.tasks.maximum to 1.


On Tue, Jan 28, 2014 at 5:00 PM, Keith Wiley <kw...@keithwiley.com> wrote:

> I'm running a program which in the streaming layer automatically
> multithreads and does so by automatically detecting the number of cores on
> the machine.  I realize this model is somewhat in conflict with Hadoop, but
> nonetheless, that's what I'm doing.  Thus, for even resource utilization,
> it would be nice to not only assign one mapper per core, but only one
> mapper per machine.  I realize that if I saturate the cluster none of this
> really matters, but consider the following example for clarity: 4-core
> nodes, 10-node cluster, thus 40 slots, fully configured across mappers and
> reducers (40 slots of each).  Say I run this program with just two mappers.
>  It would run much more efficiently (in essentially half the time) if I
> could force the two mappers to go to slots on two separate machines instead
> of running the risk that Hadoop may assign them both to the same machine.
>
> Can this be done?
>
> Thanks.
>
>
> ________________________________________________________________________________
> Keith Wiley     kwiley@keithwiley.com     keithwiley.com
> music.keithwiley.com
>
> "Yet mark his perfect self-contentment, and hence learn his lesson, that
> to be
> self-contented is to be vile and ignorant, and that to aspire is better
> than to
> be blindly and impotently happy."
>                                            --  Edwin A. Abbott, Flatland
>
> ________________________________________________________________________________
>
>
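
To make Bryan's suggestion concrete, a minimal sketch of the TaskTracker-side
setting, assuming an MR1 cluster dedicated to this one workload.  In
mapred-site.xml on every worker node:

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>

The TaskTrackers must be restarted to pick this up, and the one-map-slot cap
then applies to every job on the cluster, which is why it only fits a
dedicated cluster.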

Re: Force one mapper per machine (not core)?

Posted by Harsh J <ha...@cloudera.com>.
If it's job tracker you use, it's MR1.
On Feb 1, 2014 12:23 AM, "Keith Wiley" <kw...@keithwiley.com> wrote:

> Hmmm, okay.  I know it's running CDH4 4.4.0, but as for whether it was
> specifically configured with MR1 or MR2 (is there a distinction between MR2
> and YARN?) I'm not absolutely certain.  I know that the cluster "behaves"
> like the MR1 clusters I've worked with for years (I interact with the job
> tracker in a classical way, for example).  Can I tell whether it's MR1 or
> MR2 from the job tracker or namenode web UIs?
>
> Thanks.
>
> On Jan 29, 2014, at 00:52 , Harsh J wrote:
>
> > Is your cluster running MR1 or MR2? On MR1, the CapacityScheduler
> > would allow you to do this if you used appropriate memory based
> > requests (see http://search-hadoop.com/m/gnFs91yIg1e), and on MR2
> > (depending on the YARN scheduler resource request limits config) you
> > can request your job be run with the maximum-most requests that would
> > soak up all provided resources (of CPU and Memory) of a node such that
> > only one container runs on a host at any given time.
> >
> > On Wed, Jan 29, 2014 at 3:30 AM, Keith Wiley <kw...@keithwiley.com>
> wrote:
> >> I'm running a program which in the streaming layer automatically
> multithreads and does so by automatically detecting the number of cores on
> the machine.  I realize this model is somewhat in conflict with Hadoop, but
> nonetheless, that's what I'm doing.  Thus, for even resource utilization,
> it would be nice to not only assign one mapper per core, but only one
> mapper per machine.  I realize that if I saturate the cluster none of this
> really matters, but consider the following example for clarity: 4-core
> nodes, 10-node cluster, thus 40 slots, fully configured across mappers and
> reducers (40 slots of each).  Say I run this program with just two mappers.
>  It would run much more efficiently (in essentially half the time) if I
> could force the two mappers to go to slots on two separate machines instead
> of running the risk that Hadoop may assign them both to the same machine.
> >>
> >> Can this be done?
> >>
> >> Thanks.
>
>
>
> ________________________________________________________________________________
> Keith Wiley     kwiley@keithwiley.com     keithwiley.com
> music.keithwiley.com
>
> "I used to be with it, but then they changed what it was.  Now, what I'm
> with
> isn't it, and what's it seems weird and scary to me."
>                                            --  Abe (Grandpa) Simpson
>
> ________________________________________________________________________________
>
>

Re: Force one mapper per machine (not core)?

Posted by Keith Wiley <kw...@keithwiley.com>.
Hmmm, okay.  I know it's running CDH4 4.4.0, but as for whether it was specifically configured with MR1 or MR2 (is there a distinction between MR2 and YARN?) I'm not absolutely certain.  I know that the cluster "behaves" like the MR1 clusters I've worked with for years (I interact with the job tracker in a classical way, for example).  Can I tell whether it's MR1 or MR2 from the job tracker or namenode web UIs?

Thanks.

On Jan 29, 2014, at 00:52 , Harsh J wrote:

> Is your cluster running MR1 or MR2? On MR1, the CapacityScheduler
> would allow you to do this if you used appropriate memory based
> requests (see http://search-hadoop.com/m/gnFs91yIg1e), and on MR2
> (depending on the YARN scheduler resource request limits config) you
> can request your job be run with the maximum-most requests that would
> soak up all provided resources (of CPU and Memory) of a node such that
> only one container runs on a host at any given time.
> 
> On Wed, Jan 29, 2014 at 3:30 AM, Keith Wiley <kw...@keithwiley.com> wrote:
>> I'm running a program which in the streaming layer automatically multithreads and does so by automatically detecting the number of cores on the machine.  I realize this model is somewhat in conflict with Hadoop, but nonetheless, that's what I'm doing.  Thus, for even resource utilization, it would be nice to not only assign one mapper per core, but only one mapper per machine.  I realize that if I saturate the cluster none of this really matters, but consider the following example for clarity: 4-core nodes, 10-node cluster, thus 40 slots, fully configured across mappers and reducers (40 slots of each).  Say I run this program with just two mappers.  It would run much more efficiently (in essentially half the time) if I could force the two mappers to go to slots on two separate machines instead of running the risk that Hadoop may assign them both to the same machine.
>> 
>> Can this be done?
>> 
>> Thanks.


________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"I used to be with it, but then they changed what it was.  Now, what I'm with
isn't it, and what's it seems weird and scary to me."
                                           --  Abe (Grandpa) Simpson
________________________________________________________________________________


Re: Force one mapper per machine (not core)?

Posted by Harsh J <ha...@cloudera.com>.
Is your cluster running MR1 or MR2? On MR1, the CapacityScheduler
would allow you to do this if you used appropriate memory based
requests (see http://search-hadoop.com/m/gnFs91yIg1e), and on MR2
(depending on the YARN scheduler resource request limits config) you
can request your job be run with the maximum-most requests that would
soak up all provided resources (of CPU and Memory) of a node such that
only one container runs on a host at any given time.

On Wed, Jan 29, 2014 at 3:30 AM, Keith Wiley <kw...@keithwiley.com> wrote:
> I'm running a program which in the streaming layer automatically multithreads and does so by automatically detecting the number of cores on the machine.  I realize this model is somewhat in conflict with Hadoop, but nonetheless, that's what I'm doing.  Thus, for even resource utilization, it would be nice to not only assign one mapper per core, but only one mapper per machine.  I realize that if I saturate the cluster none of this really matters, but consider the following example for clarity: 4-core nodes, 10-node cluster, thus 40 slots, fully configured across mappers and reducers (40 slots of each).  Say I run this program with just two mappers.  It would run much more efficiently (in essentially half the time) if I could force the two mappers to go to slots on two separate machines instead of running the risk that Hadoop may assign them both to the same machine.
>
> Can this be done?
>
> Thanks.
>
> ________________________________________________________________________________
> Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com
>
> "Yet mark his perfect self-contentment, and hence learn his lesson, that to be
> self-contented is to be vile and ignorant, and that to aspire is better than to
> be blindly and impotently happy."
>                                            --  Edwin A. Abbott, Flatland
> ________________________________________________________________________________
>



-- 
Harsh J
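
For the MR2 route Harsh describes, a minimal job-side sketch.  The 8192 MB
figure is an assumed per-node capacity (yarn.nodemanager.resource.memory-mb);
requesting nearly a whole node per map task leaves the scheduler room for only
one container per host:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class WholeNodeMaps {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Ask for a container as large as the (assumed) per-node capacity,
        // so two map tasks cannot be co-located on one host.
        conf.setInt("mapreduce.map.memory.mb", 8192);
        // Keep the JVM heap comfortably below the container size.
        conf.set("mapreduce.map.java.opts", "-Xmx7168m");
        Job job = Job.getInstance(conf, "one-mapper-per-node");
        // ... set the mapper class and input/output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Whether this truly excludes a second container also depends on the scheduler's
maximum-allocation limits, as Harsh notes; on MR1 the analogous trick is
memory-based scheduling with the CapacityScheduler.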

Re: Force one mapper per machine (not core)?

Posted by Bryan Beaudreault <bb...@hubspot.com>.
If this cluster is being used exclusively for this goal, you could just set
the mapred.tasktracker.map.tasks.maximum to 1.


On Tue, Jan 28, 2014 at 5:00 PM, Keith Wiley <kw...@keithwiley.com> wrote:

> I'm running a program which in the streaming layer automatically
> multithreads and does so by automatically detecting the number of cores on
> the machine.  I realize this model is somewhat in conflict with Hadoop, but
> nonetheless, that's what I'm doing.  Thus, for even resource utilization,
> it would be nice to not only assign one mapper per core, but only one
> mapper per machine.  I realize that if I saturate the cluster none of this
> really matters, but consider the following example for clarity: 4-core
> nodes, 10-node cluster, thus 40 slots, fully configured across mappers and
> reducers (40 slots of each).  Say I run this program with just two mappers.
>  It would run much more efficiently (in essentially half the time) if I
> could force the two mappers to go to slots on two separate machines instead
> of running the risk that Hadoop may assign them both to the same machine.
>
> Can this be done?
>
> Thanks.
>
>
> ________________________________________________________________________________
> Keith Wiley     kwiley@keithwiley.com     keithwiley.com
> music.keithwiley.com
>
> "Yet mark his perfect self-contentment, and hence learn his lesson, that
> to be
> self-contented is to be vile and ignorant, and that to aspire is better
> than to
> be blindly and impotently happy."
>                                            --  Edwin A. Abbott, Flatland
>
> ________________________________________________________________________________
>
>

Re: Force one mapper per machine (not core)?

Posted by Harsh J <ha...@cloudera.com>.
Is your cluster running MR1 or MR2? On MR1, the CapacityScheduler
would allow you to do this if you used appropriate memory based
requests (see http://search-hadoop.com/m/gnFs91yIg1e), and on MR2
(depending on the YARN scheduler resource request limits config) you
can request your job be run with the maximum-most requests that would
soak up all provided resources (of CPU and Memory) of a node such that
only one container runs on a host at any given time.

On Wed, Jan 29, 2014 at 3:30 AM, Keith Wiley <kw...@keithwiley.com> wrote:
> I'm running a program which in the streaming layer automatically multithreads and does so by automatically detecting the number of cores on the machine.  I realize this model is somewhat in conflict with Hadoop, but nonetheless, that's what I'm doing.  Thus, for even resource utilization, it would be nice to not only assign one mapper per core, but only one mapper per machine.  I realize that if I saturate the cluster none of this really matters, but consider the following example for clarity: 4-core nodes, 10-node cluster, thus 40 slots, fully configured across mappers and reducers (40 slots of each).  Say I run this program with just two mappers.  It would run much more efficiently (in essentially half the time) if I could force the two mappers to go to slots on two separate machines instead of running the risk that Hadoop may assign them both to the same machine.
>
> Can this be done?
>
> Thanks.
>
> ________________________________________________________________________________
> Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com
>
> "Yet mark his perfect self-contentment, and hence learn his lesson, that to be
> self-contented is to be vile and ignorant, and that to aspire is better than to
> be blindly and impotently happy."
>                                            --  Edwin A. Abbott, Flatland
> ________________________________________________________________________________
>



-- 
Harsh J
