Posted to common-user@hadoop.apache.org by Song Liu <la...@gmail.com> on 2010/04/15 17:01:51 UTC

How to make HOD apply more than one core on each machine?

Dear all, I have a problem here.


     HOD is good: it can manage a large virtual cluster on a huge physical
cluster. The problem is that it doesn't request more than one core on each
machine, and I have already received a complaint from our admin!

     Since Hadoop often starts more than one process on each machine, I
believe this feature is essential for many Hadoop programs, and I guess HOD
should already have it, but I can't find it.

    Can anyone provide any ideas?

Song Liu

Re: How to make HOD apply more than one core on each machine?

Posted by Song Liu <la...@gmail.com>.
Thanks Hemanth!
  As you said, I made a slight change in the file torque.py at line 41:

  # append a ppn value to the "nodes" entry in the qsub arguments list
  for index, item in enumerate(argList):
      if item.startswith("nodes"):
          argList[index] = argList[index] + ":ppn=4"
          print argList[index]

  and it works fine now.

  But I don't think this solves the problem elegantly; I really think
someone should make a patch for this issue.
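
  For example, a less hard-coded variant of the same patch (an untested
sketch; HOD_TORQUE_PPN is a made-up environment variable, not an existing
HOD option) could read the value from the environment:

  import os

  # Sketch: take ppn from an (invented) environment variable instead of
  # hardcoding 4, so the value can vary per site without editing torque.py.
  ppn = os.environ.get("HOD_TORQUE_PPN")
  for index, item in enumerate(argList):
      if ppn and item.startswith("nodes"):
          argList[index] = "%s:ppn=%s" % (argList[index], ppn)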

  Many Thanks.

Song Liu

On Wed, Apr 21, 2010 at 7:05 PM, Hemanth Yamijala <yh...@gmail.com> wrote:

> Song,
>
> >   I guess you are very close to my point. I mean, can we find a way to
> > set the qsub parameter "ppn"?
>
> From what I could see in the HOD code, it appears you cannot override
> the ppn value with HOD. You could look at
> src/contrib/hod/hodlib/NodePools/torque.py, and specifically the
> method process_qsub_attributes. In this method, the nodes parameter is
> getting set to the value defined by the -n parameter passed to HOD.
> Unless I am missing something, this seems to be the final value that
> can be specified for the nodes parameter to the qsub command.
>
> The method I suggested seems like a workaround to circumvent this
> limitation. In the Maui documentation, I found it is possible to set the
> node access policy per Torque job as well. And there is an option
> in HOD to specify such additional parameters, using the key
> resource_manager.attrs.
>
> I know this is not an ideal answer for you. But ATM this is all I can think
> of.
>
> Thanks
> Hemanth
>

Re: How to make HOD apply more than one core on each machine?

Posted by Hemanth Yamijala <yh...@gmail.com>.
Song,

>   I guess you are very close to my point. I mean, can we find a way
> to set the qsub parameter "ppn"?

From what I could see in the HOD code, it appears you cannot override
the ppn value with HOD. You could look at
src/contrib/hod/hodlib/NodePools/torque.py, and specifically the
method process_qsub_attributes. In this method, the nodes parameter is
getting set to the value defined by the -n parameter passed to HOD.
Unless I am missing something, this seems to be the final value that
can be specified for the nodes parameter to the qsub command.

The method I suggested seems like a workaround to circumvent this
limitation. In the Maui documentation, I found it is possible to set the
node access policy per Torque job as well. And there is an option
in HOD to specify such additional parameters, using the key
resource_manager.attrs.
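
For example, a hodrc sketch (with two assumptions: that attrs takes
comma-separated key=value pairs which HOD passes to qsub as -W job
attributes, and that Maui honours the NACCESSPOLICY resource manager
extension as its documentation describes):

  [resource_manager]
  # ask Maui not to share each allocated node with other jobs
  attrs = x=NACCESSPOLICY:SINGLEJOB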

I know this is not an ideal answer for you. But ATM this is all I can think of.

Thanks
Hemanth

Re: How to make HOD apply more than one core on each machine?

Posted by Song Liu <la...@gmail.com>.
Hi, thanks Hemanth!

   I guess you are very close to my point. I mean, can we find a way to set
the qsub parameter "ppn"?

  ppn controls how many processors per node are allocated to a job. For
example:

  A normal Torque qsub would be executed like: qsub -l nodes=3:ppn=4

  However, a Torque job submitted by HOD looks like this:

  qsub -l nodes=3

  Here is the qstat result:

  qstat  -f 179245

   snip ----------
       Resource_List.nodect = 3
       Resource_List.nodes = 3
       Resource_List.walltime = 05:00:00
   snip ----------

  For a normal job:

  Resource_List.nodect = 3
  Resource_List.nodes = 3:ppn=4
  Resource_List.walltime = 280:00:00

 I guess we lose the ppn parameter when HOD submits the job, and I believe
this parameter is quite important in most job configurations.

Song

On Fri, Apr 16, 2010 at 11:50 AM, Hemanth Yamijala <yh...@gmail.com> wrote:

> Song,
>
> >   I know that is the way to set the capacity of each node. However, I
> > want to know how we can tell the Torque manager that we will run more
> > than one mapred task on each machine. If we don't do this, Torque will
> > assign the other cores on the machine to other tasks, which may cause
> > competition for cores.
> >
> >   Do you know how to solve this?
> >
>
> If I understand correctly, what you want is that when a physical node is
> allocated via HOD by the Torque resource manager, that node should not be
> shared by other jobs. Is that correct?
>
> Looking on the web, I found that schedulers like Maui / Moab that are
> typically used with Torque allow for this. In particular, I thought
> this link:
> https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2009-May/039949.html
> may be particularly useful. It talks about a NODEACCESSPOLICY
> configuration in Maui that is described here:
> http://www.clusterresources.com/products/maui/docs/5.3nodeaccess.shtml.
> Setting this policy to SINGLEJOB seems to solve your problem.
>
> Can you check if this meets your requirement?
>

Re: How to make HOD apply more than one core on each machine?

Posted by Hemanth Yamijala <yh...@gmail.com>.
Song,

>   I know that is the way to set the capacity of each node. However, I want
> to know how we can tell the Torque manager that we will run more than one
> mapred task on each machine. If we don't do this, Torque will assign the
> other cores on the machine to other tasks, which may cause competition for
> cores.
>
>   Do you know how to solve this?
>

If I understand correctly, what you want is that when a physical node is
allocated via HOD by the Torque resource manager, that node should not be
shared by other jobs. Is that correct?

Looking on the web, I found that schedulers like Maui / Moab that are
typically used with Torque allow for this. In particular, I thought
this link: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2009-May/039949.html
may be particularly useful. It talks about a NODEACCESSPOLICY
configuration in Maui that is described here:
http://www.clusterresources.com/products/maui/docs/5.3nodeaccess.shtml.
Setting this policy to SINGLEJOB seems to solve your problem.
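
For reference, the global form is a single line in maui.cfg (a sketch based
on the Maui documentation linked above):

  # maui.cfg: never let two jobs share one allocated node
  NODEACCESSPOLICY SINGLEJOB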

Can you check if this meets your requirement?

Re: How to make HOD apply more than one core on each machine?

Posted by Song Liu <la...@gmail.com>.
Hi, thanks for the answer.

   I know that is the way to set the capacity of each node. However, I want
to know how we can tell the Torque manager that we will run more than one
mapred task on each machine. If we don't do this, Torque will assign the
other cores on the machine to other tasks, which may cause competition for
cores.

   Do you know how to solve this?

   Thanks.

On Thu, Apr 15, 2010 at 7:01 PM, Hemanth Yamijala <yh...@gmail.com> wrote:

> Song,
>
> >      HOD is good: it can manage a large virtual cluster on a huge
> > physical cluster. The problem is that it doesn't request more than one
> > core on each machine, and I have already received a complaint from our
> > admin!
> >
>
> I assume what you want is for the Map/Reduce cluster that is started by
> HOD to use more than one core on each machine. You can configure this in
> the gridservice-mapred section, setting the property server-params.
> For example, if you want to configure 4 map and 2 reduce slots per
> node, you can say:
>
> [gridservice-mapred]
> server-params = mapred.tasktracker.map.tasks.maximum=4,mapred.tasktracker.reduce.tasks.maximum=2
>
> That said, since you have not specified any values for these
> parameters, Hadoop's defaults should be picked up, and they default to
> 2 map and 2 reduce slots. Hence, it should already be using more than
> one core. Are you seeing that the JobTracker administration page is
> not showing multiple map and reduce slots per node?
>

Re: How to make HOD apply more than one core on each machine?

Posted by Hemanth Yamijala <yh...@gmail.com>.
Song,

>      HOD is good: it can manage a large virtual cluster on a huge physical
> cluster. The problem is that it doesn't request more than one core on each
> machine, and I have already received a complaint from our admin!
>

I assume what you want is for the Map/Reduce cluster that is started by
HOD to use more than one core on each machine. You can configure this in
the gridservice-mapred section, setting the property server-params.
For example, if you want to configure 4 map and 2 reduce slots per
node, you can say:

[gridservice-mapred]
server-params = mapred.tasktracker.map.tasks.maximum=4,mapred.tasktracker.reduce.tasks.maximum=2

That said, since you have not specified any values for these
parameters, Hadoop's defaults should be picked up, and they default to
2 map and 2 reduce slots. Hence, it should already be using more than
one core. Are you seeing that the JobTracker administration page is
not showing multiple map and reduce slots per node?
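
For reference, these two server-params are the standard Hadoop 0.20
TaskTracker slot settings, i.e. the same values a statically configured
cluster would carry in mapred-site.xml:

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>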

Re: How to make HOD apply more than one core on each machine?

Posted by Song Liu <la...@gmail.com>.
Here is my configuration file:

[hod]
stream                          = True
java-home                       = /gpfs/cluster/cosc/sl9885/jre1.6.0_19/
cluster                         = ALL
cluster-factor                  = 1.8
xrs-port-range                  = 32768-65536
debug                           = 3
allocate-wait-time              = 3600
temp-dir                        = /local/hod

[ringmaster]
register                        = True
stream                          = False
temp-dir                        = /local/sl9885
http-port-range                 = 8000-9000
work-dirs                       = /local/sl9885/1,/local/sl9885/2
xrs-port-range                  = 32768-65536
debug                           = 3

[hodring]
stream                          = False
temp-dir                        = /local/sl9885
register                        = True
java-home                       = /gpfs/cluster/cosc/sl9885/jre1.6.0_19/
http-port-range                 = 8000-9000
xrs-port-range                  = 32768-65536
debug                           = 3

[resource_manager]
queue                           = short
batch-home                      = /cvos/shared/apps/torque/2.3.3/
id                              = torque
env-vars                        = HOD_PYTHON_HOME=/gpfs/cluster/cosc/sl9885/python/bin/python

[gridservice-mapred]
external                        = False
pkgs                            = /gpfs/cluster/cosc/sl9885/hadoop-0.20.2
tracker_port                    = 8030
info_port                       = 50080

[gridservice-hdfs]
external                        = False
pkgs                            = /gpfs/cluster/cosc/sl9885/hadoop-0.20.2
fs_port                         = 8020
info_port                       = 50070
server-params                   = mapred.child.java.opts=-Xmx1024m
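
Note that [gridservice-mapred] above carries no server-params, so Hadoop's
defaults of 2 map and 2 reduce slots per node should apply. Raising the slot
counts, as suggested elsewhere in the thread, would mean adding a line like
this to that section (the values 4 and 2 are only illustrative):

server-params                   = mapred.tasktracker.map.tasks.maximum=4,mapred.tasktracker.reduce.tasks.maximum=2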


On Thu, Apr 15, 2010 at 4:01 PM, Song Liu <la...@gmail.com> wrote:

> Dear all, I have a problem here.
>
>
>      HOD is good: it can manage a large virtual cluster on a huge physical
> cluster. The problem is that it doesn't request more than one core on each
> machine, and I have already received a complaint from our admin!
>
>      Since Hadoop often starts more than one process on each machine, I
> believe this feature is essential for many Hadoop programs, and I guess HOD
> should already have it, but I can't find it.
>
>     Can anyone provide any ideas?
>
> Song Liu
>