Posted to hdfs-user@hadoop.apache.org by Alex Bohr <al...@gradientx.com> on 2013/03/05 01:33:28 UTC

Best Practices: mapred.job.tracker.handler.count, dfs.namenode.handler.count

Hi,
I'm looking for some feedback on how to decide how many threads to assign
to the Namenode and Jobtracker?

I currently have 24 data nodes (running CDH3) and am finding a lot of
varying advice on how to set these properties and change them as the
cluster grows.

Some (older) documentation (
http://blog.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/
, http://hadoop.apache.org/docs/r1.0.4/mapred-default.html ) has it in the
range of the default 10 for a smallish cluster.
And the O'Reilly *Hadoop Operations* book puts it a good deal higher and
gives a handy, precise formula: natural log of # of nodes x 20, or:

python -c 'import math; print int(math.log(24) * 20)'

which = 63 for 24 nodes.
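For anyone curious how that rule of thumb scales, here's a quick sketch
that applies the same ln(nodes) x 20 formula across a few cluster sizes
(the function name and the factor=20 default are just my own framing of
the book's formula, not anything from Hadoop itself):

```python
import math

def handler_count(num_nodes, factor=20):
    """Hadoop Operations rule of thumb: floor(ln(num_nodes) * factor)."""
    return int(math.log(num_nodes) * factor)

# Handler counts the formula suggests as the cluster grows.
for n in (10, 24, 50, 100, 200):
    print("%4d nodes -> %d handlers" % (n, handler_count(n)))
```

Note it grows logarithmically, so doubling the cluster only adds ~14
handlers, which is part of why the advice varies so much between small
and large clusters.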

Does anyone have strong opinions on how to set these variables?  Does
anyone else use the natural log x 20 formula?
Are there other factors beyond # of nodes that should be considered?  I'm
assuming memory available on the NameNode/Jobtracker plays a big part, but
right now I have a good amount of unused memory so I'm ok going with a
higher number.
My jobtracker is occasionally freezing, so this is one of the configs I
think might be causing problems.

And the second, less important part of the question: is there any need to
put these properties in their respective config files (mapred-site.xml,
hdfs-site.xml) on any node other than the Namenode?
I've looked but have never found any good documentation discussing which
properties need to be on which machine, and I'd prefer to keep properties
off of a machine if they don't need to be there (so I don't need to
restart anything if the property changes, and to keep environments
simpler).
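For concreteness, this is the pair of settings I'm asking about (the
value of 63 here is just the number the formula gave me for my 24 nodes,
not a recommendation), which I currently have only on the
NameNode/JobTracker machine:

```xml
<!-- hdfs-site.xml on the NameNode -->
<property>
  <name>dfs.namenode.handler.count</name>
  <value>63</value>
</property>

<!-- mapred-site.xml on the JobTracker -->
<property>
  <name>mapred.job.tracker.handler.count</name>
  <value>63</value>
</property>
```

My understanding is these are server-side settings read only by the
NameNode and JobTracker daemons respectively, so copying them to the
slaves should be a no-op -- but I'd love confirmation of that.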

Thanks