Posted to common-dev@hadoop.apache.org by Nagaraju Bingi <na...@persistent.co.in> on 2012/08/08 19:22:04 UTC
Hadoop cluster/monitoring
Hi,
I'm a beginner with Hadoop and have a few basic questions:
1) I'm looking for APIs to retrieve the capacity of the cluster, so that I can write a script that decides when to add a new slave node:
a) the number of TaskTrackers, and the maximum number of map tasks each TaskTracker can run
b) the CPU, RAM, and disk capacity of each TaskTracker
c) how to decide when to add a new slave node to the cluster
2) What is the API to retrieve metrics such as current resource usage and the number of currently running map/reduce tasks?
3) What is the purpose of Hadoop Common? Is it an API for interacting with Hadoop?
I referred to the following URLs:
for Hadoop common : http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/
Capacity scheduler : http://hortonworks.com/blog/understanding-apache-hadoops-capacity-scheduler/
Regards,
Nagaraju B
Re: Hadoop cluster/monitoring
Posted by Harsh J <ha...@cloudera.com>.
Nagaraju,
On Wed, Aug 8, 2012 at 10:52 PM, Nagaraju Bingi
<na...@persistent.co.in> wrote:
> Hi,
>
> I'm a beginner with Hadoop and have a few basic questions:
> 1) I'm looking for APIs to retrieve the capacity of the cluster, so that I can write a script that decides when to add a new slave node:
>
> a) the number of TaskTrackers, and the maximum number of map tasks each TaskTracker can run
For this, see: http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/mapred/ClusterStatus.html
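As a sketch of how those ClusterStatus numbers could be combined: the getter names below mirror the real org.apache.hadoop.mapred.ClusterStatus API (getTaskTrackers(), getMaxMapTasks(), getMapTasks()), but the Status class here is a hand-rolled stand-in so the example runs without a live cluster; on a real cluster you would obtain the object via new JobClient(new JobConf(conf)).getClusterStatus().

```java
// Sketch: computing free map-slot capacity from ClusterStatus-style numbers.
// "Status" is a hypothetical stand-in for org.apache.hadoop.mapred.ClusterStatus
// so this runs without a cluster; the getter names mirror the real API.
public class CapacityCheck {
    static final class Status {
        private final int taskTrackers, maxMapTasks, mapTasks;
        Status(int taskTrackers, int maxMapTasks, int mapTasks) {
            this.taskTrackers = taskTrackers;
            this.maxMapTasks = maxMapTasks;
            this.mapTasks = mapTasks;
        }
        int getTaskTrackers() { return taskTrackers; } // live TaskTracker count
        int getMaxMapTasks()  { return maxMapTasks; }  // total map slots in the cluster
        int getMapTasks()     { return mapTasks; }     // map tasks currently running
    }

    // Map slots still free across the whole cluster.
    static int freeMapSlots(Status s) {
        return s.getMaxMapTasks() - s.getMapTasks();
    }

    public static void main(String[] args) {
        // On a real cluster this would come from:
        //   ClusterStatus s = new JobClient(new JobConf(conf)).getClusterStatus();
        Status s = new Status(10, 40, 33);
        System.out.println(s.getTaskTrackers() + " trackers, "
                + freeMapSlots(s) + " free map slots");
    }
}
```

A script could poll these numbers periodically and alert when free slots stay near zero.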
> b) the CPU, RAM, and disk capacity of each TaskTracker
Rely on external monitoring tools for this; Ganglia and Nagios, for
instance, can report these metrics.
> c) how to decide when to add a new slave node to the cluster
This depends heavily on the workload you expect your cluster to handle.
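One hedged way to script this decision: poll slot utilization (running map tasks divided by total map slots, per the ClusterStatus getters above) and flag when it stays saturated for a whole observation window. The 0.85 threshold and the every-sample rule below are illustrative assumptions, not anything Hadoop itself prescribes.

```java
// Sketch: deciding when to add a slave node from map-slot utilization samples.
// The threshold and the sustained-saturation rule are illustrative assumptions,
// not values Hadoop prescribes.
public class AddNodePolicy {
    // True only when every recent sample exceeds the threshold, i.e. the
    // cluster has been saturated for the entire observation window.
    static boolean shouldAddNode(double[] utilizationSamples, double threshold) {
        if (utilizationSamples.length == 0) return false;
        for (double u : utilizationSamples) {
            if (u <= threshold) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Each sample would come from a periodic ClusterStatus poll:
        //   runningMaps / (double) maxMapSlots
        double[] busy  = {0.90, 0.95, 0.88, 0.93}; // sustained saturation
        double[] spiky = {0.90, 0.40, 0.95, 0.93}; // one idle dip
        System.out.println("busy cluster  -> " + shouldAddNode(busy, 0.85));
        System.out.println("spiky cluster -> " + shouldAddNode(spiky, 0.85));
    }
}
```

Requiring saturation across the whole window avoids adding hardware for a single transient spike.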
> 2) What is the API to retrieve metrics such as current resource usage and the number of currently running map/reduce tasks?
See 1(a) for some of these, and 1(b) for the rest.
> 3) What is the purpose of Hadoop Common? Is it an API for interacting with Hadoop?
Hadoop Common encapsulates the utilities shared by the other two
sub-projects, MapReduce and HDFS. Among other things, it provides
a general interaction API for all things 'Hadoop'.
--
Harsh J