Posted to common-dev@hadoop.apache.org by Nagaraju Bingi <na...@persistent.co.in> on 2012/08/08 19:22:04 UTC

Hadoop cluster/monitoring

Hi,

I'm a beginner in Hadoop concepts. I have a few basic questions:
1) I'm looking for APIs to retrieve the capacity of the cluster, so that I can write a script that decides when to add a new slave node to the cluster.

             a) Number of task trackers, and the capacity of each task tracker, i.e. the maximum number of mappers it can spawn
              b) CPU, RAM and disk capacity of each tracker
              c) How to decide when to add a new slave node to the cluster
 2) What is the API to retrieve metrics like current usage of resources and currently running/spawned mappers/reducers?

 3) What is the purpose of Hadoop Common? Is it an API to interact with Hadoop?


I referred to the following URLs:
Hadoop common: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/
Capacity scheduler : http://hortonworks.com/blog/understanding-apache-hadoops-capacity-scheduler/

Regards,
Nagaraju B




Re: Hadoop cluster/monitoring

Posted by Harsh J <ha...@cloudera.com>.
Nagaraju,

On Wed, Aug 8, 2012 at 10:52 PM, Nagaraju Bingi
<na...@persistent.co.in> wrote:
> Hi,
>
> I'm a beginner in Hadoop concepts. I have a few basic questions:
> 1) I'm looking for APIs to retrieve the capacity of the cluster, so that I can write a script that decides when to add a new slave node to the cluster.
>
>              a) Number of task trackers, and the capacity of each task tracker, i.e. the maximum number of mappers it can spawn

For this, see: http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/mapred/ClusterStatus.html
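As an illustrative sketch (untested here, and assuming your classpath and configuration point at a live JobTracker), the old `org.apache.hadoop.mapred` API exposes this via `JobClient.getClusterStatus()`:

```java
import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Sketch: query cluster capacity through the old mapred API.
// Assumes a running JobTracker reachable via the default configuration.
public class ClusterCapacity {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        JobClient client = new JobClient(conf);
        ClusterStatus status = client.getClusterStatus();

        // Capacity: how many trackers, and how many map/reduce slots total
        System.out.println("Task trackers:    " + status.getTaskTrackers());
        System.out.println("Max map slots:    " + status.getMaxMapTasks());
        System.out.println("Max reduce slots: " + status.getMaxReduceTasks());

        // Current usage: tasks running right now (also answers question 2)
        System.out.println("Running maps:     " + status.getMapTasks());
        System.out.println("Running reduces:  " + status.getReduceTasks());
    }
}
```

A monitoring script could compare running tasks against the slot maxima and flag the cluster for expansion when utilization stays near capacity.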

>               b) CPU, RAM and disk capacity of each tracker

Rely on other tools for this; monitoring systems such as Ganglia and
Nagios can report it, for instance.

>               c) How to decide when to add a new slave node to the cluster

This is highly dependent on the workload that is required out of your clusters.

>  2) What is the API to retrieve metrics like current usage of resources and currently running/spawned mappers/reducers?

See the answer to 1.a for some of these, and 1.b for the rest.

>  3) What is the purpose of Hadoop Common? Is it an API to interact with Hadoop?

Hadoop Common encapsulates the utilities shared by the other two
sub-projects, MapReduce and HDFS. Among other things, it provides
a general interaction API for all things 'Hadoop'.
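For example (a sketch, assuming a Hadoop configuration on the classpath), the `FileSystem` abstraction that both MapReduce and HDFS build on lives in Hadoop Common:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: list a directory through Hadoop Common's FileSystem API.
// Depending on the configured default file system, this may talk to
// HDFS or to the local file system.
public class ListDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus stat : fs.listStatus(new Path("/"))) {
            System.out.println(stat.getPath() + "\t" + stat.getLen());
        }
    }
}
```

Because the same code works against any configured file system, the MapReduce and HDFS sub-projects both depend on Common rather than on each other.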

-- 
Harsh J