
Posted to common-user@hadoop.apache.org by Caesar Samsi <ca...@mac.com> on 2015/06/03 23:25:43 UTC

Monitoring dashboard for Hadoop?

Hello,

 

I'm new to Hadoop and have successfully built a fully distributed cluster of
3 nodes (1 master, 2 slaves) as a proof of concept. I have some questions
below.

 

Is there a dashboard to monitor the progress of a mapreduce computation? 

1.       I'm looking to ensure the computation gets allocated to, and uses,
the correct number of computation nodes

2.       Monitor computation on the nodes (up/down/in-progress/completed)

3.       If possible, direct computation to a specific group of nodes
(depending on the computation priority).

 

Similarly for HDFS:

1.       Ensure a data file gets replicated to the correct number of nodes

2.       If possible, prioritize data replication (i.e. replicate data files
that are accessed frequently to nodes that have better hardware, as some
sort of load-balancing distribution)

 

Many Thanks, Caesar.

RE: Monitoring dashboard for Hadoop?

Posted by yves callaert <yv...@hotmail.com>.

Hi,
Depending on the version you are using, there are several ways to monitor jobs.
You can use Hue (a Cloudera technology), which has a job monitoring system, or you can use the YARN ResourceManager UI to follow jobs.
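
If you would rather script this than watch the web UI, the ResourceManager also serves the same job information as JSON over its REST API. Below is a minimal Python sketch, not a definitive implementation. It assumes the ResourceManager web address is reachable on the default port 8088; the host name "master" and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def apps_url(rm_host, port=8088, states=None):
    """Build the ResourceManager REST URL that lists applications.

    rm_host and port are assumptions about your cluster; 8088 is the
    default ResourceManager web port.
    """
    url = f"http://{rm_host}:{port}/ws/v1/cluster/apps"
    if states:
        # The API accepts a comma-separated list of application states.
        url += "?states=" + ",".join(states)
    return url


def summarize_apps(payload):
    """Return (id, name, state, progress) for each app in a response body."""
    # When no application matches, the "apps" value may be null.
    apps = (payload.get("apps") or {}).get("app", [])
    return [(a["id"], a["name"], a["state"], a.get("progress", 0.0)) for a in apps]


def fetch_apps(rm_host, port=8088, states=("RUNNING",)):
    """Query a live ResourceManager and summarize the matching applications."""
    with urlopen(apps_url(rm_host, port, states), timeout=10) as resp:
        return summarize_apps(json.load(resp))
```

Calling fetch_apps("master") against a live cluster would return one tuple per running application. The /ws/v1/cluster/nodes endpoint of the same API reports node state in a similar shape.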

Nodes can be monitored through Ambari (https://ambari.apache.org/) or Cloudera Manager (only available for Cloudera distributions).

As far as I know, the HDFS replication process cannot be changed to favour particular nodes.
An even distribution of blocks is needed in order to have an evenly spread load.
If a block replica gets corrupted, this will be visible in the logs, but the namenode will correct the problem automatically by creating a new replica of the block from a healthy copy.
Normally you will have a replication factor of 3, but you can change this if you want data to be spread across more nodes.
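
To make the replication factor concrete: it can be changed per path with the "hdfs dfs -setrep" command and checked with "hdfs fsck". The Python sketch below only assembles those command lines and hands them to subprocess. It assumes the hdfs client is on the PATH; the helper names and the example path are hypothetical.

```python
import subprocess


def setrep_command(path, replicas, wait=True):
    """Build the command that sets the replication factor of an HDFS path."""
    if replicas < 1:
        raise ValueError("replication factor must be at least 1")
    cmd = ["hdfs", "dfs", "-setrep"]
    if wait:
        # -w makes the command wait until replication has completed.
        cmd.append("-w")
    cmd += [str(replicas), path]
    return cmd


def fsck_command(path):
    """Build the command that reports files, blocks and replica locations."""
    return ["hdfs", "fsck", path, "-files", "-blocks", "-locations"]


def run(cmd):
    """Run a command on a machine with the hdfs client and return its stdout."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, run(setrep_command("/data/input", 2)) would ask for two replicas of everything under the hypothetical path /data/input, and run(fsck_command("/data/input")) would show where the replicas ended up. The cluster-wide default for new files comes from the dfs.replication property in hdfs-site.xml.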

Hope this answers some questions.

With Regards,
Yves
