Posted to common-user@hadoop.apache.org by Shuja Rehman <sh...@gmail.com> on 2010/11/08 15:34:25 UTC

Configure Ganglia with Hadoop

Hi
I have a cluster of 4 machines and want to configure Ganglia for monitoring
purposes. I have read the wiki and added the following lines to
hadoop-metrics.properties on each machine.

dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.period=10
dfs.servers=10.10.10.2:8649

mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
mapred.period=10
mapred.servers=10.10.10.2:8649

jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
jvm.period=10
jvm.servers=10.10.10.2:8649

rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
rpc.period=10
rpc.servers=10.10.10.2:8649


where 10.10.10.2 is the machine where I am running gmetad and the web front
end. Do I need to use the same IP on all machines, as I do here, or should
each machine's file use its own IP? And is there anything more I need to do
to set it up with Hadoop?



-- 
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>

Re: Configure Ganglia with Hadoop

Posted by Jonathan Creasy <jo...@Announcemedia.com>.
If I remember correctly, the default config uses multicast, so you should
configure them all with the same multicast IP. Some guides have you swap that
out for unicast UDP; if you did that, then the IP you have in there is the one
to use. This is in the 'Listener' and 'Sender' parts of the Ganglia config.
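
In gmond.conf those are the udp_send_channel ('Sender') and udp_recv_channel
('Listener') blocks. As a sketch of the stock multicast defaults (239.2.11.71
is Ganglia's shipped default group), the same blocks would appear unchanged on
every node:

udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 1
}

udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}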

There are additional parameters that go in the Hadoop config to tell it to
send data to Ganglia; while not required, they will give you Hadoop-specific
metrics.

I would be happy to send more detail when I arrive at my office if no one else has answered.



On Nov 8, 2010, at 8:36 AM, "Shuja Rehman" <sh...@gmail.com> wrote:

> Hi
> I have a cluster of 4 machines and want to configure Ganglia for monitoring
> purposes. I have read the wiki and added the following lines to
> hadoop-metrics.properties on each machine.
> 
> dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> dfs.period=10
> dfs.servers=10.10.10.2:8649
> 
> mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> mapred.period=10
> mapred.servers=10.10.10.2:8649
> 
> jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> jvm.period=10
> jvm.servers=10.10.10.2:8649
> 
> rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> rpc.period=10
> rpc.servers=10.10.10.2:8649
> 
> 
> where 10.10.10.2 is the machine where I am running gmetad and the web front
> end. Do I need to use the same IP on all machines, as I do here, or should
> each machine's file use its own IP? And is there anything more I need to do
> to set it up with Hadoop?
> 
> 
> 
> -- 
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>

Re: Configure Ganglia with Hadoop

Posted by Jonathan Creasy <jo...@Announcemedia.com>.
If your network supports multicast, then that should work fine.

Ours <http://blog.stlhadoop.org> doesn't, so we had to use unicast UDP; this
means putting the following in all of the gmond.conf files.

udp_send_channel {
host = ganglianode
port = 8601
}

udp_recv_channel {
port = 8601
family = inet4
}
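
One thing to watch with a unicast layout like this (an observation, not a
requirement stated in this thread): the *.servers entries in
hadoop-metrics.properties must point at whatever port the collector's
udp_recv_channel listens on. A sketch under that assumption, reusing the
ganglianode placeholder and port 8601 from the block above:

dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.period=10
dfs.servers=ganglianode:8601
# and likewise for mapred.*, jvm.*, and rpc.*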


On Nov 8, 2010, at 12:40 PM, Shuja Rehman wrote:

Hi
I have followed the article. I have one confusion: do I need to change the
gmond.conf file on each node?

host {
 location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.  Gmond
  used to only support having a single channel */
udp_send_channel {
 mcast_join = 239.2.11.71
 port = 8649
 ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
 mcast_join = 239.2.11.71
 port = 8649
 bind = 239.2.11.71
}

/* You can specify as many tcp_accept_channels as you like to share
  an xml description of the state of the cluster */
tcp_accept_channel {
 port = 8649
}


And do I need to replace 239.2.11.71 with each machine's own IP, e.g.
10.10.10.2 on the 1st machine, 10.10.10.3 on the 2nd machine, and so on?

On Mon, Nov 8, 2010 at 10:07 PM, Abhinay Mehta <ab...@gmail.com> wrote:

A colleague of mine (Ryan Greenhall) and I set up Ganglia on our Hadoop
cluster. He has written a summary of what we did to get it to work; you
might find it useful:

http://forwardtechnology.co.uk/blog/4cc841609f4e6a021100004f

Regards,
Abhinay Mehta


On 8 November 2010 15:31, Jonathan Creasy <jo...@announcemedia.com>
wrote:

This is the correct configuration, and there should be nothing more needed.
I don't think that these configuration changes will take effect on the fly,
so you would need to restart the datanode and namenode processes, if I
understand correctly.

When you browse your Ganglia web front end you will see some more metrics:

dfs.FSDirectory.files_deleted
dfs.FSNamesystem.BlockCapacity
dfs.FSNamesystem.BlocksTotal
dfs.FSNamesystem.CapacityRemainingGB
dfs.FSNamesystem.CapacityTotalGB
dfs.FSNamesystem.CapacityUsedGB
dfs.FSNamesystem.CorruptBlocks
dfs.FSNamesystem.ExcessBlocks
dfs.FSNamesystem.FilesTotal
dfs.FSNamesystem.MissingBlocks
dfs.FSNamesystem.PendingDeletionBlocks
dfs.FSNamesystem.PendingReplicationBlocks
dfs.FSNamesystem.ScheduledReplicationBlocks
dfs.FSNamesystem.TotalLoad
dfs.FSNamesystem.UnderReplicatedBlocks
dfs.datanode.blockChecksumOp_avg_time
dfs.datanode.blockChecksumOp_num_ops
dfs.datanode.blockReports_avg_time
dfs.datanode.blockReports_num_ops
dfs.datanode.block_verification_failures
dfs.datanode.blocks_read
dfs.datanode.blocks_removed
dfs.datanode.blocks_replicated
dfs.datanode.blocks_verified
dfs.datanode.blocks_written
dfs.datanode.bytes_read
dfs.datanode.bytes_written
dfs.datanode.copyBlockOp_avg_time
dfs.datanode.copyBlockOp_num_ops
dfs.datanode.heartBeats_avg_time
dfs.datanode.heartBeats_num_ops
dfs.datanode.readBlockOp_avg_time
dfs.datanode.readBlockOp_num_ops
dfs.datanode.readMetadataOp_avg_time
dfs.datanode.readMetadataOp_num_ops
dfs.datanode.reads_from_local_client
dfs.datanode.reads_from_remote_client
dfs.datanode.replaceBlockOp_avg_time
dfs.datanode.replaceBlockOp_num_ops
dfs.datanode.writeBlockOp_avg_time
dfs.datanode.writeBlockOp_num_ops
dfs.datanode.writes_from_local_client
dfs.datanode.writes_from_remote_client
dfs.namenode.AddBlockOps
dfs.namenode.CreateFileOps
dfs.namenode.DeleteFileOps
dfs.namenode.FileInfoOps
dfs.namenode.FilesAppended
dfs.namenode.FilesCreated
dfs.namenode.FilesRenamed
dfs.namenode.GetBlockLocations
dfs.namenode.GetListingOps
dfs.namenode.JournalTransactionsBatchedInSync
dfs.namenode.SafemodeTime
dfs.namenode.Syncs_avg_time
dfs.namenode.Syncs_num_ops
dfs.namenode.Transactions_avg_time
dfs.namenode.Transactions_num_ops
dfs.namenode.blockReport_avg_time
dfs.namenode.blockReport_num_ops
dfs.namenode.fsImageLoadTime
jvm.metrics.gcCount
jvm.metrics.gcTimeMillis
jvm.metrics.logError
jvm.metrics.logFatal
jvm.metrics.logInfo
jvm.metrics.logWarn
jvm.metrics.maxMemoryM
jvm.metrics.memHeapCommittedM
jvm.metrics.memHeapUsedM
jvm.metrics.memNonHeapCommittedM
jvm.metrics.memNonHeapUsedM
jvm.metrics.threadsBlocked
jvm.metrics.threadsNew
jvm.metrics.threadsRunnable
jvm.metrics.threadsTerminated
jvm.metrics.threadsTimedWaiting
jvm.metrics.threadsWaiting
rpc.metrics.NumOpenConnections
rpc.metrics.RpcProcessingTime_avg_time
rpc.metrics.RpcProcessingTime_num_ops
rpc.metrics.RpcQueueTime_avg_time
rpc.metrics.RpcQueueTime_num_ops
rpc.metrics.abandonBlock_avg_time
rpc.metrics.abandonBlock_num_ops
rpc.metrics.addBlock_avg_time
rpc.metrics.addBlock_num_ops
rpc.metrics.blockReceived_avg_time
rpc.metrics.blockReceived_num_ops
rpc.metrics.blockReport_avg_time
rpc.metrics.blockReport_num_ops
rpc.metrics.callQueueLen
rpc.metrics.complete_avg_time
rpc.metrics.complete_num_ops
rpc.metrics.create_avg_time
rpc.metrics.create_num_ops
rpc.metrics.getEditLogSize_avg_time
rpc.metrics.getEditLogSize_num_ops
rpc.metrics.getProtocolVersion_avg_time
rpc.metrics.getProtocolVersion_num_ops
rpc.metrics.register_avg_time
rpc.metrics.register_num_ops
rpc.metrics.rename_avg_time
rpc.metrics.rename_num_ops
rpc.metrics.renewLease_avg_time
rpc.metrics.renewLease_num_ops
rpc.metrics.rollEditLog_avg_time
rpc.metrics.rollEditLog_num_ops
rpc.metrics.rollFsImage_avg_time
rpc.metrics.rollFsImage_num_ops
rpc.metrics.sendHeartbeat_avg_time
rpc.metrics.sendHeartbeat_num_ops
rpc.metrics.versionRequest_avg_time
rpc.metrics.versionRequest_num_ops

-Jonathan

On Nov 8, 2010, at 8:34 AM, Shuja Rehman wrote:

Hi
I have a cluster of 4 machines and want to configure Ganglia for monitoring
purposes. I have read the wiki and added the following lines to
hadoop-metrics.properties on each machine.

dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.period=10
dfs.servers=10.10.10.2:8649

mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
mapred.period=10
mapred.servers=10.10.10.2:8649

jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
jvm.period=10
jvm.servers=10.10.10.2:8649

rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
rpc.period=10
rpc.servers=10.10.10.2:8649


where 10.10.10.2 is the machine where I am running gmetad and the web front
end. Do I need to use the same IP on all machines, as I do here, or should
each machine's file use its own IP? And is there anything more I need to do
to set it up with Hadoop?



--
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>






--
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>


Re: Configure Ganglia with Hadoop

Posted by Shuja Rehman <sh...@gmail.com>.
Hi
I have followed the article. I have one confusion: do I need to change the
gmond.conf file on each node?

host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8649
}


And do I need to replace 239.2.11.71 with each machine's own IP, e.g.
10.10.10.2 on the 1st machine, 10.10.10.3 on the 2nd machine, and so on?
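
(Or, if I go the unicast route instead, would every node keep an identical
udp_send_channel pointing at the collector rather than at its own address?
A sketch of what I mean, assuming 10.10.10.2 is the collector:)

udp_send_channel {
  host = 10.10.10.2
  port = 8649
}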

On Mon, Nov 8, 2010 at 10:07 PM, Abhinay Mehta <ab...@gmail.com> wrote:

> A colleague of mine (Ryan Greenhall) and I set up Ganglia on our Hadoop
> cluster. He has written a summary of what we did to get it to work; you
> might find it useful:
>
> http://forwardtechnology.co.uk/blog/4cc841609f4e6a021100004f
>
> Regards,
> Abhinay Mehta
>
>
> On 8 November 2010 15:31, Jonathan Creasy <jon.creasy@announcemedia.com>
> wrote:
>
> > This is the correct configuration, and there should be nothing more
> > needed. I don't think that these configuration changes will take effect
> > on the fly, so you would need to restart the datanode and namenode
> > processes, if I understand correctly.
> >
> > When you browse your Ganglia web front end you will see some more metrics:
> >
> > dfs.FSDirectory.files_deleted
> > dfs.FSNamesystem.BlockCapacity
> > dfs.FSNamesystem.BlocksTotal
> > dfs.FSNamesystem.CapacityRemainingGB
> > dfs.FSNamesystem.CapacityTotalGB
> > dfs.FSNamesystem.CapacityUsedGB
> > dfs.FSNamesystem.CorruptBlocks
> > dfs.FSNamesystem.ExcessBlocks
> > dfs.FSNamesystem.FilesTotal
> > dfs.FSNamesystem.MissingBlocks
> > dfs.FSNamesystem.PendingDeletionBlocks
> > dfs.FSNamesystem.PendingReplicationBlocks
> > dfs.FSNamesystem.ScheduledReplicationBlocks
> > dfs.FSNamesystem.TotalLoad
> > dfs.FSNamesystem.UnderReplicatedBlocks
> > dfs.datanode.blockChecksumOp_avg_time
> > dfs.datanode.blockChecksumOp_num_ops
> > dfs.datanode.blockReports_avg_time
> > dfs.datanode.blockReports_num_ops
> > dfs.datanode.block_verification_failures
> > dfs.datanode.blocks_read
> > dfs.datanode.blocks_removed
> > dfs.datanode.blocks_replicated
> > dfs.datanode.blocks_verified
> > dfs.datanode.blocks_written
> > dfs.datanode.bytes_read
> > dfs.datanode.bytes_written
> > dfs.datanode.copyBlockOp_avg_time
> > dfs.datanode.copyBlockOp_num_ops
> > dfs.datanode.heartBeats_avg_time
> > dfs.datanode.heartBeats_num_ops
> > dfs.datanode.readBlockOp_avg_time
> > dfs.datanode.readBlockOp_num_ops
> > dfs.datanode.readMetadataOp_avg_time
> > dfs.datanode.readMetadataOp_num_ops
> > dfs.datanode.reads_from_local_client
> > dfs.datanode.reads_from_remote_client
> > dfs.datanode.replaceBlockOp_avg_time
> > dfs.datanode.replaceBlockOp_num_ops
> > dfs.datanode.writeBlockOp_avg_time
> > dfs.datanode.writeBlockOp_num_ops
> > dfs.datanode.writes_from_local_client
> > dfs.datanode.writes_from_remote_client
> > dfs.namenode.AddBlockOps
> > dfs.namenode.CreateFileOps
> > dfs.namenode.DeleteFileOps
> > dfs.namenode.FileInfoOps
> > dfs.namenode.FilesAppended
> > dfs.namenode.FilesCreated
> > dfs.namenode.FilesRenamed
> > dfs.namenode.GetBlockLocations
> > dfs.namenode.GetListingOps
> > dfs.namenode.JournalTransactionsBatchedInSync
> > dfs.namenode.SafemodeTime
> > dfs.namenode.Syncs_avg_time
> > dfs.namenode.Syncs_num_ops
> > dfs.namenode.Transactions_avg_time
> > dfs.namenode.Transactions_num_ops
> > dfs.namenode.blockReport_avg_time
> > dfs.namenode.blockReport_num_ops
> > dfs.namenode.fsImageLoadTime
> > jvm.metrics.gcCount
> > jvm.metrics.gcTimeMillis
> > jvm.metrics.logError
> > jvm.metrics.logFatal
> > jvm.metrics.logInfo
> > jvm.metrics.logWarn
> > jvm.metrics.maxMemoryM
> > jvm.metrics.memHeapCommittedM
> > jvm.metrics.memHeapUsedM
> > jvm.metrics.memNonHeapCommittedM
> > jvm.metrics.memNonHeapUsedM
> > jvm.metrics.threadsBlocked
> > jvm.metrics.threadsNew
> > jvm.metrics.threadsRunnable
> > jvm.metrics.threadsTerminated
> > jvm.metrics.threadsTimedWaiting
> > jvm.metrics.threadsWaiting
> > rpc.metrics.NumOpenConnections
> > rpc.metrics.RpcProcessingTime_avg_time
> > rpc.metrics.RpcProcessingTime_num_ops
> > rpc.metrics.RpcQueueTime_avg_time
> > rpc.metrics.RpcQueueTime_num_ops
> > rpc.metrics.abandonBlock_avg_time
> > rpc.metrics.abandonBlock_num_ops
> > rpc.metrics.addBlock_avg_time
> > rpc.metrics.addBlock_num_ops
> > rpc.metrics.blockReceived_avg_time
> > rpc.metrics.blockReceived_num_ops
> > rpc.metrics.blockReport_avg_time
> > rpc.metrics.blockReport_num_ops
> > rpc.metrics.callQueueLen
> > rpc.metrics.complete_avg_time
> > rpc.metrics.complete_num_ops
> > rpc.metrics.create_avg_time
> > rpc.metrics.create_num_ops
> > rpc.metrics.getEditLogSize_avg_time
> > rpc.metrics.getEditLogSize_num_ops
> > rpc.metrics.getProtocolVersion_avg_time
> > rpc.metrics.getProtocolVersion_num_ops
> > rpc.metrics.register_avg_time
> > rpc.metrics.register_num_ops
> > rpc.metrics.rename_avg_time
> > rpc.metrics.rename_num_ops
> > rpc.metrics.renewLease_avg_time
> > rpc.metrics.renewLease_num_ops
> > rpc.metrics.rollEditLog_avg_time
> > rpc.metrics.rollEditLog_num_ops
> > rpc.metrics.rollFsImage_avg_time
> > rpc.metrics.rollFsImage_num_ops
> > rpc.metrics.sendHeartbeat_avg_time
> > rpc.metrics.sendHeartbeat_num_ops
> > rpc.metrics.versionRequest_avg_time
> > rpc.metrics.versionRequest_num_ops
> >
> > -Jonathan
> >
> > On Nov 8, 2010, at 8:34 AM, Shuja Rehman wrote:
> >
> > > Hi
> > > I have a cluster of 4 machines and want to configure Ganglia for
> > > monitoring purposes. I have read the wiki and added the following
> > > lines to hadoop-metrics.properties on each machine.
> > >
> > > dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> > > dfs.period=10
> > > dfs.servers=10.10.10.2:8649
> > >
> > > mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> > > mapred.period=10
> > > mapred.servers=10.10.10.2:8649
> > >
> > > jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> > > jvm.period=10
> > > jvm.servers=10.10.10.2:8649
> > >
> > > rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> > > rpc.period=10
> > > rpc.servers=10.10.10.2:8649
> > >
> > >
> > > where 10.10.10.2 is the machine where I am running gmetad and the web
> > > front end. Do I need to use the same IP on all machines, as I do here,
> > > or should each machine's file use its own IP? And is there anything
> > > more I need to do to set it up with Hadoop?
> > >
> > >
> > >
> > > --
> > > Regards
> > > Shuja-ur-Rehman Baig
> > > <http://pk.linkedin.com/in/shujamughal>
> >
> >
>



-- 
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>

Re: Configure Ganglia with Hadoop

Posted by Abhinay Mehta <ab...@gmail.com>.
A colleague of mine (Ryan Greenhall) and I set up Ganglia on our Hadoop
cluster. He has written a summary of what we did to get it to work; you
might find it useful:

http://forwardtechnology.co.uk/blog/4cc841609f4e6a021100004f

Regards,
Abhinay Mehta


On 8 November 2010 15:31, Jonathan Creasy <jo...@announcemedia.com> wrote:

> This is the correct configuration, and there should be nothing more needed.
> I don't think that these configuration changes will take effect on the fly,
> so you would need to restart the datanode and namenode processes, if I
> understand correctly.
>
> When you browse your Ganglia web front end you will see some more metrics:
>
> dfs.FSDirectory.files_deleted
> dfs.FSNamesystem.BlockCapacity
> dfs.FSNamesystem.BlocksTotal
> dfs.FSNamesystem.CapacityRemainingGB
> dfs.FSNamesystem.CapacityTotalGB
> dfs.FSNamesystem.CapacityUsedGB
> dfs.FSNamesystem.CorruptBlocks
> dfs.FSNamesystem.ExcessBlocks
> dfs.FSNamesystem.FilesTotal
> dfs.FSNamesystem.MissingBlocks
> dfs.FSNamesystem.PendingDeletionBlocks
> dfs.FSNamesystem.PendingReplicationBlocks
> dfs.FSNamesystem.ScheduledReplicationBlocks
> dfs.FSNamesystem.TotalLoad
> dfs.FSNamesystem.UnderReplicatedBlocks
> dfs.datanode.blockChecksumOp_avg_time
> dfs.datanode.blockChecksumOp_num_ops
> dfs.datanode.blockReports_avg_time
> dfs.datanode.blockReports_num_ops
> dfs.datanode.block_verification_failures
> dfs.datanode.blocks_read
> dfs.datanode.blocks_removed
> dfs.datanode.blocks_replicated
> dfs.datanode.blocks_verified
> dfs.datanode.blocks_written
> dfs.datanode.bytes_read
> dfs.datanode.bytes_written
> dfs.datanode.copyBlockOp_avg_time
> dfs.datanode.copyBlockOp_num_ops
> dfs.datanode.heartBeats_avg_time
> dfs.datanode.heartBeats_num_ops
> dfs.datanode.readBlockOp_avg_time
> dfs.datanode.readBlockOp_num_ops
> dfs.datanode.readMetadataOp_avg_time
> dfs.datanode.readMetadataOp_num_ops
> dfs.datanode.reads_from_local_client
> dfs.datanode.reads_from_remote_client
> dfs.datanode.replaceBlockOp_avg_time
> dfs.datanode.replaceBlockOp_num_ops
> dfs.datanode.writeBlockOp_avg_time
> dfs.datanode.writeBlockOp_num_ops
> dfs.datanode.writes_from_local_client
> dfs.datanode.writes_from_remote_client
> dfs.namenode.AddBlockOps
> dfs.namenode.CreateFileOps
> dfs.namenode.DeleteFileOps
> dfs.namenode.FileInfoOps
> dfs.namenode.FilesAppended
> dfs.namenode.FilesCreated
> dfs.namenode.FilesRenamed
> dfs.namenode.GetBlockLocations
> dfs.namenode.GetListingOps
> dfs.namenode.JournalTransactionsBatchedInSync
> dfs.namenode.SafemodeTime
> dfs.namenode.Syncs_avg_time
> dfs.namenode.Syncs_num_ops
> dfs.namenode.Transactions_avg_time
> dfs.namenode.Transactions_num_ops
> dfs.namenode.blockReport_avg_time
> dfs.namenode.blockReport_num_ops
> dfs.namenode.fsImageLoadTime
> jvm.metrics.gcCount
> jvm.metrics.gcTimeMillis
> jvm.metrics.logError
> jvm.metrics.logFatal
> jvm.metrics.logInfo
> jvm.metrics.logWarn
> jvm.metrics.maxMemoryM
> jvm.metrics.memHeapCommittedM
> jvm.metrics.memHeapUsedM
> jvm.metrics.memNonHeapCommittedM
> jvm.metrics.memNonHeapUsedM
> jvm.metrics.threadsBlocked
> jvm.metrics.threadsNew
> jvm.metrics.threadsRunnable
> jvm.metrics.threadsTerminated
> jvm.metrics.threadsTimedWaiting
> jvm.metrics.threadsWaiting
> rpc.metrics.NumOpenConnections
> rpc.metrics.RpcProcessingTime_avg_time
> rpc.metrics.RpcProcessingTime_num_ops
> rpc.metrics.RpcQueueTime_avg_time
> rpc.metrics.RpcQueueTime_num_ops
> rpc.metrics.abandonBlock_avg_time
> rpc.metrics.abandonBlock_num_ops
> rpc.metrics.addBlock_avg_time
> rpc.metrics.addBlock_num_ops
> rpc.metrics.blockReceived_avg_time
> rpc.metrics.blockReceived_num_ops
> rpc.metrics.blockReport_avg_time
> rpc.metrics.blockReport_num_ops
> rpc.metrics.callQueueLen
> rpc.metrics.complete_avg_time
> rpc.metrics.complete_num_ops
> rpc.metrics.create_avg_time
> rpc.metrics.create_num_ops
> rpc.metrics.getEditLogSize_avg_time
> rpc.metrics.getEditLogSize_num_ops
> rpc.metrics.getProtocolVersion_avg_time
> rpc.metrics.getProtocolVersion_num_ops
> rpc.metrics.register_avg_time
> rpc.metrics.register_num_ops
> rpc.metrics.rename_avg_time
> rpc.metrics.rename_num_ops
> rpc.metrics.renewLease_avg_time
> rpc.metrics.renewLease_num_ops
> rpc.metrics.rollEditLog_avg_time
> rpc.metrics.rollEditLog_num_ops
> rpc.metrics.rollFsImage_avg_time
> rpc.metrics.rollFsImage_num_ops
> rpc.metrics.sendHeartbeat_avg_time
> rpc.metrics.sendHeartbeat_num_ops
> rpc.metrics.versionRequest_avg_time
> rpc.metrics.versionRequest_num_ops
>
> -Jonathan
>
> On Nov 8, 2010, at 8:34 AM, Shuja Rehman wrote:
>
> > Hi
> > I have a cluster of 4 machines and want to configure Ganglia for monitoring
> > purposes. I have read the wiki and added the following lines to
> > hadoop-metrics.properties on each machine.
> >
> > dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> > dfs.period=10
> > dfs.servers=10.10.10.2:8649
> >
> > mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> > mapred.period=10
> > mapred.servers=10.10.10.2:8649
> >
> > jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> > jvm.period=10
> > jvm.servers=10.10.10.2:8649
> >
> > rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> > rpc.period=10
> > rpc.servers=10.10.10.2:8649
> >
> >
> > where 10.10.10.2 is the machine where I am running gmetad and the web front
> > end. Do I need to use the same IP on all machines, as I do here, or should
> > each machine's file use its own IP? And is there anything more I need to do
> > to set it up with Hadoop?
> >
> >
> >
> > --
> > Regards
> > Shuja-ur-Rehman Baig
> > <http://pk.linkedin.com/in/shujamughal>
>
>

Re: Configure Ganglia with Hadoop

Posted by Jonathan Creasy <jo...@Announcemedia.com>.
This is the correct configuration, and there should be nothing more needed. I don't think that these configuration changes will take effect on the fly, so you would need to restart the datanode and namenode processes, if I understand correctly.
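
For example, on a stock tarball install that restart would look something
like this (a sketch, assuming the standard bin/hadoop-daemon.sh scripts;
adjust paths to your layout):

# on the namenode machine
bin/hadoop-daemon.sh stop namenode
bin/hadoop-daemon.sh start namenode

# on each datanode machine
bin/hadoop-daemon.sh stop datanode
bin/hadoop-daemon.sh start datanode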

When you browse your Ganglia web front end you will see some more metrics:

dfs.FSDirectory.files_deleted
dfs.FSNamesystem.BlockCapacity
dfs.FSNamesystem.BlocksTotal
dfs.FSNamesystem.CapacityRemainingGB
dfs.FSNamesystem.CapacityTotalGB
dfs.FSNamesystem.CapacityUsedGB
dfs.FSNamesystem.CorruptBlocks
dfs.FSNamesystem.ExcessBlocks
dfs.FSNamesystem.FilesTotal
dfs.FSNamesystem.MissingBlocks
dfs.FSNamesystem.PendingDeletionBlocks
dfs.FSNamesystem.PendingReplicationBlocks
dfs.FSNamesystem.ScheduledReplicationBlocks
dfs.FSNamesystem.TotalLoad
dfs.FSNamesystem.UnderReplicatedBlocks
dfs.datanode.blockChecksumOp_avg_time
dfs.datanode.blockChecksumOp_num_ops
dfs.datanode.blockReports_avg_time
dfs.datanode.blockReports_num_ops
dfs.datanode.block_verification_failures
dfs.datanode.blocks_read
dfs.datanode.blocks_removed
dfs.datanode.blocks_replicated
dfs.datanode.blocks_verified
dfs.datanode.blocks_written
dfs.datanode.bytes_read
dfs.datanode.bytes_written
dfs.datanode.copyBlockOp_avg_time
dfs.datanode.copyBlockOp_num_ops
dfs.datanode.heartBeats_avg_time
dfs.datanode.heartBeats_num_ops
dfs.datanode.readBlockOp_avg_time
dfs.datanode.readBlockOp_num_ops
dfs.datanode.readMetadataOp_avg_time
dfs.datanode.readMetadataOp_num_ops
dfs.datanode.reads_from_local_client
dfs.datanode.reads_from_remote_client
dfs.datanode.replaceBlockOp_avg_time
dfs.datanode.replaceBlockOp_num_ops
dfs.datanode.writeBlockOp_avg_time
dfs.datanode.writeBlockOp_num_ops
dfs.datanode.writes_from_local_client
dfs.datanode.writes_from_remote_client
dfs.namenode.AddBlockOps
dfs.namenode.CreateFileOps
dfs.namenode.DeleteFileOps
dfs.namenode.FileInfoOps
dfs.namenode.FilesAppended
dfs.namenode.FilesCreated
dfs.namenode.FilesRenamed
dfs.namenode.GetBlockLocations
dfs.namenode.GetListingOps
dfs.namenode.JournalTransactionsBatchedInSync
dfs.namenode.SafemodeTime
dfs.namenode.Syncs_avg_time
dfs.namenode.Syncs_num_ops
dfs.namenode.Transactions_avg_time
dfs.namenode.Transactions_num_ops
dfs.namenode.blockReport_avg_time
dfs.namenode.blockReport_num_ops
dfs.namenode.fsImageLoadTime
jvm.metrics.gcCount
jvm.metrics.gcTimeMillis
jvm.metrics.logError
jvm.metrics.logFatal
jvm.metrics.logInfo
jvm.metrics.logWarn
jvm.metrics.maxMemoryM
jvm.metrics.memHeapCommittedM
jvm.metrics.memHeapUsedM
jvm.metrics.memNonHeapCommittedM
jvm.metrics.memNonHeapUsedM
jvm.metrics.threadsBlocked
jvm.metrics.threadsNew
jvm.metrics.threadsRunnable
jvm.metrics.threadsTerminated
jvm.metrics.threadsTimedWaiting
jvm.metrics.threadsWaiting
rpc.metrics.NumOpenConnections
rpc.metrics.RpcProcessingTime_avg_time
rpc.metrics.RpcProcessingTime_num_ops
rpc.metrics.RpcQueueTime_avg_time
rpc.metrics.RpcQueueTime_num_ops
rpc.metrics.abandonBlock_avg_time
rpc.metrics.abandonBlock_num_ops
rpc.metrics.addBlock_avg_time
rpc.metrics.addBlock_num_ops
rpc.metrics.blockReceived_avg_time
rpc.metrics.blockReceived_num_ops
rpc.metrics.blockReport_avg_time
rpc.metrics.blockReport_num_ops
rpc.metrics.callQueueLen
rpc.metrics.complete_avg_time
rpc.metrics.complete_num_ops
rpc.metrics.create_avg_time
rpc.metrics.create_num_ops
rpc.metrics.getEditLogSize_avg_time
rpc.metrics.getEditLogSize_num_ops
rpc.metrics.getProtocolVersion_avg_time
rpc.metrics.getProtocolVersion_num_ops
rpc.metrics.register_avg_time
rpc.metrics.register_num_ops
rpc.metrics.rename_avg_time
rpc.metrics.rename_num_ops
rpc.metrics.renewLease_avg_time
rpc.metrics.renewLease_num_ops
rpc.metrics.rollEditLog_avg_time
rpc.metrics.rollEditLog_num_ops
rpc.metrics.rollFsImage_avg_time
rpc.metrics.rollFsImage_num_ops
rpc.metrics.sendHeartbeat_avg_time
rpc.metrics.sendHeartbeat_num_ops
rpc.metrics.versionRequest_avg_time
rpc.metrics.versionRequest_num_ops

-Jonathan

On Nov 8, 2010, at 8:34 AM, Shuja Rehman wrote:

> Hi
> I have a cluster of 4 machines and want to configure Ganglia for monitoring
> purposes. I have read the wiki and added the following lines to
> hadoop-metrics.properties on each machine.
> 
> dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> dfs.period=10
> dfs.servers=10.10.10.2:8649
> 
> mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> mapred.period=10
> mapred.servers=10.10.10.2:8649
> 
> jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> jvm.period=10
> jvm.servers=10.10.10.2:8649
> 
> rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> rpc.period=10
> rpc.servers=10.10.10.2:8649
> 
> 
> where 10.10.10.2 is the machine where I am running gmetad and the web front
> end. Do I need to use the same IP on all machines, as I do here, or should
> each machine's file use its own IP? And is there anything more I need to do
> to set it up with Hadoop?
> 
> 
> 
> -- 
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>


Re: Configure Ganglia with Hadoop

Posted by Shuja Rehman <sh...@gmail.com>.
Hi Brian,

(I was not sure which list is most suitable for such questions, so I posted
on multiple lists.)

I used telnet and it shows some XML output, so I believe it is working. I am
running version 3.1.2, having installed the packages on Ubuntu from this
link:

http://packages.ubuntu.com/lucid/ganglia-monitor

Since the version is 3.1.2, how do I configure GangliaContext31?
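
Is it just a matter of swapping the context class in
hadoop-metrics.properties, along these lines (assuming my Hadoop build
actually ships org.apache.hadoop.metrics.ganglia.GangliaContext31)?

dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=10.10.10.2:8649

mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
mapred.period=10
mapred.servers=10.10.10.2:8649

jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=10.10.10.2:8649

rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
rpc.period=10
rpc.servers=10.10.10.2:8649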

Thanks

On Mon, Nov 8, 2010 at 7:47 PM, Brian Bockelman <bb...@cse.unl.edu> wrote:

> Hi Shuja,
>
> (First a note: please do not cross-post onto multiple lists, it's
> considered rude)
>
> Your configuration looks good; I don't think you'll need to do anything
> more, as long as 10.10.10.2:8649 is the correct address. Ganglia can be
> configured in many ways, so it's hard for me to tell whether or not it's the
> correct decision for your setup.
>
> One way to try is to see if you can open that endpoint via telnet:
>
> telnet 10.10.10.2 8649
>
> If the connection is refused, then GangliaContext will not work.
>
> Also, verify that you are running Ganglia 3.0 and not 3.1; 3.1 requires
> GangliaContext31.
>
> Brian
>
> On Nov 8, 2010, at 8:34 AM, Shuja Rehman wrote:
>
> > Hi
> > I have a cluster of 4 machines and want to configure Ganglia for monitoring
> > purposes. I have read the wiki and added the following lines to
> > hadoop-metrics.properties on each machine.
> >
> > dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> > dfs.period=10
> > dfs.servers=10.10.10.2:8649
> >
> > mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> > mapred.period=10
> > mapred.servers=10.10.10.2:8649
> >
> > jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> > jvm.period=10
> > jvm.servers=10.10.10.2:8649
> >
> > rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> > rpc.period=10
> > rpc.servers=10.10.10.2:8649
> >
> >
> > where 10.10.10.2 is the machine where I am running gmetad and the web front
> > end. Do I need to use the same IP on all machines, as I do here, or should
> > each machine's file use its own IP? And is there anything more I need to do
> > to set it up with Hadoop?
> >
> >
> >
> > --
> > Regards
> > Shuja-ur-Rehman Baig
> > <http://pk.linkedin.com/in/shujamughal>
>
>


-- 
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>

Re: Configure Ganglia with Hadoop

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hi Shuja,

(First a note: please do not cross-post onto multiple lists, it's considered rude)

Your configuration looks good; I don't think you'll need to do anything more, as long as 10.10.10.2:8649 is the correct address. Ganglia can be configured in many ways, so it's hard for me to tell whether or not it's the correct decision for your setup.

One way to try is to see if you can open that endpoint via telnet:

telnet 10.10.10.2 8649

If the connection is refused, then GangliaContext will not work.

Also, verify that you are running Ganglia 3.0 and not 3.1; 3.1 requires GangliaContext31.

Brian

On Nov 8, 2010, at 8:34 AM, Shuja Rehman wrote:

> Hi
> I have cluster of 4 machines and want to configure ganglia for monitoring
> purpose. I have read the wiki and add the following lines to
> hadoop-metrics.properties on each machine.
> 
> dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> dfs.period=10
> dfs.servers=10.10.10.2:8649
> 
> mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> mapred.period=10
> mapred.servers=10.10.10.2:8649
> 
> jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> jvm.period=10
> jvm.servers=10.10.10.2:8649
> 
> rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> rpc.period=10
> rpc.servers=10.10.10.2:8649
> 
> 
> where 10.10.10.2 is the machine where i am running gmeated and web front
> end. Will  I need to same ip in all machine as i do here or need to give
> machine own ip in each file? and is there anything more to do to setup it
> with hadoop?
> 
> 
> 
> -- 
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>


Re: Configure Ganglia with Hadoop

Posted by Jonathan Creasy <jo...@Announcemedia.com>.
And in my zeal to be helpful, I didn't properly read your question, so my response is mostly useless. :)



On Nov 8, 2010, at 8:36 AM, "Shuja Rehman" <sh...@gmail.com> wrote:

> Hi
> I have a cluster of 4 machines and want to configure Ganglia for monitoring
> purposes. I have read the wiki and added the following lines to
> hadoop-metrics.properties on each machine.
> 
> dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> dfs.period=10
> dfs.servers=10.10.10.2:8649
> 
> mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> mapred.period=10
> mapred.servers=10.10.10.2:8649
> 
> jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> jvm.period=10
> jvm.servers=10.10.10.2:8649
> 
> rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> rpc.period=10
> rpc.servers=10.10.10.2:8649
> 
> 
> where 10.10.10.2 is the machine where I am running gmetad and the web front
> end. Do I need to use the same IP on all machines, as I do here, or should
> each machine's file use its own IP? And is there anything more I need to do
> to set it up with Hadoop?
> 
> 
> 
> -- 
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>