Posted to common-user@hadoop.apache.org by robert <ro...@austin.rr.com> on 2011/09/11 15:27:19 UTC

Ganglia 3.2 and Hadoop .20.2

I downloaded the latest version of Ganglia, compiled it, and installed
it on my Hadoop cluster, configuring it according to the documented
procedures. The latest stable version of Ganglia is 3.2, and I am
using hadoop-0.20.2-cdh31.

I just copied the gmond.conf from the distribution to the nodes. It
has what look like the default values throughout: 239.2.11.71 for
mcast_join and port 8649.

The core (non-Hadoop) Ganglia reporting works fine, but Ganglia is not
receiving Hadoop metrics in any reproducible way. Once I got reporting
from one node, and once telnet localhost 8649 showed a *different*
node, but most of the time I get no Hadoop metrics at all! Bouncing
the cluster and/or gmond may or may not change the behavior. It is
frustrating because the behavior seems random and not reproducible.
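One way to check, node by node, whether Hadoop metrics reach gmond at all is to do programmatically what telnet localhost 8649 does: open gmond's TCP port, read the XML dump until gmond closes the connection, and list the Hadoop-looking metrics for each host. This is a minimal Python sketch, not part of Ganglia or Hadoop; the function names are hypothetical, and it assumes Hadoop's metric names start with their context name (e.g. dfs., mapred., jvm., rpc.).

```python
import socket
import xml.etree.ElementTree as ET

# Hypothetical helpers; assumes gmond's default TCP accept channel on port 8649.
HADOOP_PREFIXES = ("dfs.", "mapred.", "jvm.", "rpc.")  # assumption: context-name prefixes


def read_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Return gmond's XML dump: the same data `telnet localhost 8649` prints."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # gmond sends the whole document and then closes, so read until EOF.
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_with_metrics(xml_bytes, prefixes=HADOOP_PREFIXES):
    """Map each HOST in the dump to the sorted metric names matching `prefixes`."""
    root = ET.fromstring(xml_bytes)
    report = {}
    for host in root.iter("HOST"):
        report[host.get("NAME")] = sorted(
            metric.get("NAME")
            for metric in host.iter("METRIC")
            if metric.get("NAME", "").startswith(prefixes)
        )
    return report


if __name__ == "__main__":
    # A count of 0 means gmond knows the host but has seen no Hadoop metrics from it.
    for name, metrics in hosts_with_metrics(read_gmond_xml()).items():
        print(name, len(metrics), "Hadoop metrics")
```

Running this on several nodes makes it easier to tell whether every host is missing Hadoop metrics or only some of them, instead of eyeballing raw telnet output.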

I wonder whether there is a version-compatibility problem. If there
are release notes describing a compatibility issue, I didn't see them
on the Ganglia site. At this point, I'm tempted to give up on Ganglia
for Hadoop metrics and look for alternatives.

Any ideas?






Re: Ganglia 3.2 and Hadoop .20.2

Posted by robert <ro...@austin.rr.com>.
Sorry to follow up on my own post, but I thought I would give it one
more shot this morning and set dfs.servers=239.2.11.71:8649 (the
multicast address).

Though I am sure I tried that before, it worked this time.
Perhaps the Ganglia system was in some unusual state before.
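Since the fix was to point dfs.servers at the same multicast address and port that gmond.conf joins, a quick consistency check of the Hadoop metrics configuration can catch other contexts that still point elsewhere. This is a minimal Python sketch for a multicast setup like the one described here; the file name hadoop-metrics.properties, the sample contents (including the GangliaContext31 class and the mapred.* entries), and the function names are illustrative assumptions, and the parser is deliberately simplistic.

```python
import re

# Illustrative hadoop-metrics.properties contents (assumption; only the
# dfs.servers line comes from the thread). GangliaContext31 is assumed to be
# the Ganglia 3.1+ context class; check that your Hadoop build ships it.
SAMPLE = """\
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=239.2.11.71:8649
# mapred points at localhost instead, so check_servers flags it
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
mapred.period=10
mapred.servers=localhost:8649
"""


def parse_properties(text):
    """Minimal key=value parser: skips blank lines and #/! comments.

    Not a full java.util.Properties parser (no ':' separators,
    escapes, or line continuations).
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_servers(props, mcast_join="239.2.11.71", port=8649):
    """List *.servers entries that differ from gmond's mcast_join address and port."""
    problems = []
    for key, value in sorted(props.items()):
        if not key.endswith(".servers"):
            continue
        # Assumption: multiple entries may be separated by commas and/or whitespace.
        for server in re.split(r"[,\s]+", value):
            if not server:
                continue
            host, _, port_str = server.partition(":")
            actual_port = int(port_str) if port_str else port  # no port given: assume default
            if (host, actual_port) != (mcast_join, port):
                problems.append(f"{key}: {server} does not match {mcast_join}:{port}")
    return problems


if __name__ == "__main__":
    for problem in check_servers(parse_properties(SAMPLE)):
        print(problem)
```

In a unicast setup the expected target would be a specific gmond host rather than the multicast address, so the mcast_join and port arguments would need to change accordingly.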

