Posted to common-user@hadoop.apache.org by robert <ro...@austin.rr.com> on 2011/09/11 15:27:19 UTC
Ganglia 3.2 and Hadoop .20.2
I downloaded the latest stable version of Ganglia (3.2), then compiled
and installed it on my Hadoop cluster and configured it according to
the documented procedures. I am using hadoop-0.20.2-cdh31.
I just copied the gmond.conf from the distribution to the nodes. It
has what look like default values: 239.2.11.71 for mcast_join and port
8649 throughout.
The core (non-Hadoop) Ganglia reporting works fine, but Hadoop metrics
do not reach Ganglia in any reproducible way. Once I got reporting on
one node; another time telnet localhost 8649 showed a *different* node
reporting; most of the time I get no Hadoop metrics at all! Bouncing
the cluster and/or gmond may or may not change the behavior. It is
frustrating because the behavior seems to be random.
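For anyone who wants to repeat the telnet localhost 8649 check without
eyeballing the XML, here is a minimal sketch. It assumes gmond dumps
its state as XML with HOST and METRIC elements on that TCP port and
then closes the connection, and that Hadoop metric names start with a
context prefix such as "dfs." or "jvm."; the prefix list and the
function names are illustrative assumptions, not anything taken from
Ganglia or Hadoop.

```python
import socket
import xml.etree.ElementTree as ET

# Assumed prefixes for Hadoop-emitted metric names; adjust to whatever
# your gmond dump actually shows.
HADOOP_PREFIXES = ("dfs.", "mapred.", "jvm.", "rpc.")


def read_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read everything gmond writes on its TCP port until it closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def hadoop_metrics_by_host(xml_bytes, prefixes=HADOOP_PREFIXES):
    """Map each HOST name to the sorted Hadoop-looking metric names it reports."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for host in root.iter("HOST"):
        names = sorted(
            metric.get("NAME", "")
            for metric in host.iter("METRIC")
            if metric.get("NAME", "").startswith(prefixes)
        )
        result[host.get("NAME")] = names
    return result
```

Calling hadoop_metrics_by_host(read_gmond_xml()) on a node and printing
the result after each restart shows which hosts currently report
Hadoop metrics and which report none, which makes the "random"
behavior easier to compare between runs.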
I wonder if there is a problem with version compatibility. If there
are release notes indicating a compatibility issue, I didn't see them
on the Ganglia site. At this point, I'm tempted to give up on Ganglia
for Hadoop metrics and look for alternatives.
Any ideas?
Re: Ganglia 3.2 and Hadoop .20.2
Posted by robert <ro...@austin.rr.com>.
Sorry to follow up my own post, but I thought I would give it one more
shot this morning, and I changed the setting to
dfs.servers=239.2.11.71:8649 (the multicast address).
Though I am sure I tried that before, it works this time.
Perhaps the Ganglia system was in some unusual state before.
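To make the change concrete, here is a small sketch that checks which
metrics contexts in a hadoop-metrics.properties file point at the
multicast address. Only the dfs.servers value comes from this thread.
The other keys, the mapred context, and the GangliaContext31 class
name are assumptions for illustration (that class targets the newer
Ganglia wire format and may or may not be present in a given Hadoop
build), and the helper names are made up.

```python
# Hypothetical sample of hadoop-metrics.properties; only the
# dfs.servers value is taken from the post, the rest is assumed.
SAMPLE = """\
# dfs context: class name assumes a build that ships GangliaContext31
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=239.2.11.71:8649
# mapred context still pointing at a unicast address
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
mapred.period=10
mapred.servers=localhost:8649
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


def contexts_not_using(props, expected):
    """Return the contexts whose <context>.servers differs from expected."""
    suffix = ".servers"
    return sorted(
        key[: -len(suffix)]
        for key, value in props.items()
        if key.endswith(suffix) and value != expected
    )
```

With the sample above,
contexts_not_using(parse_properties(SAMPLE), "239.2.11.71:8649")
returns ['mapred'], i.e. the one context that was not switched to the
address gmond joins via mcast_join.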