Posted to common-user@hadoop.apache.org by Tamir Kamara <ta...@gmail.com> on 2009/03/17 14:48:41 UTC

Monitoring with Ganglia

Hi,

For a few days now I've been trying to make Hadoop work with the Ganglia
monitoring software.
I'm using Hadoop 0.18.3 with Ganglia 3.0.6. I've changed the hadoop-metrics
file as described in the wiki and also applied the HADOOP-3422 patch. Now I
can only see system metrics in the Ganglia data and nothing about Hadoop itself.

I also tried to add a collection group to gmond.conf for the metric
mapred.tasktracker.mapTaskSlots, but that caused gmond to stop working
because it couldn't "collect the metric on the platform", which means that it
doesn't recognize the metric.
It should be possible to get a view like the one at http://rcf.unl.edu/ganglia/?c=red.

There are some posts about this issue, but I couldn't find any answer or
detailed description of how to monitor Hadoop with Ganglia.

Does anyone have any experience with this?

Thanks,
Tamir
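One way to tell whether Hadoop's metrics reach gmond at all (rather than being lost further up, in gmetad or the web front end) is to read the XML that gmond serves on its TCP port and look for metric names with a Hadoop prefix. The Java sketch below assumes gmond's default tcp_accept_channel on port 8649 and that each metric appears as a <METRIC NAME="..."> element; the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GmondMetricDump {

    // Captures the NAME attribute of each <METRIC ...> element (assumed to come first).
    private static final Pattern METRIC_NAME =
            Pattern.compile("<METRIC\\s+NAME=\"([^\"]+)\"");

    /** Returns the metric names in the XML dump that start with the given prefix. */
    static List<String> metricNames(String xml, String prefix) {
        List<String> names = new ArrayList<>();
        Matcher m = METRIC_NAME.matcher(xml);
        while (m.find()) {
            if (m.group(1).startsWith(prefix)) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    /** Connects to gmond's TCP accept channel and reads the XML until gmond closes the socket. */
    static String readDump(String host, int port) throws IOException {
        try (Socket socket = new Socket(host, port);
             InputStream in = socket.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toString(StandardCharsets.UTF_8.name());
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8649;
        String prefix = args.length > 2 ? args[2] : "mapred.";
        List<String> names = metricNames(readDump(host, port), prefix);
        if (names.isEmpty()) {
            System.out.println("No metrics starting with '" + prefix + "' found");
        } else {
            names.forEach(System.out::println);
        }
    }
}
```

Running it as `java GmondMetricDump localhost 8649 mapred.` and getting nothing back while system metrics are visible would suggest the problem is on the Hadoop side (configuration or classpath) rather than in Ganglia's display.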

Re: Monitoring with Ganglia

Posted by Tamir Kamara <ta...@gmail.com>.
Thanks, Brian!
Works great.


On Thu, Mar 19, 2009 at 3:39 PM, Brian Bockelman <bb...@cse.unl.edu> wrote:

> Hey Tamir,
>
> Instead of
>
> mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31 (for Ganglia3.1.x)
>
> use:
>
> mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
>
> Java is trying to interpret the parenthetical aside as part of the class
> name.
>
> Brian
>
> PS: In distributed systems (or complex systems in general), I'm always
> amazed at all the different ways things can go wrong.
>
>
> On Mar 19, 2009, at 8:35 AM, Tamir Kamara wrote:
>
>> Hi Brian,
>>
>> Do you mean the hadoop-metrics file? It looks like this:
>> # Configuration of the "mapred" context for ganglia
>> # mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext (default for Ganglia3.0.x)
>> mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31 (for Ganglia3.1.x)
>> mapred.period=10
>> mapred.servers=localhost:8649
>>
>> I've only uncommented the last 3 lines. I think that there's a class called
>> GangliaContext31 in
>> /usr/local/hadoop-0.18.4/src/core/org/apache/hadoop/metrics/ganglia/GangliaContext31.java.
>>
>> thanks,
>> Tamir
>>
>> On Thu, Mar 19, 2009 at 3:25 PM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:
>>
>>> Hey Tamir,
>>>
>>> This is a very strange stack trace:
>>>
>>> java.lang.ClassNotFoundException:
>>> org.apache.hadoop.metrics.ganglia.GangliaContext31 (for Ganglia3.1.x)
>>>      at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>>      at java.security.AccessController.doPrivileged(Native Method)
>>>      at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>>      at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>      at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>      (blah blah blah)
>>>
>>> It looks like it thinks the classname is "GangliaContext31 (for
>>> Ganglia3.1.x)".  Is it possible you accidentally left a comment in your
>>> config?
>>>
>>> Brian
>>>
>>>
>>>
>>> On Mar 19, 2009, at 8:09 AM, Tamir Kamara wrote:
>>>
>>>> Hi,
>>>>
>>>> I attached a zip with the lsof output, jobtracker log and tasktracker log
>>>> (I only enabled mapred metrics). You can also see it here:
>>>> http://www.sendspace.com/file/86v5jc
>>>>
>>>> Thanks,
>>>> Tamir
>>>>
>>>> On Thu, Mar 19, 2009 at 2:51 PM, Brian Bockelman <bb...@cse.unl.edu>
>>>> wrote:
>>>> Hey Tamir,
>>>>
>>>> It appears the webserver stripped off your attachment.
>>>>
>>>> Do you have more of a stack trace available?
>>>>
>>>> Brian
>>>>
>>>>
>>>> On Mar 19, 2009, at 7:25 AM, Tamir Kamara wrote:
>>>>
>>>> Hi,
>>>>
>>>> The full lsof | grep java is attached. I see a line with the jar:
>>>> /usr/local/hadoop-0.18.4/hadoop-0.18.4-dev-core.jar, which is the new one
>>>> that the "ant clean jar" command created.
>>>>
>>>>
>>>> On Thu, Mar 19, 2009 at 2:00 PM, Brian Bockelman <bb...@cse.unl.edu>
>>>> wrote:
>>>>
>>>> On Mar 19, 2009, at 6:56 AM, Tamir Kamara wrote:
>>>>
>>>> Hi Brian,
>>>>
>>>> I see GangliaContext31.class in the jar and GangliaContext31.java in the
>>>> src folder.
>>>>
>>>> By the way, I only used the last version of each patch. Should I apply
>>>> the different files per patch from the earliest to the latest?
>>>>
>>>> Nope.
>>>>
>>>> Can you perform "lsof" on the running process and see if it's perhaps
>>>> using the wrong JAR?
>>>>
>>>> Brian
>>>>
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Tamir
>>>>
>>>> On Thu, Mar 19, 2009 at 1:38 PM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:
>>>>
>>>> Hey Tamir,
>>>>
>>>> Can you see the file GangliaContext31.java in your jar?  In the source
>>>> directory?
>>>>
>>>> Brian
>>>>
>>>>
>>>> On Mar 19, 2009, at 2:33 AM, Tamir Kamara wrote:
>>>>
>>>> Hi,
>>>>
>>>> All my testing went fine with Ganglia 3.0. I used the HADOOP-3422 patch to
>>>> fix the metric names provided by Hadoop, and it worked. Because I had to
>>>> recompile Hadoop (base 0.18.3), I also applied HADOOP-4675 in order to use
>>>> the latest Ganglia (3.1). After changing the metrics file to report with the
>>>> GangliaContext31 class, I started getting a ClassNotFoundException. The
>>>> command I used to recompile Hadoop was "ant clean jar", and then I moved
>>>> and renamed the resulting jar to replace the original core jar.
>>>>
>>>> Do you know what is wrong?
>>>>
>>>> Thanks,
>>>> Tamir
>>>>
>>>>
>>>> On Tue, Mar 17, 2009 at 5:25 PM, jason hadoop <jason.hadoop@gmail.com> wrote:
>>>>
>>>> Make all of your hadoop-metrics properties use the standard (unicast) IP
>>>> address of your master node.
>>>> Then add a straight UDP receive block to the gmond.conf of your master
>>>> node.
>>>> Then point your gmetad.conf at your master node.
>>>>
>>>> There are complete details in my forthcoming book; the chapter with this
>>>> in it should be available in alpha soon.
>>>>
>>>> On Tue, Mar 17, 2009 at 8:23 AM, Tamir Kamara <ta...@gmail.com> wrote:
>>>>
>>>> I sent my gmond.conf in my previous email... and the address is as
>>>> Carlos wrote.
>>>>
>>>> I'll change the hadoop-metrics file and check again.
>>>> However, I would prefer to use a method I'm more familiar with, like
>>>> unicast TCP communication. Do you know what I need to change in Ganglia
>>>> and/or Hadoop to use it?
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> On Tue, Mar 17, 2009 at 5:16 PM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:
>>>>
>>>> On Mar 17, 2009, at 10:08 AM, Carlos Valiente wrote:
>>>>
>>>> On Tue, Mar 17, 2009 at 14:56, Tamir Kamara <ta...@gmail.com> wrote:
>>>>
>>>> I don't know too much about multicast... and I'm using the default
>>>> gmond conf file.
>>>>
>>>> The default multicast address seems to be 239.2.11.71, so that's the
>>>> one for your hadoop-metrics.properties.
>>>>
>>>> Yup, try that, although I could tell better if I had Tamir's
>>>> gmond.conf, of course.
>>>>
>>>> Wouldn't using the multicast address mean I'll need to specify a
>>>> different address for each node so that the data won't get to all
>>>> nodes running gmond?
>>>>
>>>> The design of Ganglia is such that all the data goes to all the nodes
>>>> running gmond.  If you don't like it, Ganglia 3.1 supports
>>>> non-multicast TCP channels.
>>>>
>>>> For reference, our 200-node cluster has about 250KB/s of background
>>>> chatter on idle nodes, which is probably Ganglia-related.  It's an
>>>> incredibly small perturbation on network traffic.
>>>>
>>>> Brian
>>>>
>>>> I'm not an expert, either; I'm using the same multicast address on
>>>> all nodes in my cluster. On each node, tcpdump shows incoming Ganglia
>>>> traffic from every other node to the multicast address. It's usually a
>>>> burst of about 200 UDP packets every 4 seconds or so (for a 6-node
>>>> cluster), so the traffic overhead should be negligible.
>>>>
>>>> C
>>>>
>>>> --
>>>> Alpha Chapters of my book on Hadoop are available
>>>> http://www.apress.com/book/view/9781430219422
>>>>
>>>> <gout.zip>
>>>>
>>>>
>>>
>>>
>
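Whether the mapred.servers entries are multicast or unicast decides who hears the metrics: a multicast group such as the default 239.2.11.71 mentioned above reaches every gmond on the channel, while a plain host address (as in the suggestion to point everything at the master node) reaches only that host. The sketch below parses a comma-separated host:port list like the value in hadoop-metrics.properties and reports which kind each entry is. The parsing is a simplification of my own (IPv4 literals or host names only, default port 8649), not Hadoop's code, and the class name is hypothetical.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

public class MetricsServersCheck {

    /**
     * Parses a comma-separated "host:port" list such as "239.2.11.71:8649".
     * Entries without a port use defaultPort. IPv6 literals are not handled.
     */
    static List<InetSocketAddress> parseServers(String spec, int defaultPort) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String entry : spec.split(",")) {
            String s = entry.trim();
            if (s.isEmpty()) {
                continue;
            }
            int colon = s.lastIndexOf(':');
            String host = colon >= 0 ? s.substring(0, colon) : s;
            int port = colon >= 0 ? Integer.parseInt(s.substring(colon + 1)) : defaultPort;
            result.add(new InetSocketAddress(host, port)); // resolves host names; literals need no DNS
        }
        return result;
    }

    /** True if the address is a multicast group, i.e. every gmond joined to it receives the data. */
    static boolean isMulticast(InetSocketAddress addr) {
        InetAddress ip = addr.getAddress(); // null when the host name could not be resolved
        return ip != null && ip.isMulticastAddress();
    }

    public static void main(String[] args) {
        String spec = args.length > 0 ? args[0] : "239.2.11.71:8649";
        for (InetSocketAddress a : parseServers(spec, 8649)) {
            String kind = a.isUnresolved() ? "unresolved"
                    : isMulticast(a) ? "multicast (every gmond on the channel hears it)"
                    : "unicast (only that host hears it)";
            System.out.println(a.getHostString() + ":" + a.getPort() + " -> " + kind);
        }
    }
}
```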

Re: Monitoring with Ganglia

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hey Tamir,

Instead of

mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31 (for Ganglia3.1.x)

use:

mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31

Java is trying to interpret the parenthetical aside as part of the  
class name.

Brian

PS: In distributed systems (or complex systems in general), I'm always  
amazed at all the different ways things can go wrong.
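Since Java properties files have no trailing comments, a quick check before restarting the daemons can catch this kind of mistake. The sketch below loads a metrics properties file and flags every key ending in .class whose value is not a plain dot-separated class name. The class name, the default path conf/hadoop-metrics.properties, the key-suffix rule, and the ASCII-only identifier pattern are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;

public class MetricsClassCheck {

    // Dot-separated Java identifiers (ASCII only, for simplicity) with nothing trailing.
    private static final Pattern CLASS_NAME =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*(\\.[A-Za-z_$][A-Za-z0-9_$]*)*");

    /** Returns one message per "*.class" property whose value is not a bare class name. */
    static List<String> badClassEntries(Properties props) {
        List<String> problems = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
            String value = props.getProperty(key);
            if (key.endsWith(".class") && !CLASS_NAME.matcher(value).matches()) {
                problems.add(key + " has a value that is not a class name: \"" + value + "\"");
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "conf/hadoop-metrics.properties";
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(Paths.get(path))) {
            props.load(in);
        }
        List<String> problems = badClassEntries(props);
        problems.forEach(System.out::println);
        if (problems.isEmpty()) {
            System.out.println("All *.class entries look like plain class names.");
        }
    }
}
```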

Re: Monitoring with Ganglia

Posted by Tamir Kamara <ta...@gmail.com>.
Hi Brian,

Do you mean the hadoop-metrics file? It looks like this:
# Configuration of the "mapred" context for ganglia
# mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext (default for Ganglia3.0.x)
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31 (for Ganglia3.1.x)
mapred.period=10
mapred.servers=localhost:8649

I've only uncommented the last 3 lines. I think that there's a class called
GangliaContext31 in
/usr/local/hadoop-0.18.4/src/core/org/apache/hadoop/metrics/ganglia/GangliaContext31.java.

thanks,
Tamir
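A .java file in the source tree only shows the class was available at build time; what matters is whether the running daemon's classpath contains a jar with the compiled class, which is what the lsof suggestion in this thread checks from the outside. The hypothetical helper below does the same check from inside a JVM: run with the same classpath as the TaskTracker (an assumption you have to arrange yourself, e.g. `java -cp /path/to/hadoop-core.jar:. WhichJar`), it prints the jar or directory a class was loaded from, or reports that it cannot be found.

```java
import java.net.URL;
import java.security.CodeSource;

public class WhichJar {

    /**
     * Returns the location the named class was loaded from, "bootstrap class path"
     * for JDK core classes (which have no CodeSource), or null if the current
     * class loader cannot find the class at all.
     */
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "bootstrap class path";
            }
            URL location = src.getLocation();
            return location.toString();
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0]
                : "org.apache.hadoop.metrics.ganglia.GangliaContext31";
        String where = locate(name);
        System.out.println(name + " -> " + (where == null ? "NOT on the classpath" : where));
    }
}
```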

Re: Monitoring with Ganglia

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hey Tamir,

This is a very strange stack trace:

java.lang.ClassNotFoundException:  
org.apache.hadoop.metrics.ganglia.GangliaContext31 (for Ganglia3.1.x)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	(blah blah blah)

It looks like it thinks the classname is "GangliaContext31 (for  
Ganglia3.1.x)".  Is it possible you accidentally left a comment in  
your config?

Brian
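The reading of the stack trace above follows from how java.util.Properties parses a line: '#' and '!' start a comment only as the first non-blank character of a line, so everything after the '=' (minus leading whitespace) becomes the value, parenthetical included. The sketch below reproduces that with java.util.ArrayList standing in for GangliaContext31 (so it runs without Hadoop on the classpath) and uses Class.forName as a rough stand-in for however the metrics code loads the class; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class TrailingCommentDemo {

    /** Parses text as a properties file and returns the value stored under key. */
    static String valueOf(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // a StringReader should never actually fail
        }
        return props.getProperty(key);
    }

    /** True if Class.forName can load the name with the current class loader. */
    static boolean loadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The aside is on the same line as the value, so it becomes part of the value.
        String broken = "mapred.class=java.util.ArrayList (for Ganglia3.1.x)\n";
        // The aside moved to its own comment line, which Properties skips.
        String fixed = "# for Ganglia3.1.x\nmapred.class=java.util.ArrayList\n";

        for (String text : new String[] {broken, fixed}) {
            String value = valueOf(text, "mapred.class");
            System.out.println("[" + value + "] loadable=" + loadable(value));
        }
    }
}
```

The bracketed output makes the padded value visible, which is the same clue the quoted class name in the ClassNotFoundException gave.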


Re: Monitoring with Ganglia

Posted by Tamir Kamara <ta...@gmail.com>.
Hi,

I attached a zip with the lsof output, jobtracker log and tasktracker log (I
only enabled mapred metrics). You can also see it here:
http://www.sendspace.com/file/86v5jc

Thanks,
Tamir

Re: Monitoring with Ganglia

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hey Tamir,

It appears the webserver stripped off your attachment.

Do you have more of a stack trace available?

Brian

On Mar 19, 2009, at 7:25 AM, Tamir Kamara wrote:

> Hi,
>
> The full lsof | grep java is attached. I see a line with the jar: / 
> usr/local/hadoop-0.18.4/hadoop-0.18.4-dev-core.jar which is the new  
> one the "ant clean jar" command created.


Re: Monitoring with Ganglia

Posted by Tamir Kamara <ta...@gmail.com>.
Hi,

The full output of "lsof | grep java" is attached. I see a line with the jar
/usr/local/hadoop-0.18.4/hadoop-0.18.4-dev-core.jar, which is the new one the
"ant clean jar" command created.


On Thu, Mar 19, 2009 at 2:00 PM, Brian Bockelman <bb...@cse.unl.edu> wrote:

>
> On Mar 19, 2009, at 6:56 AM, Tamir Kamara wrote:
>
>> Hi Brian,
>>
>> I see GangliaContext31.class in the jar and GangliaContext31.java in the
>> src folder.
>>
>> By the way, I only used the latest version of each patch. Should I apply
>> the different files for each patch in order, from the earliest to the latest?
>>
>
> Nope.
>
> Can you perform "lsof" on the running process and see if it's perhaps using
> the wrong JAR?
>
> Brian

Re: Monitoring with Ganglia

Posted by Brian Bockelman <bb...@cse.unl.edu>.
On Mar 19, 2009, at 6:56 AM, Tamir Kamara wrote:

> Hi Brian,
>
> I see GangliaContext31.class in the jar and GangliaContext31.java in
> the src folder.
>
> By the way, I only used the latest version of each patch. Should I
> apply the different files for each patch in order, from the earliest
> to the latest?

Nope.

Can you perform "lsof" on the running process and see if it's perhaps  
using the wrong JAR?

Brian
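lsof is one way to check this. The JVM can also report directly which jar, if any, supplies a class. Below is a minimal Java sketch; the class name WhichJar is hypothetical. Run it with the same classpath as the daemon being debugged, because a ClassNotFoundException means the class is missing from that particular classpath.

```java
import java.net.URL;
import java.security.CodeSource;
import java.util.Optional;

// Sketch (hypothetical helper): report whether a class is loadable and where it came from.
// Run with the same classpath as the Hadoop daemon you are debugging.
final class WhichJar {

    /** Returns the class's code-source location, "(bootstrap)" for core JDK classes, or empty if not loadable. */
    static Optional<String> locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return Optional.of("(bootstrap)"); // JDK classes have no code source
            }
            URL where = src.getLocation();
            return Optional.of(where.toString());
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose superclass/dependencies are missing
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0]
                : "org.apache.hadoop.metrics.ganglia.GangliaContext31";
        System.out.println(name + " -> "
                + locate(name).orElse("NOT FOUND on this classpath"));
    }
}
```

For example: java -cp <daemon classpath>:. WhichJar org.apache.hadoop.metrics.ganglia.GangliaContext31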



Re: Monitoring with Ganglia

Posted by Tamir Kamara <ta...@gmail.com>.
Hi Brian,

I see GangliaContext31.class in the jar and GangliaContext31.java in the src
folder.

By the way, I only used the latest version of each patch. Should I apply the
different files for each patch in order, from the earliest to the latest?

Thanks,
Tamir

On Thu, Mar 19, 2009 at 1:38 PM, Brian Bockelman <bb...@cse.unl.edu> wrote:

> Hey Tamir,
>
> Can you see the file GangliaContext31.java in your jar?  In the source
> directory?
>
> Brian

Re: Monitoring with Ganglia

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hey Tamir,

Can you see the file GangliaContext31.java in your jar?  In the source  
directory?

Brian
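A jar is a zip archive, so what it normally holds is the compiled class. That shows up as an entry such as org/apache/hadoop/metrics/ganglia/GangliaContext31.class rather than the .java source. "jar tf" piped through grep answers this from the shell. Below is a small Java sketch of the same check using java.util.jar.JarFile; the class name JarHasClass is hypothetical.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Sketch (hypothetical helper): does a given jar contain the compiled class?
final class JarHasClass {

    /** Maps a binary class name to its entry path inside a jar. */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    static boolean contains(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: JarHasClass <jar> [class]");
            return;
        }
        String cls = args.length > 1 ? args[1]
                : "org.apache.hadoop.metrics.ganglia.GangliaContext31";
        System.out.println(cls + (contains(args[0], cls) ? " found in " : " NOT in ") + args[0]);
    }
}
```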

On Mar 19, 2009, at 2:33 AM, Tamir Kamara wrote:

> Hi,
>
> All my tests were fine with Ganglia 3.0: I used the HADOOP-3422 patch to
> fix the metric names provided by Hadoop, and it worked. Since I had to
> recompile Hadoop (base 0.18.3) anyway, I also applied HADOOP-4675 in
> order to use the latest Ganglia (3.1). After changing the metrics file
> to report with the GangliaContext31 class, I started getting a
> ClassNotFoundException. I recompiled Hadoop with "ant clean jar", then
> moved and renamed the new jar to replace the original core jar.
>
> Do you know what is wrong?
>
> Thanks,
> Tamir


Re: Monitoring with Ganglia

Posted by Tamir Kamara <ta...@gmail.com>.
Hi,

All my tests were fine with Ganglia 3.0: I used the HADOOP-3422 patch to fix
the metric names provided by Hadoop, and it worked. Since I had to recompile
Hadoop (base 0.18.3) anyway, I also applied HADOOP-4675 in order to use the
latest Ganglia (3.1). After changing the metrics file to report with the
GangliaContext31 class, I started getting a ClassNotFoundException. I
recompiled Hadoop with "ant clean jar", then moved and renamed the new jar
to replace the original core jar.

Do you know what is wrong?

Thanks,
Tamir
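The resolution posted at the top of this thread was about the active mapred.class line. It carried a parenthetical note, "(for Ganglia3.1.x)", after the class name. The minimal Java sketch below shows why that produces a ClassNotFoundException. It assumes the note sat on the same line as the class name.

java.util.Properties keeps everything after the "=" as the value, and "#" only starts a comment at the beginning of a line. A class name containing a space can never be loaded. The class name TrailingCommentDemo is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch (hypothetical helper): what Properties returns for a mapred.class line
// that has a trailing note after the class name.
final class TrailingCommentDemo {

    /** Parses one properties line and returns the value of mapred.class. */
    static String mapredClass(String line) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory StringReader
        }
        return props.getProperty("mapred.class");
    }

    public static void main(String[] args) {
        String[] lines = {
            // The note becomes part of the value; only a line *starting* with # is a comment.
            "mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31 (for Ganglia3.1.x)",
            "mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31",
        };
        for (String line : lines) {
            String value = mapredClass(line);
            // A Java class name cannot contain a space, so the first value can never load.
            System.out.println("[" + value + "] contains space: " + value.contains(" "));
        }
    }
}
```

Moving the note onto its own line starting with "#" leaves a clean class name.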


On Tue, Mar 17, 2009 at 5:25 PM, jason hadoop <ja...@gmail.com> wrote:

> Make all of your hadoop-metrics properties use the standard IP address of
> your master node.
> Then add a straight udp receive block to the gmond.conf of your master
> node.
> Then point your gmetad.conf at your master node.
>
> Complete details are in my forthcoming book; the chapter covering this
> should be available in alpha soon.

Re: Monitoring with Ganglia

Posted by jason hadoop <ja...@gmail.com>.
Make all of your hadoop-metrics properties use the standard IP address of
your master node.
Then add a straight udp receive block to the gmond.conf of your master node.
Then point your gmetad.conf at your master node.

Complete details are in my forthcoming book; the chapter covering this
should be available in alpha soon.
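The Java sketch below writes out the three unicast steps above as the configuration text they imply. Several details are assumptions:

- The master IP 10.0.0.1 and the cluster name "hadoop" are hypothetical.
- The context list (dfs, mapred, jvm, rpc) is taken from Carlos's configuration in this thread.
- The gmond block is a udp_recv_channel with no mcast_join or bind, so it accepts plain unicast packets.
- gmetad's data_source polls the master's gmond on its TCP accept port, which is 8649 in the default gmond.conf.

```java
// Sketch (hypothetical helper): print unicast Ganglia config for Hadoop metrics.
// Assumes a master at 10.0.0.1 and Ganglia's default port 8649.
final class UnicastGangliaConfig {

    // Every node: send each metrics context straight to the master's IP.
    // (With the HADOOP-4675 patch and Ganglia 3.1, GangliaContext31 would replace GangliaContext.)
    static String hadoopMetrics(String master, int port) {
        StringBuilder sb = new StringBuilder();
        for (String ctx : new String[] {"dfs", "mapred", "jvm", "rpc"}) {
            sb.append(ctx).append(".class=org.apache.hadoop.metrics.ganglia.GangliaContext\n")
              .append(ctx).append(".period=10\n")
              .append(ctx).append(".servers=").append(master).append(':').append(port).append('\n');
        }
        return sb.toString();
    }

    // Master only: a plain UDP receive channel (no mcast_join/bind), i.e. unicast.
    static String gmondRecvChannel(int port) {
        return "udp_recv_channel {\n  port = " + port + "\n}\n";
    }

    // gmetad: poll the master's gmond for the cluster's XML.
    static String gmetadDataSource(String clusterName, String master, int port) {
        return "data_source \"" + clusterName + "\" " + master + ":" + port + "\n";
    }

    public static void main(String[] args) {
        String master = "10.0.0.1"; // hypothetical master IP
        System.out.print("# hadoop-metrics.properties (every node)\n" + hadoopMetrics(master, 8649));
        System.out.print("\n# gmond.conf (master only)\n" + gmondRecvChannel(8649));
        System.out.print("\n# gmetad.conf\n" + gmetadDataSource("hadoop", master, 8649));
    }
}
```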

On Tue, Mar 17, 2009 at 8:23 AM, Tamir Kamara <ta...@gmail.com> wrote:

> I sent my gmond.conf in my previous email... and the address is as Carlos
> wrote.
>
> I'll change the hadoop-metrics file and check again.
> However, I would prefer a method I'm more familiar with, such as unicast TCP
> communication. Do you know what I need to change in Ganglia and/or Hadoop
> to use it?
>
> Thanks.



-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422

Re: Monitoring with Ganglia

Posted by Tamir Kamara <ta...@gmail.com>.
I sent my gmond.conf in my previous email... and the address is as Carlos
wrote.

I'll change the hadoop-metrics file and check again.
However, I would prefer a method I'm more familiar with, such as unicast TCP
communication. Do you know what I need to change in Ganglia and/or Hadoop
to use it?

Thanks.


On Tue, Mar 17, 2009 at 5:16 PM, Brian Bockelman <bb...@cse.unl.edu> wrote:

>
> On Mar 17, 2009, at 10:08 AM, Carlos Valiente wrote:
>
>  On Tue, Mar 17, 2009 at 14:56, Tamir Kamara <ta...@gmail.com>
>> wrote:
>>
>>> I don't know too much about multicast... and I'm using the default gmond
>>> conf file.
>>>
>>
>> The default multicast address seems to be 239.2.11.71, so that's the
>> one for your hadoop-metrics.properties.
>>
>
> Yup, try that - although I could tell better if I had Tamir's gmond.conf,
> of course.
>
>
>>
>>  Wouldn't using the multicast address mean I'll need to specify a
>>> different
>>> address for each node so that the data won't get to all nodes running
>>> gmond
>>>
>>
>>
> The design of Ganglia is such that all the data goes to all the nodes
> running gmond.  If you don't like it, Ganglia 3.1 supports non-multicast TCP
> channels.
>
> For reference, our 200 node cluster has about 250KB/s of background chatter
> on idle nodes, which is probably Ganglia-related.  It's an incredibly small
> perturbation on network traffic.
>
> Brian
>
>
>  I'm not an expert, either --- I'm using the same multicast address on
>> all nodes in my cluster. On each node, tcpdump shows incoming Ganglia
>> traffic from every other node to the multicast address. It's usually a
>> burst of about  200 UDP packets every 4 seconds or so (for a 6-node
>> cluster), so the traffic overhead should be negligible.
>>
>> C
>>
>
>

Re: Monitoring with Ganglia

Posted by Brian Bockelman <bb...@cse.unl.edu>.
On Mar 17, 2009, at 10:08 AM, Carlos Valiente wrote:

> On Tue, Mar 17, 2009 at 14:56, Tamir Kamara <ta...@gmail.com>  
> wrote:
>> I don't know too much about multicast... and I'm using the default  
>> gmond
>> conf file.
>
> The default multicast address seems to be 239.2.11.71, so that's the
> one for your hadoop-metrics.properties.

Yup, try that - although I could tell better if I had Tamir's  
gmond.conf, of course.

>
>
>> Wouldn't using the multicast address mean I'll need to specify a  
>> different
>> address for each node so that the data won't get to all nodes  
>> running gmond
>

The design of Ganglia is such that all the data goes to all the nodes
running gmond. If you don't like it, Ganglia 3.1 supports non-multicast
TCP channels.

For reference, our 200-node cluster has about 250KB/s of background
chatter on idle nodes, which is probably Ganglia-related. It's an
incredibly small perturbation on network traffic.

Brian

> I'm not an expert, either --- I'm using the same multicast address on
> all nodes in my cluster. On each node, tcpdump shows incoming Ganglia
> traffic from every other node to the multicast address. It's usually a
> burst of about  200 UDP packets every 4 seconds or so (for a 6-node
> cluster), so the traffic overhead should be negligible.
>
> C
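Here is a rough back-of-the-envelope check of the two traffic figures above. It reads Brian's 250KB/s as what each idle node receives, since with multicast every node gets everyone's data. For comparison only, it assumes a 1 Gbit/s link (125,000 KB/s).

```latex
\frac{250\ \text{KB/s}}{200\ \text{sending nodes}} \approx 1.25\ \text{KB/s per sender},
\qquad
\frac{250\ \text{KB/s}}{125\,000\ \text{KB/s}} = 0.2\%\ \text{of the link},
\qquad
\frac{200\ \text{packets}}{4\ \text{s}} = 50\ \text{packets/s}\ \text{(Carlos's 6-node cluster)}
```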


Re: Monitoring with Ganglia

Posted by Carlos Valiente <su...@gmail.com>.
On Tue, Mar 17, 2009 at 14:56, Tamir Kamara <ta...@gmail.com> wrote:
> I don't know too much about multicast... and I'm using the default gmond
> conf file.

The default multicast address seems to be 239.2.11.71, so that's the
one for your hadoop-metrics.properties.

> Wouldn't using the multicast address mean I'll need to specify a different
> address for each node so that the data won't get to all nodes running gmond

I'm not an expert, either --- I'm using the same multicast address on
all nodes in my cluster. On each node, tcpdump shows incoming Ganglia
traffic from every other node to the multicast address. It's usually a
burst of about  200 UDP packets every 4 seconds or so (for a 6-node
cluster), so the traffic overhead should be negligible.

C
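Whether every gmond sees the data depends on the kind of address in each <context>.servers entry. A multicast group, such as Ganglia's default 239.2.11.71, reaches every gmond that joins it. An ordinary host address is unicast.

Below is a small Java sketch that labels each target using InetAddress.isMulticastAddress. The name ServersCheck, the sample property values, and the comma/space separator for multiple targets are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Properties;

// Sketch (hypothetical helper): classify each "<context>.servers" target as a
// multicast group (every joining gmond receives it) or a unicast host.
final class ServersCheck {

    /** True if the host part of "host[:port]" is a multicast address (224.0.0.0/4 for IPv4). */
    static boolean isMulticast(String target) {
        int colon = target.lastIndexOf(':');
        String host = colon < 0 ? target : target.substring(0, colon);
        try {
            // A literal IP such as 239.2.11.71 is parsed without a DNS lookup.
            return InetAddress.getByName(host).isMulticastAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot resolve " + host, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file contents; load your own hadoop-metrics.properties instead.
        String conf = String.join("\n",
                "dfs.servers=239.2.11.71:8649",
                "mapred.servers=localhost:8649");
        Properties props = new Properties();
        props.load(new StringReader(conf));
        for (String key : props.stringPropertyNames()) {
            if (!key.endsWith(".servers")) continue;
            // Assumption: several targets may be listed, separated by commas or spaces.
            for (String target : props.getProperty(key).trim().split("[ ,]+")) {
                System.out.println(key + " -> " + target
                        + (isMulticast(target) ? " (multicast group)" : " (unicast host)"));
            }
        }
    }
}
```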

Re: Monitoring with Ganglia

Posted by Tamir Kamara <ta...@gmail.com>.
Logging metrics to a file works fine.

I don't know too much about multicast... and I'm using the default gmond
conf file.
Wouldn't using the multicast address mean I'll need to specify a different
address for each node so that the data won't get to all nodes running gmond?


On Tue, Mar 17, 2009 at 4:46 PM, Brian Bockelman <bb...@cse.unl.edu> wrote:

> Yup, that's the next question: what's your recv channel in gmond.conf on
> that node?  You can just send along the whole gmond.conf if you're not sure.
>
> If you set the metrics to be logged to a file, do they appear there?  I.e.,
> have you verified the metrics are working at all for the node?
>
> Brian

Re: Monitoring with Ganglia

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Yup, that's the next question: what's your recv channel in gmond.conf  
on that node?  You can just send along the whole gmond.conf if you're  
not sure.

If you set the metrics to be logged to a file, do they appear there?   
I.e., have you verified the metrics are working at all for the node?
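One way to run the file check is to point a context at `org.apache.hadoop.metrics.file.FileContext` with a file name, for example `mapred.class=org.apache.hadoop.metrics.file.FileContext` together with `mapred.fileName=/tmp/mapredmetrics.log` (the path is only an example), then inspect the output. The sketch below assumes each output line has the form `context.record: name=value, name=value, ...`. That layout is an assumption, so compare it against your own file first. The sketch lists which records were written.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

ParsedLine = Tuple[str, str, Dict[str, str]]


def parse_file_context_line(line: str) -> Optional[ParsedLine]:
    """Split 'context.record: a=1, b=2' into (context, record, {a: '1', b: '2'})."""
    head, sep, rest = line.strip().partition(": ")
    if not sep or "." not in head:
        return None  # not in the assumed format
    context, _, record = head.partition(".")
    fields = {}
    for item in rest.split(", "):
        name, eq, value = item.partition("=")
        if eq:
            fields[name] = value
    return context, record, fields


def records_seen(lines: Iterable[str]) -> Set[str]:
    """Return the distinct 'context.record' names found in the log lines."""
    seen = set()
    for line in lines:
        parsed = parse_file_context_line(line)
        if parsed is not None:
            seen.add(f"{parsed[0]}.{parsed[1]}")
    return seen
```

A non-empty result from `records_seen(open(path))` would suggest the metrics system itself is working, which narrows the problem to the path between Hadoop and gmond.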

Brian

On Mar 17, 2009, at 9:39 AM, Carlos Valiente wrote:

> On Tue, Mar 17, 2009 at 14:06, Tamir Kamara <ta...@gmail.com> wrote:
>> My hadoop-metrics looks like this:
>>
>> dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
>> dfs.period=10
>> dfs.servers=localhost:8649
>>
>> mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
>> mapred.period=10
>> mapred.servers=localhost:8649
>>
>> jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
>> jvm.period=10
>> jvm.servers=localhost:8649
>
> I'm using the following:
>
> dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> dfs.period=10
> dfs.servers=239.2.11.42:8649
>
> mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> mapred.period=10
> mapred.servers=239.2.11.42:8649
>
> jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> jvm.period=10
> jvm.servers=239.2.11.42:8649
>
> rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> rpc.period=10
> rpc.servers=239.2.11.42:8649
>
> (That is, I'm using the multicasting address as specified in my  
> gmond.conf:
> [..]
>
> udp_send_channel {
>  mcast_join = 239.2.11.42
>  mcast_if = eth1
>  port = 8649
>  ttl = 1
> }
>
> udp_recv_channel {
>  mcast_join = 239.2.11.42
>  mcast_if = eth1
>  port = 8649
>  bind = 239.2.11.42
> }
>
> [..])


Re: Monitoring with Ganglia

Posted by Carlos Valiente <su...@gmail.com>.
On Tue, Mar 17, 2009 at 14:06, Tamir Kamara <ta...@gmail.com> wrote:
> My hadoop-metrics looks like this:
>
> dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> dfs.period=10
> dfs.servers=localhost:8649
>
> mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> mapred.period=10
> mapred.servers=localhost:8649
>
> jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> jvm.period=10
> jvm.servers=localhost:8649

I'm using the following:

dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.period=10
dfs.servers=239.2.11.42:8649

mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
mapred.period=10
mapred.servers=239.2.11.42:8649

jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
jvm.period=10
jvm.servers=239.2.11.42:8649

rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
rpc.period=10
rpc.servers=239.2.11.42:8649

(That is, I'm using the multicasting address as specified in my gmond.conf:
[..]

udp_send_channel {
  mcast_join = 239.2.11.42
  mcast_if = eth1
  port = 8649
  ttl = 1
}

udp_recv_channel {
  mcast_join = 239.2.11.42
  mcast_if = eth1
  port = 8649
  bind = 239.2.11.42
}

[..])
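The recv channel matters because the address and port in Hadoop's `servers` setting must be ones that gmond's `udp_recv_channel` actually listens on. When the recv channel has a `bind` address, as in the gmond.conf above, its socket is bound to that multicast address and only receives datagrams sent to it, so a Hadoop context sending to `localhost:8649` would not reach it. The sketch below makes that check explicit. It is a minimal illustration with hypothetical function names. It assumes a simplified gmond.conf (flat `key = value` blocks, `#` and `/* */` comments only), does not resolve hostnames, and assumes the destination is either this node or the joined group.

```python
import re
from typing import Dict, List, Tuple

BLOCK_RE = re.compile(
    r"\b(udp_send_channel|udp_recv_channel|tcp_accept_channel)\s*\{([^{}]*)\}")
PAIR_RE = re.compile(r'(\w+)\s*=\s*"?([^\s"]+)"?')


def channels(conf_text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return (block_name, {key: value}) for each channel block in gmond.conf."""
    text = re.sub(r"/\*.*?\*/", "", conf_text, flags=re.S)  # block comments
    text = re.sub(r"#[^\n]*", "", text)                      # line comments
    return [(name, dict(PAIR_RE.findall(body)))
            for name, body in BLOCK_RE.findall(text)]


def recv_would_accept(recv: Dict[str, str], dest_host: str, dest_port: int) -> bool:
    """Rough check: would this udp_recv_channel see datagrams sent to dest?

    Assumes dest_host is this node or the joined group; hostnames are
    compared as plain strings, not resolved.
    """
    if recv.get("port") != str(dest_port):
        return False
    bind = recv.get("bind")
    # Without `bind`, the socket is assumed to listen on all local addresses.
    return bind is None or bind == dest_host
```

With the recv channel above, `recv_would_accept(recv, "239.2.11.42", 8649)` is true and `recv_would_accept(recv, "localhost", 8649)` is false.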

Re: Monitoring with Ganglia

Posted by Tamir Kamara <ta...@gmail.com>.
Hi,

I found the link after hours of going through Google search results...

My hadoop-metrics looks like this:

dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.period=10
dfs.servers=localhost:8649

mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
mapred.period=10
mapred.servers=localhost:8649

jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
jvm.period=10
jvm.servers=localhost:8649
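Before looking at gmond, a quick pass over hadoop-metrics.properties can catch mistakes in the settings themselves. The sketch below is a simplified `key=value` reader: it ignores Java properties features such as line continuations and `:` separators. It groups settings by context and flags three problems. The first is a `.class` value that is not a plain dotted class name, such as one with a comment left on the same line. The second is a non-integer `period`. The third is a Ganglia context whose `servers` entries are not `host:port`, the form used throughout this thread. These checks are assumptions about common mistakes, not Hadoop's own validation, and the function names are hypothetical.

```python
import re
from typing import Dict, List

JAVA_CLASS_RE = re.compile(r"^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$")
HOST_PORT_RE = re.compile(r"^[\w.\-]+:\d+$")


def read_properties(text: str) -> Dict[str, str]:
    """Simplified reader: one key=value per line, '#' or '!' starts a comment."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_metrics_config(props: Dict[str, str]) -> List[str]:
    """Return human-readable problems found in the metrics contexts."""
    problems = []
    contexts = sorted(k[: -len(".class")] for k in props if k.endswith(".class"))
    for ctx in contexts:
        cls = props[f"{ctx}.class"]
        if not JAVA_CLASS_RE.match(cls):
            problems.append(f"{ctx}.class is not a plain class name: {cls!r}")
        period = props.get(f"{ctx}.period")
        if period is not None and not period.isdigit():
            problems.append(f"{ctx}.period is not an integer: {period!r}")
        if ".ganglia." in cls:
            servers = props.get(f"{ctx}.servers", "")
            entries = [s for s in re.split(r"[\s,]+", servers) if s]
            if not entries:
                problems.append(f"{ctx}.servers is empty")
            for entry in entries:
                if not HOST_PORT_RE.match(entry):
                    problems.append(f"{ctx}.servers entry {entry!r} is not host:port")
    return problems
```

For the configuration above, `check_metrics_config(read_properties(text))` returns an empty list, which points the search at gmond rather than at the properties file.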

I use localhost because gmond runs on that machine.
The gmond XML output is attached.


Thanks,
Tamir


On Tue, Mar 17, 2009 at 3:55 PM, Brian Bockelman <bb...@cse.unl.edu> wrote:

> Hey Tamir,
>
> I assume you want something like this:
>
>
> http://rcf.unl.edu/ganglia/?c=red-workers&h=node155&m=load_one&r=hour&s=descending&hc=4
>
> (That link's old - where'd you find it?  I'll update it...)
>
> Can you send out the relevant lines from the hadoop-metrics file?
>
> Also, can you run the following:
>
> telnet (Ganglia host) (Ganglia port)
>
> This should spew out lots of XML data; use the host and port you configured
> hadoop with.
>
> Brian
>
>
> On Mar 17, 2009, at 8:48 AM, Tamir Kamara wrote:
>
>> Hi,
>>
>> For a few days I'm trying to make hadoop work with the Ganglia monitoring
>> software.
>> I'm using hadoop 0.18.3 with ganglia 3.0.6, I've changed the hadoop-metrics
>> file as described in the wiki and also used HADOOP-3422 patch. Now, I can
>> only see system metrics in the ganglia data and nothing about hadoop itself.
>>
>> I also tried to add a collection group to gmond.conf for metric
>> mapred.tasktracker.mapTaskSlots, but that caused gmond to stop working
>> because it couldn't "collect the metric on the platform" which means that it
>> doesn't recognize the metric.
>> It should be possible to do this like in http://rcf.unl.edu/ganglia/?c=red.
>>
>> There're some posts of this issue but I couldn't find any answer or detailed
>> description of how to monitor hadoop with ganglia.
>>
>> Does anyone have any experience with this ?
>>
>> Thanks,
>> Tamir
>>
>
>

Re: Monitoring with Ganglia

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hey Tamir,

I assume you want something like this:

http://rcf.unl.edu/ganglia/?c=red-workers&h=node155&m=load_one&r=hour&s=descending&hc=4

(That link's old - where'd you find it?  I'll update it...)

Can you send out the relevant lines from the hadoop-metrics file?

Also, can you run the following:

telnet (Ganglia host) (Ganglia port)

This should spew out lots of XML data; use the host and port you  
configured hadoop with.
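The telnet check can also be scripted. The sketch below connects to gmond's TCP port (8649 in the configurations in this thread) and reads the XML dump until gmond closes the connection. For each host, it lists the metric names that start with the Hadoop context names. It assumes gmond's `GANGLIA_XML` > `CLUSTER` > `HOST` > `METRIC` layout, and it assumes Hadoop metric names begin with the context name, as `mapred.tasktracker.mapTaskSlots` does. A host that shows only system metrics such as `load_one` has not received anything from Hadoop.

```python
import socket
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Assumed prefixes: the context names used in hadoop-metrics.properties.
HADOOP_PREFIXES: Tuple[str, ...] = ("dfs.", "mapred.", "jvm.", "rpc.")


def fetch_gmond_xml(host: str = "localhost", port: int = 8649,
                    timeout: float = 5.0) -> bytes:
    """Read everything gmond sends on connect (the data telnet would show)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def hadoop_metrics_by_host(xml_bytes: bytes,
                           prefixes: Tuple[str, ...] = HADOOP_PREFIXES
                           ) -> Dict[str, List[str]]:
    """Map each HOST NAME to its sorted metric names that match the prefixes."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for host in root.iter("HOST"):
        names = (m.get("NAME", "") for m in host.iter("METRIC"))
        result[host.get("NAME")] = sorted(n for n in names if n.startswith(prefixes))
    return result
```

Typical use would be `hadoop_metrics_by_host(fetch_gmond_xml("localhost", 8649))`, with the host and port that Hadoop was configured to send to.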

Brian

On Mar 17, 2009, at 8:48 AM, Tamir Kamara wrote:

> Hi,
>
> For a few days I'm trying to make hadoop work with the Ganglia monitoring
> software.
> I'm using hadoop 0.18.3 with ganglia 3.0.6, I've changed the hadoop-metrics
> file as described in the wiki and also used HADOOP-3422 patch. Now, I can
> only see system metrics in the ganglia data and nothing about hadoop itself.
>
> I also tried to add a collection group to gmond.conf for metric
> mapred.tasktracker.mapTaskSlots, but that caused gmond to stop working
> because it couldn't "collect the metric on the platform" which means that it
> doesn't recognize the metric.
> It should be possible to do this like in http://rcf.unl.edu/ganglia/?c=red.
>
> There're some posts of this issue but I couldn't find any answer or detailed
> description of how to monitor hadoop with ganglia.
>
> Does anyone have any experience with this ?
>
> Thanks,
> Tamir