Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2010/06/21 12:59:23 UTC

[Hadoop Wiki] Trivial Update of "ZooKeeper/GSoCMonitoringAndWebInterface" by AndreiSavu

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "ZooKeeper/GSoCMonitoringAndWebInterface" page has been changed by AndreiSavu.
http://wiki.apache.org/hadoop/ZooKeeper/GSoCMonitoringAndWebInterface?action=diff&rev1=5&rev2=6

--------------------------------------------------

   * Assigned mentor: Patrick Hunt (phunt at apache dot org)
  
  == Abstract ==
- ZooKeeper is a complex distributed system. Understanding how well it is running is tremendously important. Patrick Hunt has created a [[http://github.com/phunt/zookeeper_dashboard|Django-based dashboard]] that allows some insight into how ZooKeeper is running. This is the foundation I'm going to build on. This project would capture much more information from ZooKeeper, adding hooks to retrieve it where necessary and visualize it in a appealing and useful way. I'm also going to provide a bunch of monitoring recipes for systems like: Ganglia, Nagios, Cacti.
+ ZooKeeper is a complex distributed system. Understanding how well it is running is tremendously important. Patrick Hunt has created a [[http://github.com/phunt/zookeeper_dashboard|Django-based dashboard]] that allows some insight into how ZooKeeper is running. This is the foundation I'm going to build on. This project would capture much more information from ZooKeeper, adding hooks to retrieve it where necessary and visualize it in an appealing and useful way. I'm also going to provide a bunch of monitoring recipes for systems like: Ganglia, Nagios, Cacti.
  
  == Work In Progress ==
-  * monitoring for Cacti and Ganglia
-  * commit as zookeeper-monitoring as a contrib
+  * cleanup and add more tests on zookeeper-monitoring
+  * submit [[http://github.com/andreisavu/zookeeper-monitoring|zookeeper-monitoring]] as a contrib
+   * going to add a new JIRA for monitoring tools
+   * right now there is only one JIRA open, for Ganglia: [[https://issues.apache.org/jira/browse/ZOOKEEPER-613|ZOOKEEPER-613]]
   * [[https://issues.apache.org/jira/browse/ZOOKEEPER-175|ZOOKEEPER-175]]
   * [[https://issues.apache.org/jira/browse/ZOOKEEPER-757|ZOOKEEPER-757]]
   * [[https://issues.apache.org/jira/browse/ZOOKEEPER-613|ZOOKEEPER-613]]
  
  == Done ==
-  * monitoring tools and recipes: [[http://github.com/andreisavu/zookeeper-monitoring|zookeeper-monitoring]] : Nagios
+  * monitoring tools and recipes: [[http://github.com/andreisavu/zookeeper-monitoring|zookeeper-monitoring]] : Nagios, Cacti and Ganglia
   * [[https://issues.apache.org/jira/browse/ZOOKEEPER-744|ZOOKEEPER-744]]
  
  == Milestones ==
  === Community Bonding (starts: 26 April ends: 24 May) ===
  Activities:
  
-  * read mailing list archives
+  * read mailing list archives - '''done'''
-  * read source code
+  * read source code - '''done'''
-  * discuss with the community members  (monitoring and administration requirements, production stories)
+  * discuss with the community members  (monitoring and administration requirements, production stories) - '''done'''
-  * discuss with the Adobe Hadoop / Hbase team about their specific monitoring requirements
+  * discuss with the Adobe Hadoop / Hbase team about their specific monitoring requirements - '''done'''
  
  Expected results:
  
-  * understand source code and the known bugs
+  * understand source code and the known bugs - '''done'''
-  * understand how the software is used in production
+  * understand how the software is used in production - '''done'''
+   * ZooKeeper is the kind of service that you put in production and forget about
+   * got positive feedback: works as expected "out of the box"
+   * monitoring requirements: ensure that it keeps working as expected
-  * understand monitoring requirements
+  * understand monitoring requirements - '''done'''
-  * understand debugging requirements
+  * understand debugging requirements - '''done'''
-  * setup a development environment
+  * setup a development environment - '''done'''
+   * on my local machine running Ubuntu 9.10, with Java 1.6, Eclipse and Ant
+   * tracking my changes on github: http://github.com/andreisavu/zookeeper
  
  === Monitoring and Data Collection (starts: 24 May ends: 20 June) ===
  Activities:
  
-  * deploy small scale (multinode) cluster for development (virtual machines)
+  * deploy small scale (multinode) cluster for development (virtual machines)  - '''done'''
+   * I've used [[http://github.com/phunt/zkconf|zkconf]] for this task. I've deployed local "clusters" with 3, 5 and 9 nodes
-  * identify important health signals and add hooks (if needed) for realtime data collection
+  * identify important health signals and add hooks (if needed) for realtime data collection - '''done'''
+   * added a new four-letter-word command, 'mntr', for monitoring; it is going to be released in ZooKeeper 3.4.0
+   * important signals: latency, packets sent / received, outstanding requests, znode count, watch count, ephemerals count, followers count, synced followers, pending syncs, open file descriptor count
-  * create scripts / plugins for cluster monitoring using Cacti, Ganglia, Nagios, SNMP
+  * create scripts / plugins for cluster monitoring using Cacti, Ganglia, Nagios - '''done'''
-  * document script install procedures
+   * [[http://github.com/andreisavu/zookeeper-monitoring|zookeeper-monitoring]]
+  * document script install procedures - '''done''' (I'm assuming the user has previous experience configuring Nagios, Cacti or Ganglia)
-  * collaborate with the Adobe Hadoop / Hbase team and deploy the monitoring scripts in production
+  * collaborate with the Adobe Hadoop / Hbase team and deploy the monitoring scripts in production - '''work in progress'''
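
To make the 'mntr' item above more concrete, here is a minimal sketch of how a monitoring script might read those signals. It is not the code from zookeeper-monitoring. It assumes the server answers 'mntr' with one tab-separated key/value pair per line and closes the connection afterwards, and that it listens on the default client port 2181. The helper names (parse_mntr, query_mntr, nagios_status) and the thresholds are made up for illustration.

```python
import socket


def parse_mntr(text):
    """Parse 'key<TAB>value' lines (assumed mntr format) into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if not sep:
            continue  # skip anything that is not a key/value line
        value = value.strip()
        try:
            stats[key.strip()] = int(value)  # counters and latencies
        except ValueError:
            stats[key.strip()] = value  # e.g. version or server state
    return stats


def query_mntr(host="localhost", port=2181, timeout=5.0):
    """Send the 'mntr' command to one server and return the parsed reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return parse_mntr(b"".join(chunks).decode("utf-8", "replace"))


def nagios_status(value, warning, critical):
    """Map a value to a Nagios-style exit code: 0 OK, 1 WARNING, 2 CRITICAL."""
    if value >= critical:
        return 2
    if value >= warning:
        return 1
    return 0
```

A check script could call query_mntr() against each node, pick one signal from the list above (for example outstanding requests), and exit with the code returned by nagios_status(). The thresholds are deployment specific and would have to be tuned per cluster.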
  
  Expected results:
  
-  * production ready scripts / plugins for monitoring
+  * production ready scripts / plugins for monitoring - '''done'''
-  * easy to understand and follow install guides
+  * easy to understand and follow install guides - '''done'''
  
  === Web Application (starts: 20 June ends: 9 August) ===
  Activities: