Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2009/07/01 19:22:24 UTC

[Hadoop Wiki] Update of "Chukwa Quick Start" by AriRabkin

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by AriRabkin:
http://wiki.apache.org/hadoop/Chukwa_Quick_Start

------------------------------------------------------------------------------
  
  == Compiling and installing Chukwa ==
  
-  1. If Chukwa is in the hadoop contrib directory, you should be able to just say ''ant'' in the project root directory.
-  1. If you are building Chukwa in standalone mode, you should set the HADOOP_HOME environment variable to point to your Hadoop installation (e.g. export HADOOP_HOME=/path/to/hadoop). This ensures that Chukwa is built with the correct HDFS protocol version to talk to your running Hadoop Distributed File System (HDFS).
-   
+  1. To compile Chukwa, just say ''ant'' in the project root directory.
+  1. Move the compiled jars from build to the Chukwa root directory.
+  1. 
  
  == Configuring and starting the Collector ==
  
   1. Copy conf/chukwa-collector-conf.xml.template to conf/chukwa-collector-conf.xml
+  1. Copy conf/chukwa-env.sh-template to conf/chukwa-env.sh.  
+  1. Edit chukwa-env.sh.  At a minimum, you will almost certainly need to set JAVA_HOME, HADOOP_HOME, and HADOOP_CONF_DIR.
-  1. Edit the writer.hdfs.filesystem property to point to a real filesystem.
-  1. If you are running hadoop, this should be the path to the namenode.
-  1. If you are not running hadoop, you can just use a local path, of the form file:///tmp/chukwa.
-  1. Copy conf/chukwa-env.sh-template to conf/chukwa-env.sh.  Set JAVA_HOME in file. (If this is a standalone Chukwa, also set HADOOP_HOME here).
-  1. Download the Apache logging commons jar from http://commons.apache.org/downloads/download_logging.cgi
-  1. Copy the 'commons-logging-X.Y.Z.jar' file to $HADOOP_HOME/src/contrib/chukwa/lib
   1. In the chukwa root directory, say ``bash bin/jettyCollector.sh''
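
The copy steps above can be sketched as a small shell helper. This is only an illustration: the function name setup_collector_conf and its argument (the Chukwa root directory) are assumptions, not part of Chukwa. It copies the two templates without overwriting existing files, then reports which of the three variables still appear to be unset in chukwa-env.sh.

```shell
#!/usr/bin/env bash
# Sketch only: setup_collector_conf is a hypothetical helper, not a Chukwa script.
# Usage: setup_collector_conf /path/to/chukwa
setup_collector_conf() {
  local root="$1"
  local var

  # Copy each template to its live name, leaving any existing file untouched.
  [ -e "$root/conf/chukwa-collector-conf.xml" ] ||
    cp "$root/conf/chukwa-collector-conf.xml.template" \
       "$root/conf/chukwa-collector-conf.xml" || return 1
  [ -e "$root/conf/chukwa-env.sh" ] ||
    cp "$root/conf/chukwa-env.sh-template" "$root/conf/chukwa-env.sh" || return 1

  # Report variables that have no uncommented assignment in chukwa-env.sh yet.
  for var in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR; do
    grep -Eq "^[[:space:]]*(export[[:space:]]+)?${var}=" "$root/conf/chukwa-env.sh" ||
      echo "edit conf/chukwa-env.sh: set $var"
  done
  return 0
}
```

Once the reported variables are set by hand, the collector is started from the chukwa root directory with bash bin/jettyCollector.sh, as in the last step above.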
  
  == Configuring and starting the Local Agent ==
@@ -27, +23 @@

   1. Copy conf/chukwa-agent-conf.xml.template to conf/chukwa-agent-conf.xml
   1. Copy conf/collectors.template to conf/collectors
   1. In the chukwa root directory, say ``bash bin/agent.sh''
+  1. With no other options, the agent pushes data across to the collector; if you instead say ''bin/agent.sh local'', the agent just prints to standard out.
  
  == Starting Adaptors ==
  The local agent speaks a simple text-based protocol, by default over port 9093.
@@ -34, +31 @@

  on localhost:
  
   1. Telnet to localhost 9093
-  1. Type [without quotation marks] "ADD CharFileTailerUTF8 MyFileType /path/to/file 0"
+  1. Type [without quotation marks] "ADD filetailer.CharFileTailingAdaptorUTF8 aDataType /path/to/file 0"
-  1. Chukwa internal Namenode's type is NameNodeType so for namenode log type (without quotation marks, make sure file exists): "add filetailer.CharFileTailingAdaptorUTF8NewLineEscaped SysLog 0 /var/log/logfile 0"
-  1. You should see: "OK add completed; new ID is 1 NameNodeType"
   1. Type "list" -- you should see the adaptor you just started, listed as running. 
   1. Type  "close" to break the connection.
   1. If you don't have telnet, you can get the same effect using the netcat (''nc'') command line tool. 
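
The telnet session above can also be scripted. The following is a minimal sketch, assuming an agent is listening on the default port 9093 on localhost; the helper name adaptor_commands is hypothetical, and aDataType and /path/to/file are the same placeholders used in the steps above.

```shell
#!/usr/bin/env bash
# Sketch only: adaptor_commands is a hypothetical helper, not part of Chukwa.

# Emit the three protocol lines used in the steps above: register a
# file-tailing adaptor at offset 0, list the adaptors, then close.
adaptor_commands() {
  local data_type="$1"
  local file="$2"
  printf 'ADD filetailer.CharFileTailingAdaptorUTF8 %s %s 0\n' "$data_type" "$file"
  printf 'list\n'
  printf 'close\n'
}

# To send the commands to a running local agent with netcat:
#   adaptor_commands aDataType /path/to/file | nc localhost 9093
```

Keeping the command text in a function separate from the nc call makes it possible to inspect exactly what will be sent before connecting to an agent.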
      
- == Configuring and starting the demux job ==
+ == Setting up data processing ==
+   See the Chukwa administration guide for instructions on setting up data processing.
  
-  1. Edit bin/chukwa-config.sh to match your system configuration
-  1. (Document TODO: Say exactly what should change.  Anything other than the tomcat environment variable?)
-  1. In the chukwa root directory, say ``bash bin/processSinkFiles.sh'' 
-  1. (Document TODO: This script has a hard-coded 54310 port.  Can you confirm that you must be running hdfs from 54310?)
- 
- == Running Chukwa on a Cluster ==
- The cluster deployment process is still under active development, thus it is possible that the following instructions may not work yet, but they will soon, so please don't delete them. Eventually, even the single machine setup (for newcomers to Chukwa who want to try it out of the box on their) above will be replaced by the below process, renaming the conf/slaves.template and conf/collectors.template files (to remove the .template suffix) for the defaults of localhost for the collector and agent.
- 
- '''Configure Chukwa'''
- (For an explanation of each configuration file in the conf directory, see ["Chukwa Configuration"])
- 
-  1. Specify your JAVA_HOME and HADOOP_HOME in conf/chukwa-env.sh
-  1. Specify which hosts to run collectors on in the conf/collectors file.
-  1. Like in Hadoop, you need to specify a set of nodes on which you want to run Chukwa agents (similar to conf/slaves in Hadoop) using a conf/slaves file. The local agents on each machine will also reference the conf/collectors file, selecting a collector at random from this list to talk to. Thus, like Hadoop, it is common to run Chukwa from a shared file system where all of the agents (i.e. slaves) can access the same conf files.
-  1. Setup the initial adaptors you want to run on every agent in the chukwa cluster by copying conf/initial_adaptors.template to conf/initial_adaptors and adding whichever adaptors you see fit. See the ["Chukwa Adaptors List"] for a catalog of pre-made adaptors.
- 
- '''Run Chukwa'''
-  1. run bin/start-all.sh to start all agents, collectors, data loaders, and schedule the demux MapReduce job to run every 5 minutes over top of the datasink. Use bin/stop-all.sh to shut everything down.
- 
- 
- '''OR''' run the collectors and agents independently (without other scripts running as part of startup and shutdown)
- 
-  1. Start the collectors in your cluster with the command <code>bin/start-collectors.sh</code> (use bin/stop-collectors.sh to shut them down).
-  1. Start the agents by running <code>bin/start-agents.sh</code> (use bin/stop-agents.sh to shut them down).
-