Posted to dev@chukwa.apache.org by "Eric Yang (JIRA)" <ji...@apache.org> on 2011/01/01 03:00:45 UTC

[jira] Created: (CHUKWA-575) Cluster Summarization script

Cluster Summarization script
----------------------------

                 Key: CHUKWA-575
                 URL: https://issues.apache.org/jira/browse/CHUKWA-575
             Project: Chukwa
          Issue Type: New Feature
          Components: scripts
         Environment: Java 6, Mac OS X 10.6
            Reporter: Eric Yang
            Assignee: Eric Yang
             Fix For: 0.5.0


 Chukwa records metrics from the name node, data node, job tracker, task tracker, etc., but the raw metrics alone do not help determine all aspects of cluster health.  For now, we have the following metrics in HBase:

 * System
 *   Disk
 *   Memory
 *   Network
 * HDFS
 *   Name Node
 *   Data Node
 * Map Reduce
 *   Job Tracker
 *   Task Tracker

We can further analyze the data to provide a summary of the cluster in these categories:

 * System - Performance profile of how busy the nodes are in the cluster
 * HDFS - Capacity of the disk storage, and health of the data in the file system
 * MapReduce - Capacity of the processing pipeline, and health of the processing system


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (CHUKWA-575) Cluster Summarization script

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CHUKWA-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated CHUKWA-575:
-----------------------------

    Attachment: CHUKWA-575.patch

Usage:

pig -param START=[starting epoch millisecond] -Dpig.additional.jars=$PIG_PATH/pig.jar:$HBASE_HOME/hbase-0.20.6.jar $CHUKWA_HOME/script/pig/ClusterSummary.pig
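
As a hedged sketch of how the START value might be computed: the helper below prints an epoch timestamp in milliseconds for a point some hours in the past. The function name epoch_millis_hours_ago and the one-hour window are illustrative assumptions, not part of the patch, and `date +%s` is assumed to be available (it is on GNU and BSD/Mac OS X systems). PIG_PATH, HBASE_HOME, and CHUKWA_HOME are assumed to be set as in the usage line above.

```shell
#!/bin/sh
# Hypothetical helper (not part of CHUKWA-575.patch): print the epoch time,
# in milliseconds, of the moment N hours before now (default: 1 hour).
epoch_millis_hours_ago() {
    hours=${1:-1}
    now_seconds=$(date +%s)
    echo $(( (now_seconds - hours * 3600) * 1000 ))
}

# Example: summarize starting from one hour ago (uncomment to run).
# START=$(epoch_millis_hours_ago 1)
# pig -param START="$START" \
#     -Dpig.additional.jars="$PIG_PATH/pig.jar:$HBASE_HOME/hbase-0.20.6.jar" \
#     "$CHUKWA_HOME/script/pig/ClusterSummary.pig"
```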


[jira] Commented: (CHUKWA-575) Cluster Summarization script

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CHUKWA-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979203#action_12979203 ] 

Eric Yang commented on CHUKWA-575:
----------------------------------

My configuration for running hbase+pig:

Single node hadoop+hbase+chukwa:

{noformat}
export PIG_PATH=sandbox/pig-0.8.0
export PIG_CLASSPATH=${HBASE_CONF_DIR}:${HADOOP_CONF_DIR}
export HBASE_HOME=sandbox/hbase-0.20.6
export CHUKWA_HOME=sandbox/chukwa-trunk

./pig -Dpig.additional.jars=${PIG_PATH}/pig-0.8.0-core.jar:${HBASE_HOME}/hbase-0.20.6.jar ${CHUKWA_HOME}/script/pig/ClusterSummary.pig
{noformat}

Real cluster + MapReduce:

Create a jar file containing hbase-site.xml and call it hbase-conf.jar:

{noformat}
export PIG_PATH=sandbox/pig-0.8.0
export PIG_CLASSPATH=${HBASE_CONF_DIR}:${HADOOP_CONF_DIR}
export HBASE_HOME=sandbox/hbase-0.20.6
export CHUKWA_HOME=sandbox/chukwa-trunk

./pig -Dpig.additional.jars=${PIG_PATH}/pig-0.8.0-core.jar:${HBASE_HOME}/hbase-0.20.6.jar:hbase-conf.jar ${CHUKWA_HOME}/script/pig/ClusterSummary.pig
{noformat}
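
One way the hbase-conf.jar mentioned above might be built, sketched with the JDK's jar tool. The function name build_hbase_conf_jar is hypothetical, and HBASE_CONF_DIR is assumed to point at the directory that holds hbase-site.xml.

```shell
#!/bin/sh
# Hypothetical helper: package hbase-site.xml from a conf directory into a jar
# that can then be listed in pig.additional.jars.
build_hbase_conf_jar() {
    conf_dir=$1
    out_jar=${2:-hbase-conf.jar}
    if [ ! -f "$conf_dir/hbase-site.xml" ]; then
        echo "hbase-site.xml not found in $conf_dir" >&2
        return 1
    fi
    # -C changes into conf_dir so the file sits at the root of the jar.
    jar cf "$out_jar" -C "$conf_dir" hbase-site.xml
}

# Example (uncomment to run):
# build_hbase_conf_jar "$HBASE_CONF_DIR" hbase-conf.jar
```

Running `jar tf hbase-conf.jar` afterwards should list hbase-site.xml at the top level of the archive.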

See if you can get this to work. If not, try running the script in Grunt mode line by line, and inspect which STORE statement is causing the problem.
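
To go with the line-by-line approach, here is a small sketch that lists the STORE statements in a Pig script with their line numbers, so each one can be checked in turn from Grunt. The function name list_store_statements is hypothetical, and it assumes each STORE statement starts on its own line.

```shell
#!/bin/sh
# Hypothetical helper: print "line-number:statement" for every line of a Pig
# script that begins with STORE (case-insensitive, leading whitespace allowed).
list_store_statements() {
    grep -n -i -E '^[[:space:]]*STORE[[:space:]]' "$1"
}

# Example (uncomment to run):
# list_store_statements "$CHUKWA_HOME/script/pig/ClusterSummary.pig"
```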


[jira] Commented: (CHUKWA-575) Cluster Summarization script

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CHUKWA-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976378#action_12976378 ] 

Eric Yang commented on CHUKWA-575:
----------------------------------

This script aggregates:

* cpu, disk, memory, network usage
* hdfs space capacity, hdfs space remaining, error status
* mapreduce slot capacity, slot usage, error status

by cluster, and stores the results in the ClusterSummary table.


[jira] Commented: (CHUKWA-575) Cluster Summarization script

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CHUKWA-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977087#action_12977087 ] 

Ari Rabkin commented on CHUKWA-575:
-----------------------------------

Tried this, got errors.  

I started with a clean HBase, let it collect metrics from the default adaptors for a bit. Ran the script manually. The Pig-spawned tasks all fail. I got the following on the Reduce side:

java.io.IOException: java.lang.IllegalArgumentException: Row key is invalid
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:438)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:401)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:381)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:251)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.IllegalArgumentException: Row key is invalid
	at org.apache.hadoop.hbase.client.Put.<init>(Put.java:79)
	at org.apache.hadoop.hbase.client.Put.<init>(Put.java:69)
	at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:355)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
	at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:508)
	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:436)
	... 7 more


------

Is it possible to run the scripts in local mode for debugging, and have them still pull data from HBase? How do I configure that? I tried a bunch of things and got nowhere.


[jira] Updated: (CHUKWA-575) Cluster Summarization script

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CHUKWA-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated CHUKWA-575:
-----------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.


[jira] Updated: (CHUKWA-575) Cluster Summarization script

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CHUKWA-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated CHUKWA-575:
-----------------------------

    Status: Patch Available  (was: Open)

Since CHUKWA-578 is invalid, I am setting the status back to Patch Available.


[jira] Updated: (CHUKWA-575) Cluster Summarization script

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CHUKWA-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin updated CHUKWA-575:
------------------------------

    Status: Open  (was: Patch Available)

CHUKWA-578 blocks this.


[jira] Updated: (CHUKWA-575) Cluster Summarization script

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CHUKWA-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated CHUKWA-575:
-----------------------------

    Status: Patch Available  (was: Open)


[jira] Commented: (CHUKWA-575) Cluster Summarization script

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CHUKWA-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976965#action_12976965 ] 

Ari Rabkin commented on CHUKWA-575:
-----------------------------------

Looks OK. +1 to commit.  I will test on my cluster.
