Posted to common-dev@hadoop.apache.org by "Milind Bhandarkar (JIRA)" <ji...@apache.org> on 2006/05/20 00:43:29 UTC
[jira] Created: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Standard set of Performance Metrics for Hadoop
----------------------------------------------
Key: HADOOP-237
URL: http://issues.apache.org/jira/browse/HADOOP-237
Project: Hadoop
Type: Improvement
Components: metrics
Versions: 0.3
Environment: All
Reporter: Milind Bhandarkar
Assigned to: Milind Bhandarkar
I am starting to use Hadoop's shiny new Metrics API to publish performance (and other) metrics of running jobs and other daemons.
Which performance metrics are people interested in seeing? If possible, please group them by module, such as map-reduce, dfs, general cluster-related, etc. I will follow this process:
1. collect this list
2. assess the feasibility of obtaining each metric
3. assign context/record/metric names
4. seek approval for the names
5. instrument the code
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
[jira] Updated: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=all ]
Sameer Paranjpye updated HADOOP-237:
------------------------------------
Fix Version: 0.3
[jira] Resolved: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=all ]
Doug Cutting resolved HADOOP-237.
---------------------------------
Fix Version/s: 0.5.0
Resolution: Fixed
I just committed this.
It would be great to get some screenshots (e.g., from Ganglia) up on the wiki demoing this stuff.
An extra thanks to Milind for being patient and persistent on this one!
[jira] Updated: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=all ]
Milind Bhandarkar updated HADOOP-237:
-------------------------------------
Attachment: (was: hadoop-metrics.patch)
[jira] Updated: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=all ]
Milind Bhandarkar updated HADOOP-237:
-------------------------------------
Attachment: (was: hadoop-metrics.patch)
[jira] Updated: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=all ]
Milind Bhandarkar updated HADOOP-237:
-------------------------------------
Attachment: hadoop-metrics.patch
I have attached an updated metrics patch. I have addressed most of the concerns expressed by Doug and David, except for removing the duplicate code, because the alternative is overkill: it would not allow localized code updates; the compiler cannot inline it, which increases overhead; because some classes have multiple exit points, it would leak metrics records; and in any case the duplicated code is small and confined to private classes (three lines each in six classes). Please let me know if you have any more concerns.
[jira] Updated: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=all ]
Doug Cutting updated HADOOP-237:
--------------------------------
Fix Version: (was: 0.3.0)
[jira] Updated: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=all ]
Milind Bhandarkar updated HADOOP-237:
-------------------------------------
Attachment: hadoop-metrics.patch
Re-patched after merging with trunk.
[jira] Commented: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=comments#action_12421245 ]
Milind Bhandarkar commented on HADOOP-237:
------------------------------------------
Agreed. I will redo the patch soon.
[jira] Commented: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "David Bowen (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=comments#action_12420694 ]
David Bowen commented on HADOOP-237:
------------------------------------
This code is not using the metrics API as intended, in that it calls the update method after each metric modification. The API is record-oriented, so update copies the whole record to the client library.
I don't think that this will cause significant, observable problems with the metric data, but it could be a significant performance issue.
The preferred model would be to replace per-metric methods like
    void mapInput(long numBytes)
    void mapOutput(long numBytes)
with something like
    void mapIO(long numBytesInput, long numBytesOutput)
and have this only call the update method once.
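To make David's point concrete, here is a minimal Java sketch. StubRecord is a hypothetical in-memory stand-in, not Hadoop's MetricsRecord; its update() copies the whole record, which is how David describes the real API behaving. The two per-metric methods cost two full copies, while the batched mapIO costs one. Metric names such as map_input_bytes are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: StubRecord is a hypothetical stand-in for a record-oriented
// metrics API, not Hadoop's actual MetricsRecord class.
class BatchedUpdateSketch {

    static final class StubRecord {
        private final Map<String, Long> current = new HashMap<>();
        private final Map<String, Long> published = new HashMap<>();
        int updateCount = 0; // how many whole-record copies were made

        void incrMetric(String name, long delta) {
            current.merge(name, delta, Long::sum);
        }

        // Copies the whole record to the (simulated) client library.
        void update() {
            published.clear();
            published.putAll(current);
            updateCount++;
        }

        long publishedValue(String name) {
            return published.getOrDefault(name, 0L);
        }
    }

    // Per-metric style: every call pays for a full record copy.
    static void mapInput(StubRecord r, long numBytes) {
        r.incrMetric("map_input_bytes", numBytes);
        r.update();
    }

    static void mapOutput(StubRecord r, long numBytes) {
        r.incrMetric("map_output_bytes", numBytes);
        r.update();
    }

    // Batched style: change both metrics, then publish once.
    static void mapIO(StubRecord r, long numBytesInput, long numBytesOutput) {
        r.incrMetric("map_input_bytes", numBytesInput);
        r.incrMetric("map_output_bytes", numBytesOutput);
        r.update();
    }

    public static void main(String[] args) {
        StubRecord perMetric = new StubRecord();
        mapInput(perMetric, 100);
        mapOutput(perMetric, 40);

        StubRecord batched = new StubRecord();
        mapIO(batched, 100, 40);

        // prints "2 vs 1": same published values, half the copies
        System.out.println(perMetric.updateCount + " vs " + batched.updateCount);
    }
}
```

Both records end up publishing the same values; only the number of whole-record copies differs.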
[jira] Commented: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=comments#action_12421035 ]
Doug Cutting commented on HADOOP-237:
-------------------------------------
I still feel the duplicate code should be removed. Performance should not motivate this: HotSpot should be able to inline regardless. Nearly identical code is repeated in eight places, differing only in a constant string, which can easily be made a parameter. If local modifications are required, then the common implementation can be subclassed or not used, but currently there are effectively no local modifications.
Also, you're still ignoring rather than logging exceptions.
This code will become the prototypical use of the metrics API. We should make sure that it looks good. If boilerplate code is required for prototypical use, then that boilerplate should become utility methods and/or classes in the metrics package, and the metrics package-level javadoc should encourage this use. Insertion of metrics code should be minimally invasive.
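A minimal sketch of what Doug suggests, assuming a hypothetical RecordSink interface in place of the real metrics record: the repeated code becomes a single report() helper that takes the metric name as a parameter, and failures are logged through java.util.logging instead of being silently ignored. Returning false on failure rather than rethrowing is a design choice made for this sketch, so that a metrics problem never breaks the instrumented daemon.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: one shared helper instead of near-identical copies that differ
// only in a constant string. RecordSink is a hypothetical stand-in for a
// metrics record; it is not part of any real Hadoop API.
class SharedReporterSketch {

    private static final Logger LOG = Logger.getLogger(SharedReporterSketch.class.getName());

    interface RecordSink {
        void incrMetric(String metric, long delta) throws Exception;
        void update() throws Exception;
    }

    // In-memory sink used for demonstration.
    static final class MemorySink implements RecordSink {
        final Map<String, Long> values = new HashMap<>();
        public void incrMetric(String metric, long delta) { values.merge(metric, delta, Long::sum); }
        public void update() { }
    }

    // A sink whose update always fails, to exercise the logging path.
    static final class FailingSink implements RecordSink {
        public void incrMetric(String metric, long delta) { }
        public void update() throws Exception { throw new Exception("sink unavailable"); }
    }

    /**
     * Increments one metric and publishes the record. The metric name is a
     * parameter, so callers no longer need their own copy of this code.
     * Failures are logged, not swallowed, and reported to the caller as false.
     */
    static boolean report(RecordSink sink, String metric, long delta) {
        try {
            sink.incrMetric(metric, delta);
            sink.update();
            return true;
        } catch (Exception e) {
            LOG.log(Level.WARNING, "failed to report metric " + metric, e);
            return false;
        }
    }

    public static void main(String[] args) {
        MemorySink sink = new MemorySink();
        report(sink, "files_created", 1);
        report(sink, "files_created", 1);
        report(sink, "files_renamed", 1);
        System.out.println(sink.values.get("files_created")); // prints 2
        report(new FailingSink(), "files_opened", 1); // logs a warning
    }
}
```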
[jira] Commented: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "David Bowen (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=comments#action_12420710 ]
David Bowen commented on HADOOP-237:
------------------------------------
OK, maybe this is no big deal since the records are small. The idea of a record was to be a bunch of things that should be updated simultaneously, but maybe using it for a small number of things that are updated independently is OK. Splitting the record into two would cost a bit of extra space in the client library (since the overhead of an extra record in a hash table outweighs the savings of 4 bytes per record) and would not save much in the cost of an update.
[jira] Commented: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=comments#action_12412607 ]
Milind Bhandarkar commented on HADOOP-237:
------------------------------------------
Yes, this sort of list is exactly what I had in mind.
[jira] Updated: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=all ]
Milind Bhandarkar updated HADOOP-237:
-------------------------------------
Attachment: (was: hadoop-metrics.patch)
[jira] Commented: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=comments#action_12422058 ]
Doug Cutting commented on HADOOP-237:
-------------------------------------
Milind, sorry to be a stickler here, but I think this still needs a bit more work.
The MetricsUtil class should be in the metrics package and just be named Metrics. The create() method should be renamed createRecord(). This would make canonical use look something like:
    MetricsRecord record = Metrics.createRecord("dfs", "datanode");
    ...
    Metrics.report(record, "bytes-read", bytesRead);
Also, the javadoc in this class should be improved. It should reference the (very good) javadoc of MetricsRecord, MetricsContext, etc. This will be most folks' first point of contact with the metrics package. Parameters and return types should be well documented (you can mostly cut and paste these from MetricsRecord).
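One possible shape for the proposed facade, as a self-contained sketch. StubContext and StubRecord are hypothetical stand-ins for MetricsContext and MetricsRecord. The comment does not say whether report() should set or increment the metric; this sketch sets it and then publishes the record with a single update. Creating each context lazily on first use is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the facade proposed above. StubContext and StubRecord are
// hypothetical stand-ins for MetricsContext and MetricsRecord.
class MetricsFacadeSketch {

    static final class StubRecord {
        final String contextName;
        final String recordName;
        final Map<String, Long> metrics = new HashMap<>();
        int updates = 0;

        StubRecord(String contextName, String recordName) {
            this.contextName = contextName;
            this.recordName = recordName;
        }

        void setMetric(String name, long value) { metrics.put(name, value); }

        void update() { updates++; } // a real record would publish here
    }

    static final class StubContext {
        final String name;

        StubContext(String name) { this.name = name; }

        StubRecord createRecord(String recordName) { return new StubRecord(name, recordName); }
    }

    /** Facade: the single entry point instrumented code would call. */
    static final class Metrics {
        private static final Map<String, StubContext> CONTEXTS = new HashMap<>();

        private Metrics() { }

        /** Returns a new record in the named context, creating the context on first use. */
        static synchronized StubRecord createRecord(String contextName, String recordName) {
            StubContext ctx = CONTEXTS.computeIfAbsent(contextName, StubContext::new);
            return ctx.createRecord(recordName);
        }

        /** Sets one metric and publishes the record; a null record is a no-op. */
        static void report(StubRecord record, String name, long value) {
            if (record == null) {
                return;
            }
            record.setMetric(name, value);
            record.update();
        }
    }

    public static void main(String[] args) {
        StubRecord record = Metrics.createRecord("dfs", "datanode");
        long bytesRead = 4096;
        Metrics.report(record, "bytes-read", bytesRead);
        // prints "dfs/datanode bytes-read=4096"
        System.out.println(record.contextName + "/" + record.recordName
                + " bytes-read=" + record.metrics.get("bytes-read"));
    }
}
```

The instrumented code only ever touches createRecord() and report(), which keeps the insertion of metrics calls minimally invasive in the way Doug asks for.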
[jira] Commented: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=comments#action_12412604 ]
Doug Cutting commented on HADOOP-237:
-------------------------------------
Here are some quick ideas for what could be interesting statistics to monitor:
Map Input:
  records/second
  bytes/second
Map Output Transferred to Reduce Node:
  records/second
  bytes/second
Reduce Output:
  bytes/second
  records/second
Job Tracker:
  map tasks launched
  map tasks completed
  reduce tasks launched
  reduce tasks completed
DFS Datanode:
  bytes/second written
  bytes/second read
  blocks/second read
  blocks/second written
  blocks replicated
  blocks removed
DFS NameNode:
  files created
  files renamed
  files listed
  files opened
  files removed
Is this the sort of thing you had in mind?
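Many of the items in this list are rates. One common way to get a rate is to keep a cumulative counter in the daemon and divide the change between two samples by the elapsed time, as in this sketch. RateSampler is a hypothetical name; the list does not say whether rates would be computed by Hadoop itself or left to a monitoring front end such as Ganglia.

```java
// Sketch: deriving a per-second rate (e.g. map input bytes/second) from a
// cumulative counter sampled at two points in time. Names are hypothetical.
class RateSketch {

    static final class RateSampler {
        private long lastCount;
        private long lastMillis;

        RateSampler(long initialCount, long initialMillis) {
            this.lastCount = initialCount;
            this.lastMillis = initialMillis;
        }

        /** Returns events per second since the previous sample, then advances the baseline. */
        double sample(long count, long nowMillis) {
            long elapsed = nowMillis - lastMillis;
            // Guard against a zero or negative interval instead of dividing by it.
            double rate = elapsed > 0 ? (count - lastCount) * 1000.0 / elapsed : 0.0;
            lastCount = count;
            lastMillis = nowMillis;
            return rate;
        }
    }

    public static void main(String[] args) {
        RateSampler bytesIn = new RateSampler(0, 0);
        // 50,000 bytes read over 10 seconds -> prints 5000.0 (bytes/second)
        System.out.println(bytesIn.sample(50_000, 10_000));
    }
}
```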
[jira] Updated: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=all ]
Milind Bhandarkar updated HADOOP-237:
-------------------------------------
Attachment: hadoop-metrics.patch
Attached a patch that uses the Hadoop metrics API, plus a properties file that defaults to the null context but includes examples of file and Ganglia output contexts.
All metrics requested in this issue have been included, except for output bytes for reduce tasks. Collecting output bytes required a change to the RecordWriter interface, so it has been postponed.
I have tested it with the file and null contexts, but not with Ganglia.
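The properties file itself is not shown here. As a rough illustration of what "defaults to the null context" means, this sketch reads a properties text and resolves which output class a context name maps to, falling back to a null context when nothing is configured. The <context>.class key convention, the other keys, and the class names (NullContext, FileContext, GangliaContext under org.apache.hadoop.metrics) are assumptions modeled on hadoop-metrics.properties files from later Hadoop releases, not taken from the attached patch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch: resolve the output context class for a metrics context name,
// defaulting to a null (discard-everything) context. Key names and class
// names are ASSUMPTIONS modeled on later hadoop-metrics.properties files.
class ContextConfigSketch {

    static final String NULL_CONTEXT = "org.apache.hadoop.metrics.spi.NullContext";

    static final String CONFIG =
        "# File output example (commented out):\n"
        + "# dfs.class=org.apache.hadoop.metrics.file.FileContext\n"
        + "# dfs.fileName=/tmp/dfsmetrics.log\n"
        + "# Ganglia output example (commented out):\n"
        + "# mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext\n"
        + "# mapred.servers=localhost:8649\n"
        + "dfs.class=" + NULL_CONTEXT + "\n";

    /** Returns the configured class for a context, or the null context if unset. */
    static String contextClass(Properties props, String contextName) {
        return props.getProperty(contextName + ".class", NULL_CONTEXT).trim();
    }

    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text)); // '#' lines are comments
        } catch (IOException e) {
            // StringReader does no real I/O, so this is not expected.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(CONFIG);
        System.out.println(contextClass(props, "dfs"));    // explicitly the null context
        System.out.println(contextClass(props, "mapred")); // unset, so the null default
    }
}
```

Switching a context to file or Ganglia output would then be a matter of uncommenting the corresponding lines, with no code change.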
[jira] Updated: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=all ]
Milind Bhandarkar updated HADOOP-237:
-------------------------------------
Attachment: hadoop-metrics.patch
Hopefully this patch addresses all your concerns, Doug.
[jira] Updated: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=all ]
Milind Bhandarkar updated HADOOP-237:
-------------------------------------
Attachment: hadoop-metrics.patch
Doug, I have incorporated the suggested changes, and attached the patch.
[jira] Commented: (HADOOP-237) Standard set of Performance Metrics for Hadoop
Posted by "Jayant Shekhar (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=comments#action_12412980 ]
Jayant Shekhar commented on HADOOP-237:
---------------------------------------
Here are a few I would like to add to the list:
Job Tracker:
  jobs submitted
  jobs completed
Task Tracker:
  total tasks completed
  number of maps running
  number of reduces running
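These additions mix two kinds of metric: totals that only grow (jobs submitted, tasks completed) and current levels that rise and fall (maps and reduces running). A small sketch of how a task tracker might keep both, using java.util.concurrent atomics; TaskTrackerMetrics and its methods are hypothetical names, and how each value would be published through the metrics API is left out.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: a counter ("tasks completed" only grows) next to gauges
// ("maps running" goes up and down). All names here are hypothetical.
class TrackerMetricsSketch {

    static final class TaskTrackerMetrics {
        private final AtomicLong tasksCompleted = new AtomicLong();       // counter
        private final AtomicInteger mapsRunning = new AtomicInteger();    // gauge
        private final AtomicInteger reducesRunning = new AtomicInteger(); // gauge

        void mapStarted() { mapsRunning.incrementAndGet(); }

        void reduceStarted() { reducesRunning.incrementAndGet(); }

        void mapFinished() {
            mapsRunning.decrementAndGet();
            tasksCompleted.incrementAndGet();
        }

        void reduceFinished() {
            reducesRunning.decrementAndGet();
            tasksCompleted.incrementAndGet();
        }

        /**
         * Values in a fixed order: tasks completed, maps running, reduces running.
         * Each value is read separately, so the three are not captured atomically.
         */
        long[] snapshot() {
            return new long[] { tasksCompleted.get(), mapsRunning.get(), reducesRunning.get() };
        }
    }

    public static void main(String[] args) {
        TaskTrackerMetrics m = new TaskTrackerMetrics();
        m.mapStarted();
        m.mapStarted();
        m.reduceStarted();
        m.mapFinished();
        long[] s = m.snapshot();
        // prints "completed=1 maps=1 reduces=1"
        System.out.println("completed=" + s[0] + " maps=" + s[1] + " reduces=" + s[2]);
    }
}
```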
> Standard set of Performance Metrics for Hadoop
> ----------------------------------------------
>
> Key: HADOOP-237
> URL: http://issues.apache.org/jira/browse/HADOOP-237
> Project: Hadoop
> Type: Improvement
> Components: metrics
> Versions: 0.3
> Environment: All
> Reporter: Milind Bhandarkar
> Assignee: Milind Bhandarkar
[jira] Commented: (HADOOP-237) Standard set of Performance Metrics
for Hadoop
Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=comments#action_12420673 ]
Milind Bhandarkar commented on HADOOP-237:
------------------------------------------
Okay. I will make the necessary modifications and resubmit the patch. To keep overhead low, though, I would have at least two instances of MetricsReporter, one for each context: dfs and mapred.
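One possible reading of "one instance per context" is sketched below in Java: a small registry that lazily creates a single reporter per context name (`dfs`, `mapred`) and hands the same instance to every later caller. `ContextReporter` and `ContextReporters.forContext` are hypothetical names used only for this sketch; a real reporter would also hold the context and record objects.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical reporter bound to one metrics context (e.g. "dfs" or
// "mapred"). A real version would wrap the context and record objects.
class ContextReporter {
  private final String contextName;

  ContextReporter(String contextName) {
    this.contextName = contextName;
  }

  String getContextName() { return contextName; }
}

// Hypothetical registry that hands out exactly one reporter per context,
// so all callers in the same context share a single instance.
final class ContextReporters {
  private static final Map<String, ContextReporter> REPORTERS =
      new ConcurrentHashMap<>();

  private ContextReporters() {}

  static ContextReporter forContext(String contextName) {
    // computeIfAbsent creates the reporter on first use and reuses it after.
    return REPORTERS.computeIfAbsent(contextName, ContextReporter::new);
  }
}
```

Sharing one instance per context means the setup cost (creating the context and record) is paid once per daemon rather than once per call site.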
> Standard set of Performance Metrics for Hadoop
> ----------------------------------------------
>
> Key: HADOOP-237
> URL: http://issues.apache.org/jira/browse/HADOOP-237
> Project: Hadoop
> Type: Improvement
> Components: metrics
> Versions: 0.3.0
> Environment: All
> Reporter: Milind Bhandarkar
> Assignee: Milind Bhandarkar
> Attachments: hadoop-metrics.patch
[jira] Commented: (HADOOP-237) Standard set of Performance Metrics
for Hadoop
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=comments#action_12420566 ]
Doug Cutting commented on HADOOP-237:
-------------------------------------
This mostly looks good to me.
The indentation is non-standard for Hadoop (four spaces instead of two), many lines exceed 80 columns, the new package imports are not separated by a blank line, and in NameNode.java I think you add some unused fields.
More importantly, there seems to be a lot of duplicated code. It looks like we should probably add a MetricsReporter base class whose constructor constructs the context, record, etc. This should log rather than ignore exceptions. Then it can have reportMetric(String, int) and reportMetric(String, long) methods.
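A minimal Java sketch of the suggested base class follows. To stay self-contained it does not use Hadoop's metrics classes: `MetricRecord` and `RecordFactory` are hypothetical stand-ins for the context/record objects, and logging uses `java.util.logging`. The constructor creates the record once; if that fails, the exception is logged and later `reportMetric` calls become no-ops instead of throwing.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a metrics record; NOT the real Hadoop interface.
interface MetricRecord {
  void setMetric(String name, long value);
  void update();
}

// Hypothetical stand-in for creating a record in a named context; may fail.
interface RecordFactory {
  MetricRecord createRecord(String contextName, String recordName)
      throws Exception;
}

// Sketch of the base class suggested above: the constructor sets up the
// context/record once, and setup failures are logged rather than ignored.
abstract class MetricsReporter {
  private static final Logger LOG =
      Logger.getLogger(MetricsReporter.class.getName());

  private final MetricRecord record; // null if setup failed

  protected MetricsReporter(RecordFactory factory,
                            String contextName, String recordName) {
    MetricRecord r = null;
    try {
      r = factory.createRecord(contextName, recordName);
    } catch (Exception e) {
      LOG.log(Level.WARNING, "Could not create metrics record "
          + contextName + "/" + recordName, e);
    }
    this.record = r;
  }

  // The int overload simply widens to long in this sketch.
  protected void reportMetric(String name, int value) {
    reportMetric(name, (long) value);
  }

  protected void reportMetric(String name, long value) {
    if (record == null) {
      return; // setup failed and was already logged
    }
    record.setMetric(name, value);
    record.update();
  }
}
```

Subclasses (for example, a hypothetical NameNode or TaskTracker reporter) would then only pass their context and record names to the constructor, which removes the duplicated setup code.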
> Standard set of Performance Metrics for Hadoop
> ----------------------------------------------
>
> Key: HADOOP-237
> URL: http://issues.apache.org/jira/browse/HADOOP-237
> Project: Hadoop
> Type: Improvement
> Components: metrics
> Versions: 0.3.0
> Environment: All
> Reporter: Milind Bhandarkar
> Assignee: Milind Bhandarkar
> Attachments: hadoop-metrics.patch
[jira] Commented: (HADOOP-237) Standard set of Performance Metrics
for Hadoop
Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=comments#action_12421973 ]
Milind Bhandarkar commented on HADOOP-237:
------------------------------------------
Sorry, I forgot to merge with trunk before diffing.
> Standard set of Performance Metrics for Hadoop
> ----------------------------------------------
>
> Key: HADOOP-237
> URL: http://issues.apache.org/jira/browse/HADOOP-237
> Project: Hadoop
> Issue Type: Improvement
> Components: metrics
> Affects Versions: 0.3.0
> Environment: All
> Reporter: Milind Bhandarkar
> Assigned To: Milind Bhandarkar
[jira] Updated: (HADOOP-237) Standard set of Performance Metrics
for Hadoop
Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=all ]
Milind Bhandarkar updated HADOOP-237:
-------------------------------------
Attachment: (was: hadoop-metrics.patch)
> Standard set of Performance Metrics for Hadoop
> ----------------------------------------------
>
> Key: HADOOP-237
> URL: http://issues.apache.org/jira/browse/HADOOP-237
> Project: Hadoop
> Issue Type: Improvement
> Components: metrics
> Affects Versions: 0.3.0
> Environment: All
> Reporter: Milind Bhandarkar
> Assigned To: Milind Bhandarkar
[jira] Commented: (HADOOP-237) Standard set of Performance Metrics
for Hadoop
Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-237?page=comments#action_12420695 ]
Milind Bhandarkar commented on HADOOP-237:
------------------------------------------
But then mapIO needs to be called every time either numBytesInput or numBytesOutput changes (they change in different places), so I don't see a performance difference. Having a separate record for each metric may be better performance-wise, because it avoids the record-copy overhead, but it increases the record-update overhead by sending two separate packets.
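The trade-off can be made concrete with a toy Java model, assuming that each `update()` of a record sends one packet containing a copy of all of that record's metrics. `ToyRecord`, `ToyTransport`, `CombinedMapIO` and `SeparateMapIO` are hypothetical; only `mapIO`, `numBytesInput` and `numBytesOutput` come from the comment above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy transport: every packet sent is recorded as a snapshot of metrics.
class ToyTransport {
  final List<Map<String, Long>> packets = new ArrayList<>();
}

// Toy record: update() copies ALL of the record's metrics into one packet,
// so the copy cost grows with the number of metrics in the record.
class ToyRecord {
  private final ToyTransport transport;
  private final Map<String, Long> metrics = new LinkedHashMap<>();

  ToyRecord(ToyTransport transport) { this.transport = transport; }

  void setMetric(String name, long value) { metrics.put(name, value); }

  void update() { transport.packets.add(new LinkedHashMap<>(metrics)); }
}

// Option A: one record holds both counters; mapIO() runs on every change
// and copies both values into the packet.
class CombinedMapIO {
  private final ToyRecord record;
  private long in, out;

  CombinedMapIO(ToyTransport t) { record = new ToyRecord(t); }

  void addInput(long n) { in += n; mapIO(); }

  void addOutput(long n) { out += n; mapIO(); }

  private void mapIO() {
    record.setMetric("numBytesInput", in);
    record.setMetric("numBytesOutput", out);
    record.update();
  }
}

// Option B: one record per counter; each packet is smaller, but
// publishing both counters takes two packets.
class SeparateMapIO {
  private final ToyRecord inRecord, outRecord;
  private long in, out;

  SeparateMapIO(ToyTransport t) {
    inRecord = new ToyRecord(t);
    outRecord = new ToyRecord(t);
  }

  void addInput(long n) {
    in += n;
    inRecord.setMetric("numBytesInput", in);
    inRecord.update();
  }

  void addOutput(long n) {
    out += n;
    outRecord.setMetric("numBytesOutput", out);
    outRecord.update();
  }
}
```

In this model both variants send one packet per change, so the packet count is the same. The combined record copies two values into every packet, while the separate records copy one value each but need two packets to publish both counters.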
> Standard set of Performance Metrics for Hadoop
> ----------------------------------------------
>
> Key: HADOOP-237
> URL: http://issues.apache.org/jira/browse/HADOOP-237
> Project: Hadoop
> Type: Improvement
> Components: metrics
> Versions: 0.3.0
> Environment: All
> Reporter: Milind Bhandarkar
> Assignee: Milind Bhandarkar
> Attachments: hadoop-metrics.patch