Posted to common-dev@hadoop.apache.org by "Chris Douglas (JIRA)" <ji...@apache.org> on 2008/08/07 04:45:44 UTC
[jira] Updated: (HADOOP-3062) Need to capture the metrics for the network ios generate by dfs reads/writes and map/reduce shuffling and break them down by racks
[ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Douglas updated HADOOP-3062:
----------------------------------
Attachment: 3062-0.patch
First draft.
Format:
{noformat}
<log4j schema including timestamp, etc.> src: <src IP>, dest: <dst IP>, bytes: <bytes>, op: <op enum>, id: <DFSClient id|taskid>[, blockid: <block id>]
{noformat}
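A minimal Java sketch of how a consumer might parse entries in this format. It assumes the log4j prefix is arbitrary text ending before "src:", that the fields appear in the order shown, and that the id may be empty (replication writes use clientName ""). The class and field names are hypothetical, and the op value in the example is a placeholder rather than the patch's actual enum.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a parser for the trace format proposed in 3062-0.patch.
// Assumption: the log4j prefix is arbitrary text that ends before "src:",
// and the fields appear in the documented order.
class TraceEntryParser {

    // One parsed trace entry; blockId is null when the optional field is absent.
    static final class TraceEntry {
        final String src, dest, op, id, blockId;
        final long bytes;

        TraceEntry(String src, String dest, long bytes,
                   String op, String id, String blockId) {
            this.src = src; this.dest = dest; this.bytes = bytes;
            this.op = op; this.id = id; this.blockId = blockId;
        }
    }

    // src, dest, bytes, op and id are required; id may be empty
    // (replication writes carry clientName ""); blockid is optional.
    private static final Pattern ENTRY = Pattern.compile(
        "src: ([^,\\s]+), dest: ([^,\\s]+), bytes: (\\d+), "
        + "op: ([^,\\s]+), id: ([^,\\s]*)(?:, blockid: (\\S+))?\\s*$");

    // Returns the parsed entry, or null if the line carries no trace entry.
    static TraceEntry parse(String line) {
        Matcher m = ENTRY.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new TraceEntry(m.group(1), m.group(2), Long.parseLong(m.group(3)),
                              m.group(4), m.group(5), m.group(6));
    }

    public static void main(String[] args) {
        // Hypothetical log line; the prefix, addresses, op name and ids are made up.
        String line = "2008-08-07 04:45:44,123 INFO DataNode: src: 10.0.0.1, "
            + "dest: 10.0.0.2, bytes: 65536, op: READ, "
            + "id: DFSClient_task_0001_m_000003_0, blockid: blk_42";
        TraceEntry e = parse(line);
        System.out.println(e.src + " -> " + e.dest + " " + e.bytes + " " + e.blockId);
    }
}
```

Lines that do not contain a trace entry return null, so the parser can be run over a whole DataNode or ReduceTask log without pre-filtering.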
The patch adds the DFSClient clientName to OP_READ_BLOCK and changes the String in OP_WRITE_BLOCK from the path, which is unused, to the clientName. Since the clientName is set to DFSClient_<taskid> in map and reduce tasks, tracing the output of a job should be straightforward after some processing of each entry. Writes for replication (where the clientName is "") are logged as before; the logging in PacketResponder has been reformatted to fit the preceding schema. A few known issues:
* The logging assumes the IP address is sufficient to distinguish a source, particularly for writes and in the shuffle
* This logs to the DataNode and ReduceTask appenders; these entries should be directed elsewhere and disabled by default
* In testing this, some read entries exhibited a strange property: the source and destination match, but neither matches the DataNode on which the entry is logged. I'm clearly missing something.
I tried tracing a few blocks and map outputs through the logs, and all of those traces made sense. That said, as mentioned in the last bullet, not every entry did.
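Because the clientName in map and reduce tasks is DFSClient_<taskid>, attributing traffic to tasks mostly comes down to normalizing the id field. A rough Java sketch, assuming entries have already been reduced to (id, bytes) pairs and that shuffle entries carry the bare task id; the class name and the "(replication)" bucket for entries with an empty clientName are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: total the bytes attributed to each task, assuming DFSClient ids in
// map/reduce tasks are "DFSClient_<taskid>" and shuffle entries already carry
// the bare task id. Stripping the prefix is only meaningful for task clients.
class BytesPerTask {

    static final String CLIENT_PREFIX = "DFSClient_";
    // Hypothetical bucket name for replication writes, whose clientName is "".
    static final String REPLICATION = "(replication)";

    // Maps the logged id field to the task (or bucket) it belongs to.
    static String taskOf(String id) {
        if (id.isEmpty()) {
            return REPLICATION;
        }
        return id.startsWith(CLIENT_PREFIX) ? id.substring(CLIENT_PREFIX.length()) : id;
    }

    // Each element is (id, bytes); returns task -> total bytes, sorted by key.
    static Map<String, Long> totals(List<Map.Entry<String, Long>> entries) {
        Map<String, Long> out = new TreeMap<>();
        for (Map.Entry<String, Long> e : entries) {
            out.merge(taskOf(e.getKey()), e.getValue(), Long::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up ids and byte counts for illustration.
        List<Map.Entry<String, Long>> entries = List.of(
            Map.entry("DFSClient_task_0001_m_000003_0", 65536L),
            Map.entry("task_0001_m_000003_0", 1024L),
            Map.entry("", 65536L));
        System.out.println(totals(entries));
    }
}
```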
> Need to capture the metrics for the network ios generate by dfs reads/writes and map/reduce shuffling and break them down by racks
> ------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-3062
> URL: https://issues.apache.org/jira/browse/HADOOP-3062
> Project: Hadoop Core
> Issue Type: Improvement
> Components: metrics
> Reporter: Runping Qi
> Attachments: 3062-0.patch
>
>
> In order to better understand the relationship between Hadoop performance and network bandwidth, we need to know
> the aggregate traffic in a cluster and its breakdown by rack. With this data, we can determine whether network
> bandwidth is the bottleneck when certain jobs are running on a cluster.
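Once entries are parsed, the per-rack breakdown the issue asks for is an aggregation keyed by (source rack, destination rack). A Java sketch, assuming a static IP-to-rack table; in a real cluster that mapping would presumably come from the configured topology (for example, Hadoop's DNSToSwitchMapping). The class name and the "/unknown" rack are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: break traced bytes down by (source rack, destination rack).
// Assumption: a static IP -> rack table stands in for the cluster's real
// topology mapping; unknown hosts fall into a hypothetical "/unknown" rack.
class RackTraffic {

    static final String UNKNOWN_RACK = "/unknown";

    private final Map<String, String> rackOf;
    // Key is "srcRack -> destRack"; TreeMap keeps the report sorted.
    private final Map<String, Long> bytesByRackPair = new TreeMap<>();
    private long intraRack = 0;
    private long crossRack = 0;

    RackTraffic(Map<String, String> rackOf) {
        this.rackOf = rackOf;
    }

    // Records one traced transfer of 'bytes' from src to dest.
    // Note: two unknown hosts share "/unknown" and so count as intra-rack.
    void add(String src, String dest, long bytes) {
        String srcRack = rackOf.getOrDefault(src, UNKNOWN_RACK);
        String destRack = rackOf.getOrDefault(dest, UNKNOWN_RACK);
        bytesByRackPair.merge(srcRack + " -> " + destRack, bytes, Long::sum);
        if (srcRack.equals(destRack)) {
            intraRack += bytes;
        } else {
            crossRack += bytes;
        }
    }

    Map<String, Long> byRackPair() { return bytesByRackPair; }
    long intraRackBytes() { return intraRack; }
    long crossRackBytes() { return crossRack; }

    public static void main(String[] args) {
        // Made-up topology and transfers for illustration.
        RackTraffic t = new RackTraffic(Map.of(
            "10.0.0.1", "/rack1", "10.0.0.2", "/rack1", "10.0.1.1", "/rack2"));
        t.add("10.0.0.1", "10.0.0.2", 65536);
        t.add("10.0.0.1", "10.0.1.1", 1024);
        t.add("10.0.0.9", "10.0.1.1", 10);
        System.out.println(t.byRackPair());
        System.out.println("intra-rack: " + t.intraRackBytes()
            + ", cross-rack: " + t.crossRackBytes());
    }
}
```

Keeping intra-rack and cross-rack totals separate matters for the bottleneck question, since in a typical hierarchical network it is cross-rack traffic that contends for the shared rack uplinks.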
--
This message is automatically generated by JIRA.