Posted to common-issues@hadoop.apache.org by "Anu Engineer (JIRA)" <ji...@apache.org> on 2015/04/25 01:02:39 UTC

[jira] [Commented] (HADOOP-11873) Include disk read/write time in FileSystem.Statistics

    [ https://issues.apache.org/jira/browse/HADOOP-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511965#comment-14511965 ] 

Anu Engineer commented on HADOOP-11873:
---------------------------------------

I don't know if this is useful for you, but HDFS does support the following metrics:

| Metric | Description |
|:---- |:---- |
| `TotalWriteTime` | Total number of milliseconds spent on write operation |
| `TotalReadTime` | Total number of milliseconds spent on read operation |
| `RemoteBytesRead` | Number of bytes read by remote clients |
| `RemoteBytesWritten` | Number of bytes written by remote clients |

If you look around, you should be able to see bytesRead and bytesWritten too. Please see Metrics.md for more information. This went in as part of HDFS-7773.



> Include disk read/write time in FileSystem.Statistics
> -----------------------------------------------------
>
>                 Key: HADOOP-11873
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11873
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: metrics
>            Reporter: Kay Ousterhout
>            Priority: Minor
>
> Measuring the time spent blocking on reading / writing data from / to disk is very useful for debugging performance problems in applications that read data from Hadoop, and can give much more information (e.g., to reflect disk contention) than just knowing the total amount of data read.  I'd like to add something like "diskMillis" to FileSystem#Statistics to track this.
> For data read from HDFS, this can be done with very low overhead by adding logging around calls to RemoteBlockReader2.readNextPacket (because this reads larger chunks of data, the time added by the instrumentation is very small relative to the time to actually read the data).  For data written to HDFS, this can be done in DFSOutputStream.waitAndQueueCurrentPacket.
> As far as I know, if you want this information today, it is accessible only by turning on HTrace. It looks like HTrace can't be selectively enabled, so a user can't turn on tracing for just RemoteBlockReader2.readNextPacket, for example, and instead needs to turn on tracing everywhere (which introduces enough overhead that sampling becomes necessary).  It would be hugely helpful to have native metrics for time spent reading from / writing to disk that are sufficiently low-overhead to be always on. (Please correct me if I'm wrong here about what's possible today!)
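
To make the proposal in the issue description concrete, here is a minimal sketch of the idea: wrap each bulk read call with a timer and accumulate the elapsed time into a counter. This is not the Hadoop implementation. TimedInputStream, the readNanos counter (standing in for the proposed "diskMillis" statistic), and DiskTimeSketch are hypothetical names invented for illustration, and a ByteArrayInputStream stands in for a real HDFS stream.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical wrapper: accumulates wall-clock time spent blocked in read(). */
class TimedInputStream extends FilterInputStream {
    // Hypothetical counter; stands in for something like "diskMillis".
    private final AtomicLong readNanos;

    TimedInputStream(InputStream in, AtomicLong readNanos) {
        super(in);
        this.readNanos = readNanos;
    }

    @Override
    public int read() throws IOException {
        long start = System.nanoTime();
        try {
            return super.read();
        } finally {
            readNanos.addAndGet(System.nanoTime() - start);
        }
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        // Timing a bulk read keeps the instrumentation cost small
        // relative to the cost of the read itself.
        long start = System.nanoTime();
        try {
            return super.read(buf, off, len);
        } finally {
            readNanos.addAndGet(System.nanoTime() - start);
        }
    }
}

class DiskTimeSketch {
    public static void main(String[] args) throws IOException {
        AtomicLong readNanos = new AtomicLong();
        byte[] data = new byte[1 << 20]; // in-memory stand-in for file data
        long total = 0;
        try (InputStream in =
                 new TimedInputStream(new ByteArrayInputStream(data), readNanos)) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        System.out.println("bytes read: " + total);
        System.out.println("read millis: " + readNanos.get() / 1_000_000);
    }
}
```

The counter is kept in nanoseconds and converted to milliseconds only when reported, so many short reads are not each rounded down to zero. The same pattern would apply on the write side around the call that blocks on queuing a packet.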


