
Posted to mapreduce-issues@hadoop.apache.org by "Konstantin Shvachko (JIRA)" <ji...@apache.org> on 2012/09/12 10:12:07 UTC

[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453821#comment-13453821 ] 

Konstantin Shvachko commented on MAPREDUCE-4651:
------------------------------------------------

The idea is to use the HDFS positional read, which is defined by {{PositionedReadable}} and allows reading a segment of data from a given position.
I propose three variants of such benchmarks:
# *Random read*. Randomly choose an offset in the range [0, fileSize) and read one buffer of data from that position. Repeat the operation until the specified number of bytes has been read.
Random read can occasionally read the same bytes twice.
# *Backward read*. Read the file in reverse order.
This is intended to read all bytes of the given file while avoiding reading any of them twice.
# *Skip read*. Starting from the beginning of the file, read one buffer of data, then jump ahead and read again. Repeat until either the specified number of bytes has been read or the end of the file is reached.
Skip read makes it possible to avoid read-ahead. With sequential reads, data mostly comes from the system block cache. Jumping far enough ahead ensures that bytes are actually read from the storage device.
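
The three access patterns above can be sketched as plain offset arithmetic. This is only an illustration under stated assumptions, not the proposed patch: the class {{ReadPatterns}}, its {{Segment}} type, and the parameter names ({{bufferSize}}, {{skipSize}}, {{bytesToRead}}) are hypothetical and do not come from TestDFSIO. The sketch assumes random offsets are drawn from [0, fileSize) so that every read returns at least one byte, that a read near the end of the file is simply shorter than a full buffer, and that backward read walks from the end of the file toward offset 0 one buffer at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not part of TestDFSIO): plans the positional reads for each variant. */
final class ReadPatterns {

    /** One positional read: start offset in the file and number of bytes to request. */
    static final class Segment {
        final long offset;
        final int length;

        Segment(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    private static void check(long fileSize, int bufferSize) {
        if (fileSize <= 0 || bufferSize <= 0) {
            throw new IllegalArgumentException("fileSize and bufferSize must be positive");
        }
    }

    /** Random read: random offsets until at least bytesToRead bytes are planned. Bytes may repeat. */
    static List<Segment> randomRead(long fileSize, int bufferSize, long bytesToRead, Random rnd) {
        check(fileSize, bufferSize);
        List<Segment> plan = new ArrayList<>();
        long total = 0;
        while (total < bytesToRead) {
            // Assumption: offset in [0, fileSize); floorMod has a slight modulo bias, fine for a sketch.
            long offset = Math.floorMod(rnd.nextLong(), fileSize);
            int length = (int) Math.min(bufferSize, fileSize - offset);
            plan.add(new Segment(offset, length));
            total += length;
        }
        return plan;
    }

    /** Backward read: covers every byte exactly once, from the last buffer toward offset 0. */
    static List<Segment> backwardRead(long fileSize, int bufferSize) {
        check(fileSize, bufferSize);
        List<Segment> plan = new ArrayList<>();
        long pos = fileSize;
        while (pos > 0) {
            int length = (int) Math.min(bufferSize, pos);
            pos -= length;
            plan.add(new Segment(pos, length));
        }
        return plan;
    }

    /** Skip read: read one buffer, jump skipSize bytes ahead, repeat until the byte budget or EOF. */
    static List<Segment> skipRead(long fileSize, int bufferSize, long skipSize, long bytesToRead) {
        check(fileSize, bufferSize);
        if (skipSize < 0) {
            throw new IllegalArgumentException("skipSize must not be negative");
        }
        List<Segment> plan = new ArrayList<>();
        long pos = 0;
        long total = 0;
        while (pos < fileSize && total < bytesToRead) {
            int length = (int) Math.min(bufferSize, fileSize - pos);
            plan.add(new Segment(pos, length));
            total += length;
            pos += length + skipSize;
        }
        return plan;
    }
}
```

In an actual benchmark each planned segment would presumably be issued through the positional read call, for example {{in.read(segment.offset, buffer, 0, segment.length)}} on a stream that implements {{PositionedReadable}}; how the real implementation chooses offsets and buffer sizes may well differ from this sketch.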
                
> Benchmarking random reads with DFSIO
> ------------------------------------
>
>                 Key: MAPREDUCE-4651
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: benchmarks, test
>    Affects Versions: 1.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>
> TestDFSIO measures throughput of HDFS write, read, and append operations. It will be useful to have an option to use it for benchmarking random reads.
