Posted to mapreduce-issues@hadoop.apache.org by "Konstantin Shvachko (JIRA)" <ji...@apache.org> on 2012/09/12 09:49:08 UTC

[jira] [Created] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Konstantin Shvachko created MAPREDUCE-4651:
----------------------------------------------

             Summary: Benchmarking random reads with DFSIO
                 Key: MAPREDUCE-4651
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
          Components: benchmarks, test
    Affects Versions: 1.0.0
            Reporter: Konstantin Shvachko


TestDFSIO measures throughput of HDFS write, read, and append operations. It will be useful to have an option to use it for benchmarking random reads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463267#comment-13463267 ] 

Hudson commented on MAPREDUCE-4651:
-----------------------------------

Integrated in Hadoop-Common-trunk-Commit #2772 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2772/])
    MAPREDUCE-4651. Benchmarking random reads with DFSIO. (Revision 1390159)

     Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1390159
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/IOMapperBase.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java

                
> Benchmarking random reads with DFSIO
> ------------------------------------
>
>                 Key: MAPREDUCE-4651
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: benchmarks, test
>    Affects Versions: 1.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>             Fix For: 0.23.4
>
>         Attachments: randomDFSIO.patch, randomDFSIO.patch, randomDFSIO.patch
>
>
> TestDFSIO measures throughput of HDFS write, read, and append operations. It will be useful to have an option to use it for benchmarking random reads.


[jira] [Updated] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated MAPREDUCE-4651:
-------------------------------------------

    Status: Patch Available  (was: Open)
    

[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463828#comment-13463828 ] 

Hudson commented on MAPREDUCE-4651:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #1208 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1208/])
    MAPREDUCE-4651. Benchmarking random reads with DFSIO. (Revision 1390159)

     Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1390159
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/IOMapperBase.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java

                

[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Ravi Prakash (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455895#comment-13455895 ] 

Ravi Prakash commented on MAPREDUCE-4651:
-----------------------------------------

Thanks Konstantin! I applied the patch and ran the random and backward read tests on my single node dev box.

{noformat}
$HADOOP_PREFIX/bin/hadoop org.apache.hadoop.fs.TestDFSIO -read -random -fileSize 10MB 
Average IO rate mb/sec: 134.43310546875
IO rate std deviation: 0.00896365222201456

$HADOOP_PREFIX/bin/hadoop org.apache.hadoop.fs.TestDFSIO -read -backward -fileSize 10MB
Average IO rate mb/sec: 134.49253845214844
IO rate std deviation: 0.026679629420752023

$HADOOP_PREFIX/bin/hadoop org.apache.hadoop.fs.TestDFSIO -read -random -fileSize 1GB
Average IO rate mb/sec: 249.47183227539062
IO rate std deviation: 0.014617091655162118

$HADOOP_PREFIX/bin/hadoop org.apache.hadoop.fs.TestDFSIO -read -backward -fileSize 1GB
Average IO rate mb/sec: 295.8538818359375
IO rate std deviation: 0.061419808441541615

$HADOOP_PREFIX/bin/hadoop org.apache.hadoop.fs.TestDFSIO -read -random -fileSize 10GB
Average IO rate mb/sec: 320.3417663574219
IO rate std deviation: 0.05935480659067817

$HADOOP_PREFIX/bin/hadoop org.apache.hadoop.fs.TestDFSIO -read -backward -fileSize 10GB
Average IO rate mb/sec: 323.28045654296875
IO rate std deviation: 0.0598550775330073

$HADOOP_PREFIX/bin/hadoop org.apache.hadoop.fs.TestDFSIO -read -backward -fileSize 30GB
Average IO rate mb/sec: 390.9880065917969
IO rate std deviation: 0.06083891027478396

$HADOOP_PREFIX/bin/hadoop org.apache.hadoop.fs.TestDFSIO -read -random -fileSize 30GB
Average IO rate mb/sec: 369.2136535644531
IO rate std deviation: 0.056819116587427144
{noformat}

Could you please post recommended usage? And at what sizes do we expect to achieve stable IO rates?

                

[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463764#comment-13463764 ] 

Hudson commented on MAPREDUCE-4651:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #386 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/386/])
    MAPREDUCE-4651. Benchmarking random reads with DFSIO. Contributed by Konstantin Shvachko. (Revision 1390164)

     Result = UNSTABLE
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1390164
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/IOMapperBase.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java

                

[jira] [Updated] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated MAPREDUCE-4651:
-------------------------------------------

    Attachment: randomDFSIO.patch

The patch
# Introduces the three types of random reads.
# Adds a getIOStream() method, which excludes stream construction from the timed part of the execution. This is important for small writes and reads. It also let me move all compression functionality from IOMapperBase to TestDFSIO, where it truly belongs.
# Converts the test to JUnit 4 format. Finally. Also adds test cases for the new benchmarks.
# Fixes a couple of warnings and removes unnecessary generic parameters.
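
The point about keeping stream construction out of the timed section can be sketched as follows. This is a minimal illustration, not the patched IOMapperBase: the class names, the execTimeNanos field, and the in-memory stream are all assumptions made so the example runs without a cluster.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Illustrative only: these names are hypothetical, not the real IOMapperBase API. */
class TimedIoSketch {

  /** Base task: opens the stream first, then times only the I/O loop. */
  abstract static class Task {
    protected Closeable stream;   // opened before timing starts
    long execTimeNanos;           // duration of doIO() alone

    /** Untimed hook; the default returns null so tasks without a stream need no change. */
    Closeable getIOStream(String name) throws IOException {
      return null;
    }

    /** Timed part: the actual reads or writes. Returns the number of bytes processed. */
    abstract long doIO(String name, long totalSize) throws IOException;

    long run(String name, long totalSize) {
      try {
        stream = getIOStream(name);           // setup cost stays off the clock
        long start = System.nanoTime();
        try {
          return doIO(name, totalSize);
        } finally {
          execTimeNanos = System.nanoTime() - start;
          if (stream != null) {
            stream.close();
          }
        }
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }

  /** Example task that reads from an in-memory stream instead of HDFS. */
  static class ReadTask extends Task {
    private final byte[] data;
    private final byte[] buffer = new byte[4096];

    ReadTask(byte[] data) {
      this.data = data;
    }

    @Override // Task
    Closeable getIOStream(String name) {
      return new ByteArrayInputStream(data);
    }

    @Override // Task
    long doIO(String name, long totalSize) throws IOException {
      InputStream in = (InputStream) stream;
      long total = 0;
      while (total < totalSize) {
        int n = in.read(buffer, 0, (int) Math.min(buffer.length, totalSize - total));
        if (n < 0) {
          break;                              // end of stream
        }
        total += n;
      }
      return total;
    }
  }
}
```

With very small files the cost of opening a stream can be comparable to the cost of the reads themselves, which is why the sketch starts the clock only after getIOStream() returns.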
                

[jira] [Assigned] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko reassigned MAPREDUCE-4651:
----------------------------------------------

    Assignee: Konstantin Shvachko
    

[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463122#comment-13463122 ] 

Jakob Homan commented on MAPREDUCE-4651:
----------------------------------------

+1. Good refactoring as well.  nit: bit of wonky spacing in TestDFSIO::AppendMapper::getIOStream.
                

[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464794#comment-13464794 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4651:
---------------------------------------------------

Hi Konstantin, the patch no longer applies to trunk.  Could you update it?
                

[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463779#comment-13463779 ] 

Hudson commented on MAPREDUCE-4651:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk #1177 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1177/])
    MAPREDUCE-4651. Benchmarking random reads with DFSIO. (Revision 1390159)

     Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1390159
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/IOMapperBase.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java

                

[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Ravi Prakash (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455019#comment-13455019 ] 

Ravi Prakash commented on MAPREDUCE-4651:
-----------------------------------------

Hi Konstantin,

Thanks for this initiative. I like the idea of benchmarking random reads. Some comments:
1. Why not declare IOMapperBase.getIOStream() abstract rather than have it return null?
2. There is some extra whitespace.
3. In TestDFSIO.doIO(), annotate the override with its base class: @Override // IOMapperBase
4. In doIO(), would it make sense to check the type before casting?
      if (this.stream instanceof InputStream) in = (InputStream) this.stream;
   Similarly for PositionedReadable.
5. In public RandomReadMapper(), you can use new Random(), which already picks a seed that is very likely distinct on each call. You don't need the call to System.nanoTime().

Oh, and could you please review MAPREDUCE-4645? =D
                

[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463186#comment-13463186 ] 

Hadoop QA commented on MAPREDUCE-4651:
--------------------------------------

{color:green}+1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12546568/randomDFSIO.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 2 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2878//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2878//console

This message is automatically generated.
                

[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465467#comment-13465467 ] 

Konstantin Shvachko commented on MAPREDUCE-4651:
------------------------------------------------

That is because I committed it. Jakob reviewed.
                

[jira] [Updated] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated MAPREDUCE-4651:
-------------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.23.4
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

I just committed this to trunk, branch-2, and branch 0.23.4.
                

[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453821#comment-13453821 ] 

Konstantin Shvachko commented on MAPREDUCE-4651:
------------------------------------------------

The idea is to use HDFS positional read, which is defined by {{PositionedReadable}} and allows reading a segment of data from a given position.
I propose three variants of such benchmarks:
# *Random read*. Randomly choose an offset in the range [0, fileSize] and read one buffer of data from that position. Repeat the operation until a specified number of bytes has been read. Random read can occasionally read the same bytes twice.
# *Backward read*. Read the file in reverse order. This is intended to read all bytes of the given file while avoiding reading any of them twice.
# *Skip read*. Starting from the beginning, read one buffer of data, then jump ahead and read again. Repeat until either the specified number of bytes has been read or the end of the file is reached. Skip read avoids read-ahead: with sequential reads, data mostly comes from the system block cache, whereas jumping ahead far enough ensures that bytes are actually read from the storage device.
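
The three access patterns above can be sketched as offset generators. This is an illustration under stated assumptions, not the code in the patch: the class and method names are hypothetical, the random variant draws from the half-open range [0, fileSize) so that every read starts inside the file, and fileSize and bufferSize are assumed to be positive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative offset generators; names and signatures are hypothetical, not TestDFSIO's. */
class ReadPatterns {

  /** Random read: pick offsets in [0, fileSize) until bytesToRead bytes are covered. */
  static List<Long> randomOffsets(long fileSize, int bufferSize, long bytesToRead, Random rnd) {
    List<Long> offsets = new ArrayList<>();
    long read = 0;
    while (read < bytesToRead) {
      long offset = Math.floorMod(rnd.nextLong(), fileSize);
      offsets.add(offset);
      // A read that starts near the end of the file returns fewer bytes.
      read += Math.min(bufferSize, fileSize - offset);
    }
    return offsets;
  }

  /** Backward read: walk from the end of the file to the start, one buffer at a time. */
  static List<Long> backwardOffsets(long fileSize, int bufferSize) {
    List<Long> offsets = new ArrayList<>();
    for (long end = fileSize; end > 0; end -= bufferSize) {
      // The chunk covers [start, end); the final chunk at offset 0 may be shorter.
      offsets.add(Math.max(0L, end - bufferSize));
    }
    return offsets;
  }

  /** Skip read: read one buffer, jump skipSize bytes ahead, and repeat. */
  static List<Long> skipOffsets(long fileSize, int bufferSize, long skipSize, long bytesToRead) {
    List<Long> offsets = new ArrayList<>();
    long read = 0;
    for (long offset = 0; offset < fileSize && read < bytesToRead;
         offset += bufferSize + skipSize) {
      offsets.add(offset);
      read += Math.min(bufferSize, fileSize - offset);
    }
    return offsets;
  }
}
```

In a real benchmark each offset would be handed to a positional read such as PositionedReadable.read(position, buffer, 0, length); the sketch only shows where the reads would land.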
                

[jira] [Updated] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated MAPREDUCE-4651:
-------------------------------------------

    Attachment: randomDFSIO.patch

Fixed wonky space in AppendMapper.getIOStream()
                

[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465601#comment-13465601 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4651:
---------------------------------------------------

Oops, I forgot to reload the page.  It is great that Jakob has reviewed it.
                

[jira] [Updated] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated MAPREDUCE-4651:
-------------------------------------------

    Attachment: randomDFSIO.patch

Removed nanoTime(), added comments to @Override.
                

[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455133#comment-13455133 ] 

Konstantin Shvachko commented on MAPREDUCE-4651:
------------------------------------------------

(1) I did want to make IOMapperBase.getIOStream() abstract, but there are other tests that are based on IOMapperBase. A dummy implementation of getIOStream() avoids changing them.
(2) Do you want me to add spaces, or do you see unnecessary spaces in my patch?
(3) So you suggest adding comments to @Override specifying the base class, right? Agreed, I should have done that for the new method, and I will also do it for doIO().
(4) Sure, you can check the instance before casting, but what would you do in the else clause? Throw an exception. So one way or another an exception will be thrown saying the type of stream is not right. And this is sort of an assert, because it means there is a bug in DFSIO, not the user's fault.
(5) Yep, you are right, nanoTime() is already in the default constructor. I believe it wasn't there before.

Thanks for the review. I'll check the SLive JIRA.
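
Points (1) and (4) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: MapperBase, ReadMapper, LegacyMapper and the simplified getIOStream()/doIO() signatures are hypothetical stand-ins, not the actual IOMapperBase or TestDFSIO code from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

public class StreamCastSketch {

  /** Hypothetical stand-in for a mapper base class shared by several tests. */
  static abstract class MapperBase {
    protected Closeable stream;

    /**
     * Deliberately not abstract: subclasses that never open a stream
     * keep compiling without any change.
     */
    public Closeable getIOStream(String name) throws IOException {
      return null;
    }

    abstract long doIO(String name) throws IOException;

    /** Opens the stream, runs the I/O, and always closes the stream. */
    long run(String name) throws IOException {
      stream = getIOStream(name);
      try {
        return doIO(name);
      } finally {
        if (stream != null) {
          stream.close();
        }
      }
    }
  }

  /** Subclass that supplies an input stream and reads it to the end. */
  static class ReadMapper extends MapperBase {
    @Override // MapperBase
    public Closeable getIOStream(String name) {
      return new ByteArrayInputStream(new byte[4096]);
    }

    @Override // MapperBase
    long doIO(String name) throws IOException {
      // Plain cast: a wrong stream type would be a bug in the benchmark
      // itself, so the ClassCastException acts as the assertion.
      InputStream in = (InputStream) stream;
      byte[] buf = new byte[1024];
      long total = 0;
      for (int n = in.read(buf); n >= 0; n = in.read(buf)) {
        total += n;
      }
      return total;
    }
  }

  /** Older-style subclass that relies on the dummy getIOStream(). */
  static class LegacyMapper extends MapperBase {
    @Override // MapperBase
    long doIO(String name) {
      return 0L;
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(new ReadMapper().run("f"));   // prints 4096
    System.out.println(new LegacyMapper().run("f")); // prints 0
  }
}
```

An instanceof check before the cast would only move the failure into an else branch that throws anyway, which is the argument made in (4).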
                
> Benchmarking random reads with DFSIO
> ------------------------------------
>
>                 Key: MAPREDUCE-4651
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: benchmarks, test
>    Affects Versions: 1.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>         Attachments: randomDFSIO.patch
>
>
> TestDFSIO measures throughput of HDFS write, read, and append operations. It will be useful to have an option to use it for benchmarking random reads.


[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463154#comment-13463154 ] 

Konstantin Shvachko commented on MAPREDUCE-4651:
------------------------------------------------

Ravi, 
- Even with a single node you can specify more -nrFiles if you have multiple drives on the node. I usually set the number of map slots equal to the number of drives on a node.
- I don't know how big the file was that you created with -write prior to the reads. If it was 10 MB, then the actual size of the reads was not more than that. Check the DFSIO summary; it prints how much data was read.
- You probably ran the reads right after creating the file, so the data was in the buffer cache. I usually clean the cache before each test run. (On Linux: 'echo 1 > /proc/sys/vm/drop_caches')
- Also, -fileSize is replaced by -size in my patch. It says how much data you want to read/write/append, rather than specifying the size of a file. Initially (read/write) it was the same.
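
The distinction between "how much data to read" and "the size of the file" can be sketched as follows. This is not the patch's implementation: RandomReadSketch, randomRead() and its parameters are hypothetical, it runs against a local file through java.io.RandomAccessFile rather than HDFS, and letting random offsets repeat so that the requested amount may exceed the file length is an assumption of the sketch.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

public class RandomReadSketch {

  /**
   * Reads up to bytesToRead bytes from random offsets of the file, in
   * chunks of at most bufferSize bytes. Returns the number of bytes
   * actually read, which is the figure a benchmark summary would report.
   */
  static long randomRead(Path file, long bytesToRead, int bufferSize, long seed)
      throws IOException {
    byte[] buffer = new byte[bufferSize];
    Random rnd = new Random(seed);
    long actual = 0;
    try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
      long fileSize = in.length();
      if (fileSize == 0) {
        return 0; // nothing to read from an empty file
      }
      while (actual < bytesToRead) {
        // Pick a random offset inside the file and read one chunk there.
        long pos = (long) (rnd.nextDouble() * fileSize);
        in.seek(pos);
        int want = (int) Math.min(bufferSize, bytesToRead - actual);
        int n = in.read(buffer, 0, want);
        if (n < 0) {
          break; // offset landed at end of file
        }
        actual += n;
      }
    }
    return actual;
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("random-read", ".bin");
    try {
      Files.write(file, new byte[10_000]);
      long read = randomRead(file, 50_000L, 4096, 42L);
      System.out.println("bytes read: " + read);
    } finally {
      Files.delete(file);
    }
  }
}
```

A real measurement would also time the loop and, as noted above, drop the buffer cache before each run so that reads are not served from memory.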
                
> Benchmarking random reads with DFSIO
> ------------------------------------
>
>                 Key: MAPREDUCE-4651
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: benchmarks, test
>    Affects Versions: 1.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>         Attachments: randomDFSIO.patch, randomDFSIO.patch, randomDFSIO.patch
>
>
> TestDFSIO measures throughput of HDFS write, read, and append operations. It will be useful to have an option to use it for benchmarking random reads.


[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463261#comment-13463261 ] 

Hudson commented on MAPREDUCE-4651:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #2835 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2835/])
    MAPREDUCE-4651. Benchmarking random reads with DFSIO. (Revision 1390159)

     Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1390159
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/IOMapperBase.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java

                
> Benchmarking random reads with DFSIO
> ------------------------------------
>
>                 Key: MAPREDUCE-4651
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: benchmarks, test
>    Affects Versions: 1.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>             Fix For: 0.23.4
>
>         Attachments: randomDFSIO.patch, randomDFSIO.patch, randomDFSIO.patch
>
>
> TestDFSIO measures throughput of HDFS write, read, and append operations. It will be useful to have an option to use it for benchmarking random reads.


[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463299#comment-13463299 ] 

Hudson commented on MAPREDUCE-4651:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #2793 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2793/])
    MAPREDUCE-4651. Benchmarking random reads with DFSIO. (Revision 1390159)

     Result = FAILURE
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1390159
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/IOMapperBase.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java

                
> Benchmarking random reads with DFSIO
> ------------------------------------
>
>                 Key: MAPREDUCE-4651
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: benchmarks, test
>    Affects Versions: 1.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>             Fix For: 0.23.4
>
>         Attachments: randomDFSIO.patch, randomDFSIO.patch, randomDFSIO.patch
>
>
> TestDFSIO measures throughput of HDFS write, read, and append operations. It will be useful to have an option to use it for benchmarking random reads.
