Posted to mapreduce-issues@hadoop.apache.org by "mingleizhang (JIRA)" <ji...@apache.org> on 2016/07/06 09:16:11 UTC

[jira] [Created] (MAPREDUCE-6729) Hitting performance and error when lots of files to write or read

mingleizhang created MAPREDUCE-6729:
---------------------------------------

             Summary: Hitting performance and error when lots of files to write or read
                 Key: MAPREDUCE-6729
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6729
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: benchmarks, performance, test
            Reporter: mingleizhang
            Priority: Minor


TestDFSIO is used as a distributed I/O benchmark tool. When it writes or reads a large number of files, the current implementation can produce both a performance problem and imprecise results: the existing code deletes the previous run's files inside the timed portion of the job, so the deletion cost is counted toward execution time. This inflates the measured time and distorts the reported throughput. We should replace or improve this behavior so that cleanup no longer skews the benchmark statistics.

{code}
public static void testWrite() throws Exception {
  FileSystem fs = cluster.getFileSystem();
  long tStart = System.currentTimeMillis();
  // writeTest() performs the recursive deletes below, so their cost
  // is included in execTime and charged against write throughput.
  bench.writeTest(fs);
  long execTime = System.currentTimeMillis() - tStart;
  bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
}

private void writeTest(FileSystem fs) throws IOException {
  Path writeDir = getWriteDir(config);
  // Deleting the previous run's data happens inside the timed region.
  fs.delete(getDataDir(config), true);
  fs.delete(writeDir, true);
  runIOTest(WriteMapper.class, writeDir);
}
{code}

[https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]
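One possible direction, sketched below without any Hadoop dependencies (class and method names are hypothetical, using plain java.nio for illustration): perform the cleanup of leftover files before starting the timer, so that only the actual I/O work falls inside the measured region.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class TimedIoSketch {

  // Hypothetical sketch: cleanup runs outside the timed region,
  // so the returned execTime reflects only the benchmark writes.
  public static long timedWrite(Path dir, int numFiles) throws IOException {
    // 1. Untimed preparation: remove leftovers from a previous run.
    if (Files.exists(dir)) {
      try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
        for (Path p : ds) {
          Files.delete(p);
        }
      }
    } else {
      Files.createDirectories(dir);
    }

    // 2. Timed region: only the actual file writes are measured.
    long tStart = System.currentTimeMillis();
    for (int i = 0; i < numFiles; i++) {
      Files.write(dir.resolve("part-" + i), new byte[] {1, 2, 3});
    }
    return System.currentTimeMillis() - tStart;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("dfsio-sketch");
    long execTime = timedWrite(dir, 10);
    try (Stream<Path> files = Files.list(dir)) {
      System.out.println("files=" + files.count()
          + " execTimeNonNegative=" + (execTime >= 0));
    }
  }
}
```

The same separation could be applied in TestDFSIO itself by moving the `fs.delete(...)` calls out of `writeTest()` into an untimed setup step before `tStart` is taken.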






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org