Posted to mapreduce-issues@hadoop.apache.org by "Hong Tang (JIRA)" <ji...@apache.org> on 2010/08/20 11:29:17 UTC

[jira] Commented: (MAPREDUCE-2023) TestDFSIO read test may not read specified bytes.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900652#action_12900652 ] 

Hong Tang commented on MAPREDUCE-2023:
--------------------------------------

The problem lies in the loop condition of ReadMapper.doIO (marked HERE below):
{code}
  public static class ReadMapper extends IOStatMapper<Long> {

    public ReadMapper() { 
    }

    public Long doIO(Reporter reporter, 
                       String name, 
                       long totalSize // in bytes
                     ) throws IOException {
      // open file
      DataInputStream in = fs.open(new Path(getDataDir(getConf()), name));
      long actualSize = 0;
      try {
        for(int curSize = bufferSize;
                curSize == bufferSize && actualSize < totalSize;) { // <-- HERE
          curSize = in.read(buffer, 0, bufferSize);
          if(curSize < 0) break;
          actualSize += curSize;
          reporter.setStatus("reading " + name + "@" + 
                             actualSize + "/" + totalSize 
                             + " ::host = " + hostName);
        }
      } finally {
        in.close();
      }
      return Long.valueOf(actualSize);
    }
  }
{code}

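The for-loop exits as soon as a read returns fewer bytes than bufferSize. A short read does not mean end of file, though: read() may legitimately return fewer bytes than requested, so the test can stop well before totalSize bytes have been read. The fix is simple: drop the curSize == bufferSize condition and rely on the existing curSize < 0 check to detect end of stream: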
{code}
        for(int curSize = bufferSize; actualSize < totalSize;) {
{code}
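
To illustrate the effect outside of Hadoop, here is a minimal, self-contained sketch. It is not TestDFSIO code: ShortReadDemo, ChunkedStream, readOriginal, readFixed, runDemo and the sizes used are hypothetical names and values chosen for illustration. It assumes only that the stream may return fewer bytes than requested, which the InputStream.read(byte[], int, int) contract permits.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of TestDFSIO.
class ShortReadDemo {

  // Caps every bulk read at maxChunk bytes to simulate short reads.
  static class ChunkedStream extends InputStream {
    private final InputStream in;
    private final int maxChunk;

    ChunkedStream(InputStream in, int maxChunk) {
      this.in = in;
      this.maxChunk = maxChunk;
    }

    @Override
    public int read() throws IOException {
      return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      return in.read(b, off, Math.min(len, maxChunk));
    }
  }

  // Loop condition as in the original doIO: stops on the first short read.
  static long readOriginal(InputStream in, int bufferSize, long totalSize)
      throws IOException {
    byte[] buffer = new byte[bufferSize];
    long actualSize = 0;
    for (int curSize = bufferSize;
         curSize == bufferSize && actualSize < totalSize;) {
      curSize = in.read(buffer, 0, bufferSize);
      if (curSize < 0) break;
      actualSize += curSize;
    }
    return actualSize;
  }

  // Loop condition from the proposed fix: only end of stream or
  // reaching totalSize ends the loop.
  static long readFixed(InputStream in, int bufferSize, long totalSize)
      throws IOException {
    byte[] buffer = new byte[bufferSize];
    long actualSize = 0;
    for (int curSize = bufferSize; actualSize < totalSize;) {
      curSize = in.read(buffer, 0, bufferSize);
      if (curSize < 0) break;
      actualSize += curSize;
    }
    return actualSize;
  }

  // Runs both loops over the same in-memory data; returns {original, fixed}.
  static long[] runDemo(int totalSize, int bufferSize, int maxChunk) {
    byte[] data = new byte[totalSize];
    try {
      long original = readOriginal(
          new ChunkedStream(new ByteArrayInputStream(data), maxChunk),
          bufferSize, totalSize);
      long fixed = readFixed(
          new ChunkedStream(new ByteArrayInputStream(data), maxChunk),
          bufferSize, totalSize);
      return new long[] {original, fixed};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    long[] r = runDemo(100, 16, 10);
    System.out.println("original loop read " + r[0] + " of 100 bytes");
    System.out.println("fixed loop read " + r[1] + " of 100 bytes");
  }
}
```

With a 16-byte buffer and reads capped at 10 bytes, the original loop stops after the first read (10 of 100 bytes), while the fixed loop reads all 100.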

> TestDFSIO read test may not read specified bytes.
> -------------------------------------------------
>
>                 Key: MAPREDUCE-2023
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2023
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: benchmarks
>            Reporter: Hong Tang
>
> TestDFSIO's read test may read fewer bytes than specified when reading large files.
