Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2022/09/07 19:05:38 UTC

[GitHub] [hadoop] steveloughran commented on a diff in pull request #4862: HADOOP-18439. Fix VectoredIO for LocalFileSystem when checksum is enabled.

steveloughran commented on code in PR #4862:
URL: https://github.com/apache/hadoop/pull/4862#discussion_r965174231


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java:
##########
@@ -371,12 +371,23 @@ static ByteBuffer checkBytes(ByteBuffer sumsBytes,
       IntBuffer sums = sumsBytes.asIntBuffer();
       sums.position(offset / FSInputChecker.CHECKSUM_SIZE);
       ByteBuffer current = data.duplicate();
-      int numChunks = data.remaining() / bytesPerSum;
+      int numFullChunks = data.remaining() / bytesPerSum;
+      boolean partialChunk = (data.remaining() % bytesPerSum != 0);

Review Comment:
   wrap the `%` expression in (); the operator isn't used often enough for its precedence to be immediately obvious
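   
   Applied to the new line above, the change is purely cosmetic (`%` already binds tighter than `!=`), e.g.:
   
   ```java
   // parentheses make the evaluation order obvious at a glance
   boolean partialChunk = ((data.remaining() % bytesPerSum) != 0);
   ```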



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java:
##########
@@ -396,11 +407,33 @@ static ByteBuffer checkBytes(ByteBuffer sumsBytes,
       return data;
     }
 
+    /**
+     * Validates range parameters.
+     * In the case of a checksum FS the file length has already
+     * been calculated, so we can fail fast here.
+     * @param ranges requested ranges.
+     * @param fileLen length of the file.
+     * @throws EOFException if a range extends beyond the end of the file.
+     */
+    private void validateRangeRequest(List<? extends FileRange> ranges, long fileLen) throws EOFException {
+      for (FileRange range : ranges) {
+        VectoredReadUtils.validateRangeRequest(range);
+        if (range.getOffset() + range.getLength() > fileLen) {
+          LOG.warn("Requested range [{}, {}) is beyond EOF for path {}",
+                  range.getOffset(), range.getOffset() + range.getLength(), file);
+          throw new EOFException("Requested range [" + range.getOffset() + ", "
+                  + (range.getOffset() + range.getLength()) + ") is beyond EOF for path " + file);
+        }
+      }
+    }
+
     @Override
     public void readVectored(List<? extends FileRange> ranges,
                              IntFunction<ByteBuffer> allocate) throws IOException {
+      long length = fs.getFileStatus(file).getLen();

Review Comment:
   1. use `getFileLength()`, which does the on-demand read of the length.
   2. move that code (the `getFileLength()` implementation) over to `getFileStatus()`; the way it does it today makes no sense and is excessively inefficient.
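   
   A sketch of what (1) could look like, assuming `getFileLength()` is the existing lazy helper in `ChecksumFSInputChecker` that the comment refers to:
   
   ```java
   @Override
   public void readVectored(List<? extends FileRange> ranges,
                            IntFunction<ByteBuffer> allocate) throws IOException {
     // read the length once, on demand, instead of issuing a fresh
     // getFileStatus() call on every readVectored() invocation
     final long length = getFileLength();
     validateRangeRequest(ranges, length);
     // ... delegate the vectored read as before ...
   }
   ```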



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java:
##########
@@ -371,12 +371,23 @@ static ByteBuffer checkBytes(ByteBuffer sumsBytes,
       IntBuffer sums = sumsBytes.asIntBuffer();
       sums.position(offset / FSInputChecker.CHECKSUM_SIZE);
       ByteBuffer current = data.duplicate();
-      int numChunks = data.remaining() / bytesPerSum;
+      int numFullChunks = data.remaining() / bytesPerSum;
+      boolean partialChunk = (data.remaining() % bytesPerSum != 0);
+      int totalChunks = numFullChunks;
+      if (partialChunk) {
+        totalChunks++;
+      }
       CRC32 crc = new CRC32();
       // check each chunk to ensure they match
-      for(int c = 0; c < numChunks; ++c) {
+      for(int c = 0; c < totalChunks; ++c) {
         // set the buffer position and the limit
-        current.limit((c + 1) * bytesPerSum);
+        if (c == numFullChunks) {
+          int lastIncompleteChunk = data.remaining() % bytesPerSum;

Review Comment:
   add a comment explaining what is happening here
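   
   One possible version, sketched against the hunk above (the quoted diff cuts off inside the `if` branch, so the limit arithmetic below is an assumption):
   
   ```java
   for (int c = 0; c < totalChunks; ++c) {
     // set the buffer position and the limit
     if (c == numFullChunks) {
       // the data does not end on a bytesPerSum boundary, so the final
       // chunk is short: window only the leftover tail bytes and verify
       // them against the last checksum entry
       int lastIncompleteChunk = data.remaining() % bytesPerSum;
       current.limit(c * bytesPerSum + lastIncompleteChunk);
     } else {
       // a full chunk covers bytes [c * bytesPerSum, (c + 1) * bytesPerSum)
       current.limit((c + 1) * bytesPerSum);
     }
     current.position(c * bytesPerSum);
     // compute and compare the CRC exactly as before
   }
   ```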



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

