Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/07/03 15:02:05 UTC

[GitHub] [hudi] 5herhom opened a new pull request, #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs.

5herhom opened a new pull request, #6031:
URL: https://github.com/apache/hudi/pull/6031

   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   Problem: Not all distributed file systems throw `EOFException` when a seek goes past the end of the file; CHDFS, for example, throws a plain `IOException`. Catching `EOFException` is therefore not a reliable way to check whether `currentPos + blocksize - Long.BYTES` lies beyond the file length.
   
   Solution: When an `IOException` is thrown, read the file length and check whether the target position is past the end of the file.
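
The fallback described under Solution can be sketched as follows. This is a minimal, self-contained illustration, not the code in the pull request: the `Seekable` interface and the `runsPastEof` helper are hypothetical names standing in for the real stream and for the seek step of `isBlockCorrupted`, and the file length is passed in rather than fetched from the file system.

```java
import java.io.EOFException;
import java.io.IOException;

final class CorruptBlockCheckSketch {

  /** Hypothetical stand-in for a seekable stream; not a Hadoop or Hudi type. */
  interface Seekable {
    void seek(long pos) throws IOException;
  }

  /**
   * Returns true when a block expected to end at endOfBlockPos runs past the end of the file.
   * A plain IOException is only treated as "past EOF" if the file length confirms it.
   */
  static boolean runsPastEof(Seekable in, long endOfBlockPos, long fileLength) throws IOException {
    try {
      in.seek(endOfBlockPos);
      return false;
    } catch (EOFException e) {
      // File systems such as HDFS signal an out-of-range seek this way.
      return true;
    } catch (IOException e) {
      // Other file systems may throw a generic IOException, so compare against the length.
      if (endOfBlockPos > fileLength || endOfBlockPos < 0) {
        return true;
      }
      // The position is inside the file, so this is a genuine I/O failure.
      throw e;
    }
  }

  public static void main(String[] args) throws IOException {
    final long fileLength = 100L;
    // Simulates a file system that throws IOException (not EOFException) on overrun.
    Seekable stream = pos -> {
      if (pos > fileLength) {
        throw new IOException("seek past end of file");
      }
    };
    System.out.println(runsPastEof(stream, 150L, fileLength));
    System.out.println(runsPastEof(stream, 80L, fileLength));
  }
}
```

Rethrowing when the target position is inside the file keeps genuine I/O failures visible instead of reporting them as corruption.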
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6031:
URL: https://github.com/apache/hudi/pull/6031#issuecomment-1173141089

   ## CI report:
   
   * 4e45938ed9726a1eb51a86f8f8cadfd9f4a94c62 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9692) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6031:
URL: https://github.com/apache/hudi/pull/6031#issuecomment-1179678212

   ## CI report:
   
   * 4e45938ed9726a1eb51a86f8f8cadfd9f4a94c62 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9692) 
   * 5db3df8c924b82ca216656899a6cc037b5c51559 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9813) 
   * f3a3f4b9e27b5daaeca40070626a80c7e80bd479 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] yihua commented on a diff in pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #6031:
URL: https://github.com/apache/hudi/pull/6031#discussion_r913281573


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java:
##########
@@ -285,6 +288,22 @@ private boolean isBlockCorrupted(int blocksize) throws IOException {
       // release-3.1.0-RC1/BufferedFSInputStream.java#L73
       inputStream.seek(currentPos);
       return true;
+    } catch (IOException e) {
+      if (logFile.getFileSize() < 0) {
+        long logFileSize = FSUtils.getFileSize(fs, logFile.getPath());
+        logFile.setFileLen(logFileSize);
+      }
+      if (endOfBlockPos > logFile.getFileSize() || endOfBlockPos < 0) {
+        LOG.info("Found corrupted block in file " + logFile + " with block size(" + blocksize + ") running past EOF");
+        // this is corrupt
+        // This seek is required because contract of seek() is different for naked DFSInputStream vs BufferedFSInputStream
+        // release-3.1.0-RC1/DFSInputStream.java#L1455
+        // release-3.1.0-RC1/BufferedFSInputStream.java#L73
+        inputStream.seek(currentPos);
+        return true;
+      } else {
+        throw e;
+      }

Review Comment:
   Instead of changing the core reader logic here, could you add the scheme-specific logic to a new implementation of `FSDataInputStream` like `SchemeAwareFSDataInputStream` and integrate that through `getFSDataInputStream()`?




Review Comment:
   See here:
   ```
   private static FSDataInputStream getFSDataInputStream(FileSystem fs,
                                                           HoodieLogFile logFile,
                                                           int bufferSize) throws IOException {
       FSDataInputStream fsDataInputStream = fs.open(logFile.getPath(), bufferSize);
   
       if (FSUtils.isGCSFileSystem(fs)) {
         // in GCS FS, we might need to intercept seek offsets as we might get an EOF exception
         return new SchemeAwareFSDataInputStream(getFSDataInputStreamForGCS(fsDataInputStream, logFile, bufferSize), true);
       }
   
       if (fsDataInputStream.getWrappedStream() instanceof FSInputStream) {
         return new TimedFSDataInputStream(logFile.getPath(), new FSDataInputStream(
             new BufferedFSInputStream((FSInputStream) fsDataInputStream.getWrappedStream(), bufferSize)));
       }
   
       // fsDataInputStream.getWrappedStream() may be a BufferedFSInputStream
       // need to wrap in another BufferedFSInputStream to make bufferSize work?
       return fsDataInputStream;
     }
   ```
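
One way to read the suggestion above is a wrapper that checks the target offset against a known file length before delegating, so that callers see an `EOFException` regardless of what the underlying file system throws. The sketch below uses only `java.io`; `Seekable`, `Bounded`, and `Recording` are hypothetical stand-ins rather than Hadoop or Hudi classes, and the file length is assumed to be known when the wrapper is built.

```java
import java.io.EOFException;
import java.io.IOException;

final class BoundedStreamSketch {

  /** Hypothetical minimal view of a seekable stream. */
  interface Seekable {
    void seek(long pos) throws IOException;

    long getPos() throws IOException;
  }

  /** Rejects out-of-range seeks with EOFException before the delegate sees them. */
  static final class Bounded implements Seekable {
    private final Seekable delegate;
    private final long fileLength;

    Bounded(Seekable delegate, long fileLength) {
      this.delegate = delegate;
      this.fileLength = fileLength;
    }

    @Override
    public void seek(long pos) throws IOException {
      if (pos < 0 || pos > fileLength) {
        throw new EOFException("Seek to " + pos + " is outside a file of length " + fileLength);
      }
      delegate.seek(pos);
    }

    @Override
    public long getPos() throws IOException {
      return delegate.getPos();
    }
  }

  /** Trivial delegate that only remembers the last position; used for the demo. */
  static final class Recording implements Seekable {
    private long pos;

    @Override
    public void seek(long newPos) {
      pos = newPos;
    }

    @Override
    public long getPos() {
      return pos;
    }
  }

  public static void main(String[] args) throws IOException {
    Seekable in = new Bounded(new Recording(), 100L);
    in.seek(100L); // seeking to the exact end is allowed in this sketch
    try {
      in.seek(101L);
    } catch (EOFException e) {
      System.out.println("EOF detected, position still " + in.getPos());
    }
  }
}
```

With a wrapper of this kind, the reader's existing `EOFException` handling can stay unchanged, and the file-system-specific behaviour lives where the stream is opened.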





[GitHub] [hudi] hudi-bot commented on pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6031:
URL: https://github.com/apache/hudi/pull/6031#issuecomment-1249984154

   ## CI report:
   
   * f3a3f4b9e27b5daaeca40070626a80c7e80bd479 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9814) 
   * d64fd913dffc7ec0b353506f42ecd1ac215d5087 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6031:
URL: https://github.com/apache/hudi/pull/6031#issuecomment-1179677532

   ## CI report:
   
   * 4e45938ed9726a1eb51a86f8f8cadfd9f4a94c62 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9692) 
   * 5db3df8c924b82ca216656899a6cc037b5c51559 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9813) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] 5herhom commented on a diff in pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
5herhom commented on code in PR #6031:
URL: https://github.com/apache/hudi/pull/6031#discussion_r913369350


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java:
##########
@@ -285,6 +288,22 @@ private boolean isBlockCorrupted(int blocksize) throws IOException {
       // release-3.1.0-RC1/BufferedFSInputStream.java#L73
       inputStream.seek(currentPos);
       return true;
+    } catch (IOException e) {
+      if (logFile.getFileSize() < 0) {
+        long logFileSize = FSUtils.getFileSize(fs, logFile.getPath());
+        logFile.setFileLen(logFileSize);
+      }
+      if (endOfBlockPos > logFile.getFileSize() || endOfBlockPos < 0) {
+        LOG.info("Found corrupted block in file " + logFile + " with block size(" + blocksize + ") running past EOF");
+        // this is corrupt
+        // This seek is required because contract of seek() is different for naked DFSInputStream vs BufferedFSInputStream
+        // release-3.1.0-RC1/DFSInputStream.java#L1455
+        // release-3.1.0-RC1/BufferedFSInputStream.java#L73
+        inputStream.seek(currentPos);
+        return true;
+      } else {
+        throw e;
+      }

Review Comment:
   OK, thanks for the suggestion.





[GitHub] [hudi] hudi-bot commented on pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6031:
URL: https://github.com/apache/hudi/pull/6031#issuecomment-1179676837

   ## CI report:
   
   * 4e45938ed9726a1eb51a86f8f8cadfd9f4a94c62 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9692) 
   * 5db3df8c924b82ca216656899a6cc037b5c51559 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6031:
URL: https://github.com/apache/hudi/pull/6031#issuecomment-1179687173

   ## CI report:
   
   * 4e45938ed9726a1eb51a86f8f8cadfd9f4a94c62 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9692) 
   * 5db3df8c924b82ca216656899a6cc037b5c51559 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9813) 
   * f3a3f4b9e27b5daaeca40070626a80c7e80bd479 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9814) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] 5herhom commented on a diff in pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
5herhom commented on code in PR #6031:
URL: https://github.com/apache/hudi/pull/6031#discussion_r917356547


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java:
##########
@@ -285,6 +288,22 @@ private boolean isBlockCorrupted(int blocksize) throws IOException {
       // release-3.1.0-RC1/BufferedFSInputStream.java#L73
       inputStream.seek(currentPos);
       return true;
+    } catch (IOException e) {
+      if (logFile.getFileSize() < 0) {
+        long logFileSize = FSUtils.getFileSize(fs, logFile.getPath());
+        logFile.setFileLen(logFileSize);
+      }
+      if (endOfBlockPos > logFile.getFileSize() || endOfBlockPos < 0) {
+        LOG.info("Found corrupted block in file " + logFile + " with block size(" + blocksize + ") running past EOF");
+        // this is corrupt
+        // This seek is required because contract of seek() is different for naked DFSInputStream vs BufferedFSInputStream
+        // release-3.1.0-RC1/DFSInputStream.java#L1455
+        // release-3.1.0-RC1/BufferedFSInputStream.java#L73
+        inputStream.seek(currentPos);
+        return true;
+      } else {
+        throw e;
+      }

Review Comment:
   I have pushed a new commit that replaces the old one, following your suggestion. Please review again. Thank you.





[GitHub] [hudi] hudi-bot commented on pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6031:
URL: https://github.com/apache/hudi/pull/6031#issuecomment-1249985588

   ## CI report:
   
   * f3a3f4b9e27b5daaeca40070626a80c7e80bd479 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9814) 
   * d64fd913dffc7ec0b353506f42ecd1ac215d5087 UNKNOWN
   * c5c62b6df1884245bd2ed5cae1c5600f6d9dcfa3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11442) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] yihua merged pull request #6031: [HUDI-4282] Repair IOException in CHDFS when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
yihua merged PR #6031:
URL: https://github.com/apache/hudi/pull/6031




[GitHub] [hudi] nsivabalan commented on a diff in pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #6031:
URL: https://github.com/apache/hudi/pull/6031#discussion_r965431934


##########
hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java:
##########
@@ -632,6 +635,15 @@ public static boolean isGCSFileSystem(FileSystem fs) {
     return fs.getScheme().equals(StorageSchemes.GCS.getScheme());
   }
 
+  /**
+   * Some filesystem(such as chdfs) will throw {@code IOException} instead of {@code EOFException}. It will cause error in isBlockCorrupted().
+   * Wrapped by {@code BoundedFsDataInputStream}, to check whether the desired offset is out of the file size in advance.
+   */
+  public static boolean shouldWrappedByBoundedDataStream(FileSystem fs) {

Review Comment:
   Can we keep it simple for now?
   ```
     public static boolean isCHDFSFileSystem(FileSystem fs) {
       return fs.getScheme().equals(StorageSchemes.CHDFS.getScheme());
     }
   ```
   
   If we come across other storage schemes that need this, we can make it a map.
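
A rough sketch of how the scheme check could later grow beyond a single file system. A set is used here in place of the map mentioned above, since only membership is needed; the class name, the method name, and the `chdfs` scheme string are placeholders for illustration and are not taken from Hudi's `StorageSchemes`.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class BoundedSchemeSketch {

  // Placeholder registry: add further schemes here if they show the same seek behaviour.
  private static final Set<String> SCHEMES_NEEDING_BOUNDED_STREAM =
      Collections.unmodifiableSet(new HashSet<>(Arrays.asList("chdfs")));

  /** Returns true if the path's scheme is registered as needing a bounded stream. */
  static boolean needsBoundedStream(URI path) {
    String scheme = path.getScheme();
    return scheme != null
        && SCHEMES_NEEDING_BOUNDED_STREAM.contains(scheme.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    System.out.println(needsBoundedStream(URI.create("chdfs://bucket/table/.f1.log.1")));
    System.out.println(needsBoundedStream(URI.create("hdfs://namenode:8020/table/.f1.log.1")));
  }
}
```

Supporting another file system then becomes a one-line change to the registry rather than a new `isXxxFileSystem` method.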
   





[GitHub] [hudi] hudi-bot commented on pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6031:
URL: https://github.com/apache/hudi/pull/6031#issuecomment-1179706775

   ## CI report:
   
   * 5db3df8c924b82ca216656899a6cc037b5c51559 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9813) 
   * f3a3f4b9e27b5daaeca40070626a80c7e80bd479 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9814) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6031:
URL: https://github.com/apache/hudi/pull/6031#issuecomment-1249984908

   ## CI report:
   
   * f3a3f4b9e27b5daaeca40070626a80c7e80bd479 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9814) 
   * d64fd913dffc7ec0b353506f42ecd1ac215d5087 UNKNOWN
   * c5c62b6df1884245bd2ed5cae1c5600f6d9dcfa3 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs.

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6031:
URL: https://github.com/apache/hudi/pull/6031#issuecomment-1173111341

   ## CI report:
   
   * 4e45938ed9726a1eb51a86f8f8cadfd9f4a94c62 UNKNOWN
   




[GitHub] [hudi] nsivabalan commented on pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on PR #6031:
URL: https://github.com/apache/hudi/pull/6031#issuecomment-1248759252

   @5herhom: can you follow up on the feedback? It's nearing landing.




[GitHub] [hudi] 5herhom commented on pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
5herhom commented on PR #6031:
URL: https://github.com/apache/hudi/pull/6031#issuecomment-1249995668

   > 2 minor comments, and I am assuming you have tested the patch in your env (CHDFS) and it's working as expected? LGTM
   
   The code has been updated. Please review again. Thank you. 




[GitHub] [hudi] nsivabalan commented on a diff in pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #6031:
URL: https://github.com/apache/hudi/pull/6031#discussion_r965431369


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java:
##########
@@ -516,4 +521,23 @@ private static FSDataInputStream getFSDataInputStreamForGCS(FSDataInputStream fs
 
     return fsDataInputStream;
   }
+
+  /**
+   * Some filesystem(such as chdfs) will throw {@code IOException} instead of {@code EOFException}. It will cause error in isBlockCorrupted().
+   * Wrapped by {@code BoundedFsDataInputStream}, to check whether the desired offset is out of the file size in advance.
+   */
+  private static FSDataInputStream wrapStreamByBoundedFsDataInputStream(FileSystem fs,

Review Comment:
   If we call this method at line 490 above, we don't need lines 533 to 539, right?
   Essentially, line 493 could be:
   ```
   return FSUtils.shouldWrappedByBoundedDataStream(fs) ? new BoundedFsDataInputStream(fs, logFile.getPath(), fsDataInputStream) : fsDataInputStream;
   ```
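
The idea behind the wrapper discussed here can be illustrated without Hadoop. The following is a minimal sketch under stated assumptions, not the code from this PR: `SeekableSource`, `LenientSource`, `BoundedSource`, and `classify` are hypothetical names invented for the example. It assumes a filesystem that signals an out-of-range seek with a plain `IOException`, and shows how checking the target offset against the file length up front turns that case into an `EOFException`, which is what a caller that only catches `EOFException` expects.

```java
import java.io.EOFException;
import java.io.IOException;

final class BoundedSeekSketch {

  /** Hypothetical stand-in for a seekable stream; not a Hadoop interface. */
  interface SeekableSource {
    void seek(long pos) throws IOException;
  }

  /** Simulates a filesystem that reports an out-of-range seek with a plain IOException. */
  static final class LenientSource implements SeekableSource {
    private final long length;

    LenientSource(long length) {
      this.length = length;
    }

    @Override
    public void seek(long pos) throws IOException {
      if (pos < 0 || pos > length) {
        throw new IOException("seek failed at " + pos);
      }
    }
  }

  /** Checks the target offset against the known file length before delegating. */
  static final class BoundedSource implements SeekableSource {
    private final SeekableSource delegate;
    private final long fileLength;

    BoundedSource(SeekableSource delegate, long fileLength) {
      this.delegate = delegate;
      this.fileLength = fileLength;
    }

    @Override
    public void seek(long pos) throws IOException {
      // Assumption: seeking exactly to fileLength is allowed; only pos > fileLength is past the end.
      if (pos < 0 || pos > fileLength) {
        throw new EOFException("Tried to seek to " + pos + ", but file length is " + fileLength);
      }
      delegate.seek(pos);
    }
  }

  /** Mimics a caller that treats only EOFException as "offset is past the end of the file". */
  static String classify(SeekableSource in, long pos) {
    try {
      in.seek(pos);
      return "ok";
    } catch (EOFException e) {
      return "eof";
    } catch (IOException e) {
      return "io-error";
    }
  }

  public static void main(String[] args) {
    SeekableSource raw = new LenientSource(100);
    SeekableSource bounded = new BoundedSource(raw, 100);
    System.out.println(classify(raw, 150));     // prints "io-error"
    System.out.println(classify(bounded, 150)); // prints "eof"
    System.out.println(classify(bounded, 50));  // prints "ok"
  }
}
```

Wrapping conditionally, as the one-line ternary above suggests, would keep the extra length check off filesystems that already throw `EOFException` on their own.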
   





[GitHub] [hudi] hudi-bot commented on pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6031:
URL: https://github.com/apache/hudi/pull/6031#issuecomment-1179708646

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4e45938ed9726a1eb51a86f8f8cadfd9f4a94c62",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9692",
       "triggerID" : "4e45938ed9726a1eb51a86f8f8cadfd9f4a94c62",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5db3df8c924b82ca216656899a6cc037b5c51559",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9813",
       "triggerID" : "5db3df8c924b82ca216656899a6cc037b5c51559",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f3a3f4b9e27b5daaeca40070626a80c7e80bd479",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9814",
       "triggerID" : "f3a3f4b9e27b5daaeca40070626a80c7e80bd479",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f3a3f4b9e27b5daaeca40070626a80c7e80bd479 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9814) 
   




[GitHub] [hudi] hudi-bot commented on pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs.

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6031:
URL: https://github.com/apache/hudi/pull/6031#issuecomment-1173112020

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4e45938ed9726a1eb51a86f8f8cadfd9f4a94c62",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9692",
       "triggerID" : "4e45938ed9726a1eb51a86f8f8cadfd9f4a94c62",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4e45938ed9726a1eb51a86f8f8cadfd9f4a94c62 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9692) 
   




[GitHub] [hudi] hudi-bot commented on pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6031:
URL: https://github.com/apache/hudi/pull/6031#issuecomment-1250043426

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4e45938ed9726a1eb51a86f8f8cadfd9f4a94c62",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9692",
       "triggerID" : "4e45938ed9726a1eb51a86f8f8cadfd9f4a94c62",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5db3df8c924b82ca216656899a6cc037b5c51559",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9813",
       "triggerID" : "5db3df8c924b82ca216656899a6cc037b5c51559",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f3a3f4b9e27b5daaeca40070626a80c7e80bd479",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9814",
       "triggerID" : "f3a3f4b9e27b5daaeca40070626a80c7e80bd479",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d64fd913dffc7ec0b353506f42ecd1ac215d5087",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d64fd913dffc7ec0b353506f42ecd1ac215d5087",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c62b6df1884245bd2ed5cae1c5600f6d9dcfa3",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11442",
       "triggerID" : "c5c62b6df1884245bd2ed5cae1c5600f6d9dcfa3",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d64fd913dffc7ec0b353506f42ecd1ac215d5087 UNKNOWN
   * c5c62b6df1884245bd2ed5cae1c5600f6d9dcfa3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11442) 
   




[GitHub] [hudi] 5herhom commented on pull request #6031: [HUDI-4282] Repair IOException in some other dfs, except hdfs,when check block corrupted in HoodieLogFileReader

Posted by GitBox <gi...@apache.org>.
5herhom commented on PR #6031:
URL: https://github.com/apache/hudi/pull/6031#issuecomment-1248796155

   > @5herhom: can you follow up on the feedback? It's nearing landing.
   
   Sorry, I've been busy these days. I will commit within two days.

