You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/04/14 21:53:46 UTC

[GitHub] [incubator-hudi] afilipchik opened a new pull request #1516: [HUDI-784] Adressing issue with log reader on GCS

afilipchik opened a new pull request #1516: [HUDI-784] Adressing issue with log reader on GCS
URL: https://github.com/apache/incubator-hudi/pull/1516
 
 
   ## What is the purpose of the pull request
   
   This one was hard to reproduce and debug, but proposed fix gets rid of the error. Still not sure whether it is issue with GCS library or static array. 
   
   ## Brief change log
   
     - Modifed HoodieLogFileReader.
   
   ## Verify this pull request
   Not sure what the test should be...
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1516: [HUDI-784] Adressing issue with log reader on GCS

Posted by GitBox <gi...@apache.org>.
bvaradar commented on a change in pull request #1516:
URL: https://github.com/apache/incubator-hudi/pull/1516#discussion_r416935621



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
##########
@@ -79,6 +79,11 @@
       this.inputStream = fsDataInputStream;
     }
 
+    // Defensive measure to make sure nothing advanced the stream.
+    if (fsDataInputStream.getPos() != 0) {

Review comment:
       @afilipchik : Can we revert this change then and try as it is likely the issue with static vs non-static ? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] afilipchik commented on a change in pull request #1516: [HUDI-784] Adressing issue with log reader on GCS

Posted by GitBox <gi...@apache.org>.
afilipchik commented on a change in pull request #1516:
URL: https://github.com/apache/incubator-hudi/pull/1516#discussion_r419850319



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
##########
@@ -79,6 +79,11 @@
       this.inputStream = fsDataInputStream;
     }
 
+    // Defensive measure to make sure nothing advanced the stream.
+    if (fsDataInputStream.getPos() != 0) {

Review comment:
       oks




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] afilipchik commented on a change in pull request #1516: [HUDI-784] Adressing issue with log reader on GCS

Posted by GitBox <gi...@apache.org>.
afilipchik commented on a change in pull request #1516:
URL: https://github.com/apache/incubator-hudi/pull/1516#discussion_r411560192



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
##########
@@ -79,6 +79,7 @@
       this.inputStream = fsDataInputStream;
     }
 
+    fsDataInputStream.seek(0);

Review comment:
       magicBuffer check was failing on file open, as it couldn't find a beginning of the HUDI block in the log file. 
   That was throwing exception that was killing compaction. In the debug, it appeared that content of magicBuffer was incorrect, and steam offsets were off. Only happened when more that 1 files was scheduled to be processed. So, I didn't test all the variations originally (non static magicBuffer without seek and seek with static) as was in hurry to fix, so not sure which one actually fixes the issue. Agree that seek(0) is weird and probably non static is a fix. Added comment and additional check..




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1516: [HUDI-784] Adressing issue with log reader on GCS

Posted by GitBox <gi...@apache.org>.
bvaradar commented on a change in pull request #1516:
URL: https://github.com/apache/incubator-hudi/pull/1516#discussion_r411494903



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
##########
@@ -59,7 +59,7 @@
 
   private final FSDataInputStream inputStream;
   private final HoodieLogFile logFile;
-  private static final byte[] MAGIC_BUFFER = new byte[6];
+  private final byte[] magicBuffer = new byte[6];

Review comment:
       makes sense to be non-static.

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
##########
@@ -79,6 +79,7 @@
       this.inputStream = fsDataInputStream;
     }
 
+    fsDataInputStream.seek(0);

Review comment:
       What is the original exception were you seeing ? 
   
   This seek() call is immediately after open() so should be innocuous for other file-systems. right ?
   
   It would be better to inline document the reasoning behind this change for future understanding.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] afilipchik commented on a change in pull request #1516: [HUDI-784] Adressing issue with log reader on GCS

Posted by GitBox <gi...@apache.org>.
afilipchik commented on a change in pull request #1516:
URL: https://github.com/apache/incubator-hudi/pull/1516#discussion_r411560192



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
##########
@@ -79,6 +79,7 @@
       this.inputStream = fsDataInputStream;
     }
 
+    fsDataInputStream.seek(0);

Review comment:
       magicBuffer check was failing on file open, as it couldn't find a beginning of the HUDI block in the log file. 
   That was throwing exception that was killing compaction. In the debug, it appeared that content of magicBuffer was incorrect, and steam offsets we off. Only happened when more that 1 files was scheduled to be processed. So, I didn't test all the variations originally (non static magicBuffer without seek and seek with static) as was in hurry to fix, so not sure which one actually fixes the issue. Agree that seek(0) is weird and probably non static is a fix. Added comment and additional check..




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org