Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/01 05:42:52 UTC

[GitHub] [hudi] garyli1019 edited a comment on issue #1890: [SUPPORT] Failed to get record from HoodieMergedLogRecordScanner

garyli1019 edited a comment on issue #1890:
URL: https://github.com/apache/hudi/issues/1890#issuecomment-667465216


   This issue happened to me again, and I have now narrowed down the cause.
   When a log file grows larger than `HoodieStorageConfig.LOGFILE_SIZE_MAX_BYTES` (1GB by default), it is split into two files, and in my case the total size of the two log files exceeded 2GB. The issue occurred when loading these two splits.
   ~~My guess was that the serializer was being reset after loading the first file.~~ Created a ticket to track this: https://issues.apache.org/jira/browse/HUDI-1141
   EDIT: Looks like this could be an integer overflow issue: https://github.com/apache/hudi/blame/master/hudi-common/src/main/java/org/apache/hudi/common/util/collection/DiskBasedMap.java#L354
   `Integer.MAX_VALUE` is ~2GB (2,147,483,647 bytes), and the file size fields in the relevant classes are all `Integer`.
   In my test, some log file groups were larger than 2GB, so the smaller log file groups were fine but the larger ones failed.
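   To illustrate the failure mode, here is a minimal, self-contained sketch (not Hudi code; the names are made up for the demo) of how an offset accumulated in an `int` silently wraps negative once a file passes `Integer.MAX_VALUE` bytes:

   ```java
   // Demo of the suspected overflow: an int-typed offset wraps negative
   // once the underlying file grows past Integer.MAX_VALUE (~2GB).
   public class OffsetOverflowDemo {
       public static void main(String[] args) {
           final long fileSize = 3L * 1024 * 1024 * 1024; // a 3GB file, like a large log group
           final int blockSize = 64 * 1024 * 1024;        // 64MB appended per write

           int intOffset = 0;   // analogous to an Integer-typed size field
           long longOffset = 0; // what the field would need to be

           while (longOffset < fileSize) {
               intOffset += blockSize;  // silently overflows past ~2GB
               longOffset += blockSize;
           }

           System.out.println("int offset:  " + intOffset);  // -1073741824 (wrapped)
           System.out.println("long offset: " + longOffset); // 3221225472
       }
   }
   ```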
   @bvaradar What do you think? Should we fix this, or should we just avoid creating such a large log?
   This kind of large log file is unusual because I was just stress-testing the merging.
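   If we go the avoidance route, here is a hedged sketch of lowering the limit, assuming `hoodie.logfile.max.size` is the property key behind `HoodieStorageConfig.LOGFILE_SIZE_MAX_BYTES` and using a plain options map rather than any specific Hudi builder API:

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Assumption: "hoodie.logfile.max.size" is the property behind
   // LOGFILE_SIZE_MAX_BYTES (1GB by default). Capping each log file at 256MB
   // keeps a rolled-over file group well away from the 2GB Integer boundary.
   public class LogSizeWorkaround {
       public static void main(String[] args) {
           Map<String, String> writeOptions = new HashMap<>();
           writeOptions.put("hoodie.logfile.max.size", String.valueOf(256L * 1024 * 1024));
           System.out.println(writeOptions); // hand these options to the Hudi writer
       }
   }
   ```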

