Posted to commits@hudi.apache.org by "leesf (Jira)" <ji...@apache.org> on 2019/09/04 03:26:00 UTC
[jira] [Commented] (HUDI-140) GCS: Log File Reading not working due to difference in seek() behavior for EOF
[ https://issues.apache.org/jira/browse/HUDI-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921920#comment-16921920 ]
leesf commented on HUDI-140:
----------------------------
Fixed via master: 0b451b3a58cabe25c0cecd3fd8847a8597e2313a
> GCS: Log File Reading not working due to difference in seek() behavior for EOF
> ------------------------------------------------------------------------------
>
> Key: HUDI-140
> URL: https://issues.apache.org/jira/browse/HUDI-140
> Project: Apache Hudi (incubating)
> Issue Type: Bug
> Components: Realtime View
> Reporter: BALAJI VARADARAJAN
> Assignee: BALAJI VARADARAJAN
> Priority: Major
> Labels: gcs-parity, pull-request-available, usability
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Issue:
> Caused by: java.io.EOFException: Invalid seek offset: position value (1370518) must be between 0 and 1370518
> at com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.validatePosition(GoogleCloudStorageReadChannel.java:644)
> at com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.position(GoogleCloudStorageReadChannel.java:558)
> at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream.seek(GoogleHadoopFSInputStream.java:309)
> at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:96)
> at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:65)
> at com.uber.hoodie.common.table.log.block.HoodieLogBlock.readOrSkipContent(HoodieLogBlock.java:234)
> at com.uber.hoodie.common.table.log.HoodieLogFileReader.createCorruptBlock(HoodieLogFileReader.java:230)
> at com.uber.hoodie.common.table.log.HoodieLogFileReader.readBlock(HoodieLogFileReader.java:149)
> at com.uber.hoodie.common.table.log.HoodieLogFileReader.next(HoodieLogFileReader.java:352)
>
> _Status_: The issue turned out to be caused by a difference between the GCS Hadoop FileSystem's and the HDFS FileSystem's InputStream.seek() behavior when handling EOF: seeking to a position equal to the file length succeeds on HDFS but throws an EOFException on GCS. This causes log block reading on GCS to treat a valid last block as corrupt. A quick fix was given to Alex to try out; a proper solution needs discussion with the Hudi dev community.
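The divergence can be illustrated with a minimal sketch. This is a hypothetical model, not Hudi's actual fix: `StrictSeekStream` mimics the GCS connector's validation (position must be strictly less than the file length, per the stack trace above), and `seekOrEof` shows the defensive pattern of treating a seek to exactly EOF as "no bytes remain" rather than delegating to the stricter underlying seek().

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical sketch of the seek()-at-EOF divergence. HDFS-style streams
// accept seek(pos) where pos == fileLength (a no-op landing at EOF); the
// GCS connector validates pos < fileLength and throws EOFException, so
// seeking to the byte just past the last log block fails on GCS.
class StrictSeekStream {
    private final long length;
    private long pos;

    StrictSeekStream(long length) {
        this.length = length;
    }

    // GCS-connector-like validation: target must satisfy 0 <= target < length.
    void seek(long target) throws IOException {
        if (target < 0 || target >= length) {
            throw new EOFException("Invalid seek offset: position value ("
                + target + ") must be between 0 and " + length);
        }
        pos = target;
    }

    // Defensive wrapper: a seek to exactly EOF means "nothing left to read",
    // so record the position and return false instead of throwing.
    boolean seekOrEof(long target) throws IOException {
        if (target == length) {
            pos = length;   // logically at EOF
            return false;   // caller should stop reading blocks
        }
        seek(target);
        return true;
    }

    long getPos() {
        return pos;
    }
}
```

With the file length from the stack trace (1370518), `seek(1370518)` throws on the strict stream while `seekOrEof(1370518)` returns false, letting the reader end iteration cleanly instead of marking the last block corrupt.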
--
This message was sent by Atlassian Jira
(v8.3.2#803003)