Posted to commits@hudi.apache.org by "leesf (Jira)" <ji...@apache.org> on 2019/09/04 03:26:00 UTC

[jira] [Commented] (HUDI-140) GCS: Log File Reading not working due to difference in seek() behavior for EOF

    [ https://issues.apache.org/jira/browse/HUDI-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921920#comment-16921920 ] 

leesf commented on HUDI-140:
----------------------------

Fixed via master: 0b451b3a58cabe25c0cecd3fd8847a8597e2313a

> GCS: Log File Reading not working due to difference in seek() behavior for EOF
> ------------------------------------------------------------------------------
>
>                 Key: HUDI-140
>                 URL: https://issues.apache.org/jira/browse/HUDI-140
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>          Components: Realtime View
>            Reporter: BALAJI VARADARAJAN
>            Assignee: BALAJI VARADARAJAN
>            Priority: Major
>              Labels: gcs-parity, pull-request-available, usability
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Issue:
> Caused by: java.io.EOFException: Invalid seek offset: position value (1370518) must be between 0 and 1370518
>     at com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.validatePosition(GoogleCloudStorageReadChannel.java:644)
>     at com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.position(GoogleCloudStorageReadChannel.java:558)
>     at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream.seek(GoogleHadoopFSInputStream.java:309)
>     at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:96)
>     at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:65)
>     at com.uber.hoodie.common.table.log.block.HoodieLogBlock.readOrSkipContent(HoodieLogBlock.java:234)
>     at com.uber.hoodie.common.table.log.HoodieLogFileReader.createCorruptBlock(HoodieLogFileReader.java:230)
>     at com.uber.hoodie.common.table.log.HoodieLogFileReader.readBlock(HoodieLogFileReader.java:149)
>     at com.uber.hoodie.common.table.log.HoodieLogFileReader.next(HoodieLogFileReader.java:352)
>  
> _Status_: The issue turned out to be caused by a difference between GCSHadoopFileSystem's and HDFSFileSystem's implementations of InputStream.seek behavior when handling EOF. This causes log block reading on GCS to treat a valid last block as corrupt. Gave Alex a quick fix to try out. Needs discussion with Hudi devs to figure out a proper solution.
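>
> The root cause, in short: seeking to position == file length is accepted by HDFS's input stream (a subsequent read() simply returns -1), while the GCS connector's validatePosition rejects any position >= size with the EOFException shown above. Below is a minimal sketch that probes this difference against an arbitrary Hadoop FileSystem; the class name and structure are illustrative assumptions, not code from the Hudi patch.
>
>     import java.io.EOFException;
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.FSDataInputStream;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>
>     // Hypothetical probe class (not from the Hudi patch): seeks to the
>     // EOF position and reports whether the FileSystem accepts it.
>     public class SeekToEofProbe {
>       public static void main(String[] args) throws Exception {
>         Path path = new Path(args[0]); // e.g. gs://bucket/file or hdfs://nn/file
>         FileSystem fs = path.getFileSystem(new Configuration());
>         long len = fs.getFileStatus(path).getLen();
>         try (FSDataInputStream in = fs.open(path)) {
>           // HDFS permits seek(len) and the next read() returns -1;
>           // the GCS connector throws EOFException on this call instead.
>           in.seek(len);
>           System.out.println("seek(len) accepted; read() returned " + in.read());
>         } catch (EOFException e) {
>           System.out.println("seek(len) rejected: " + e.getMessage());
>         }
>       }
>     }
>
> A portable guard in the log reader would be to compare the target offset with the file length before seeking and treat target == length as end-of-stream, rather than delegating that decision to the FileSystem implementation.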



--
This message was sent by Atlassian Jira
(v8.3.2#803003)