Posted to issues@beam.apache.org by "Luke Cwik (Jira)" <ji...@apache.org> on 2020/04/17 22:02:00 UTC

[jira] [Resolved] (BEAM-9743) TFRecordCodec not attempt to fully read/write

     [ https://issues.apache.org/jira/browse/BEAM-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luke Cwik resolved BEAM-9743.
-----------------------------
    Fix Version/s: 2.22.0
       Resolution: Fixed

> TFRecordCodec not attempt to fully read/write
> ---------------------------------------------
>
>                 Key: BEAM-9743
>                 URL: https://issues.apache.org/jira/browse/BEAM-9743
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-tfrecord, sdk-java-core
>            Reporter: Kyoungha Min
>            Assignee: Kyoungha Min
>            Priority: Critical
>             Fix For: 2.22.0
>
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The same issue was reported before and marked resolved, but parts of it remain unfixed:
> https://issues.apache.org/jira/browse/BEAM-5412?jql=text%20~%20%22tfrecord%22
>  
> Issue #1: TFRecordCodec tries only once to read the header/footer. This is likely to fail near the end of the channel buffer.
> Issue #2: (minor) TFRecordCodec currently does not check how much it writes.
>  
> This seems to happen only with Zstd compression (or any other picky input stream that refuses to read fully). ZstdInputStream seems very reluctant to hand out data.
> The affected code is at:
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L672]
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L699]
>  
> The write path is less of a problem within Beam itself (all, or most, of the WritableByteChannels in beam-java-sdk-core are backed by some OutputStream), but it still does not follow the WritableByteChannel specification:
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L720-L727]
>  
> The ReadableByteChannel/WritableByteChannel Javadoc specifies that channels are not required to read/write fully, and may decline to read/write from time to time.
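
For readers of the archive, the loops the description asks for can be sketched roughly as below. This is only an illustration under stated assumptions, not the code merged for BEAM-9743: the class name ChannelLoops, the helpers readFully/writeFully, and the one-byte-at-a-time test channel are all hypothetical, and the channels are assumed to be blocking (a non-blocking channel that keeps returning 0 would make these loops spin).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper class; names are illustrative and not taken from Beam.
class ChannelLoops {

  /** Reads until dst is full or the channel reports end of stream; returns bytes read. */
  static int readFully(ReadableByteChannel in, ByteBuffer dst) throws IOException {
    int total = 0;
    while (dst.hasRemaining()) {
      int n = in.read(dst);
      if (n < 0) {
        break; // end of stream; the caller decides whether a short read is an error
      }
      total += n;
    }
    return total;
  }

  /** Writes until src is drained, since a single write() may accept only part of it. */
  static void writeFully(WritableByteChannel out, ByteBuffer src) throws IOException {
    while (src.hasRemaining()) {
      out.write(src);
    }
  }

  /** Hypothetical "picky" channel that hands out at most one byte per read() call. */
  static ReadableByteChannel oneByteAtATime(byte[] data) {
    return new ReadableByteChannel() {
      private int pos = 0;
      private boolean open = true;

      @Override
      public int read(ByteBuffer dst) {
        if (pos >= data.length) {
          return -1; // end of stream
        }
        if (!dst.hasRemaining()) {
          return 0;
        }
        dst.put(data[pos++]);
        return 1;
      }

      @Override
      public boolean isOpen() {
        return open;
      }

      @Override
      public void close() {
        open = false;
      }
    };
  }

  public static void main(String[] args) throws IOException {
    byte[] header = new byte[12]; // arbitrary size chosen for the demo
    ByteBuffer dst = ByteBuffer.allocate(header.length);
    int n = readFully(oneByteAtATime(header), dst);
    System.out.println("read " + n + " bytes");

    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    dst.flip(); // switch the buffer from filling to draining
    writeFully(Channels.newChannel(sink), dst);
    System.out.println("wrote " + sink.size() + " bytes");
  }
}
```

A single in.read(dst) against the picky channel would return after one byte, which is the failure mode described in Issue #1; the loop keeps calling read() until the buffer is full or the stream ends. Returning the byte count rather than throwing on a short read is a design choice of this sketch, so the caller can tell a clean end of file from a truncated record.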



--
This message was sent by Atlassian Jira
(v8.3.4#803005)