Posted to issues@beam.apache.org by "Luke Cwik (Jira)" <ji...@apache.org> on 2020/04/17 22:02:00 UTC
[jira] [Resolved] (BEAM-9743) TFRecordCodec not attempt to fully read/write
[ https://issues.apache.org/jira/browse/BEAM-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luke Cwik resolved BEAM-9743.
-----------------------------
Fix Version/s: 2.22.0
Resolution: Fixed
> TFRecordCodec not attempt to fully read/write
> ---------------------------------------------
>
> Key: BEAM-9743
> URL: https://issues.apache.org/jira/browse/BEAM-9743
> Project: Beam
> Issue Type: Bug
> Components: io-java-tfrecord, sdk-java-core
> Reporter: Kyoungha Min
> Assignee: Kyoungha Min
> Priority: Critical
> Fix For: 2.22.0
>
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> The same issue has been reported before and the earlier issues were marked resolved, but parts of the problem still remain:
> https://issues.apache.org/jira/browse/BEAM-5412?jql=text%20~%20%22tfrecord%22
>
> Issue #1: TFRecordCodec tries only once to read the header/footer. This is likely to fail near the end of the channel buffer.
> Issue #2: (minor) TFRecordCodec currently does not check how much it writes.
>
> This seems to happen only with Zstd compression (or any other picky input stream that refuses to read fully). ZstdInputStream seems very reluctant to hand out data.
> The affected parts are:
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L672]
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L699]
>
> The following is not much of a problem within Beam itself (all, or most, of the WritableByteChannels in beam-java-sdk-core are backed by some OutputStream), but it still does not follow the WritableByteChannel specification:
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L720-L727]
>
> The ReadableByteChannel/WritableByteChannel Javadoc specifies that channels are not required to read/write fully, and may decline to read/write from time to time.
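
For illustration, the two issues described above come down to looping until the buffer is drained or filled instead of calling read/write once. The following is only a minimal sketch of that idea and not the change that was merged into Beam; the class name FullIoSketch and the helper names readFully and writeFully are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper class, not part of Beam.
final class FullIoSketch {

  /**
   * Keeps reading until dst is full or the channel reports end of stream.
   * Returns the number of bytes read, so the caller can tell a clean EOF (0)
   * from a truncated record (more than 0 but dst still has space remaining).
   */
  static int readFully(ReadableByteChannel in, ByteBuffer dst) throws IOException {
    int total = 0;
    while (dst.hasRemaining()) {
      int n = in.read(dst); // may legally return fewer bytes than requested
      if (n < 0) {
        break; // end of stream
      }
      total += n;
    }
    return total;
  }

  /** Keeps writing until every remaining byte of src has been accepted. */
  static void writeFully(WritableByteChannel out, ByteBuffer src) throws IOException {
    while (src.hasRemaining()) {
      out.write(src); // may legally accept only part of the buffer
    }
  }
}
```

A caller would compare the return value of readFully with the expected header or footer size and treat a short, non-zero result as a corrupted or truncated record.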
--
This message was sent by Atlassian Jira
(v8.3.4#803005)