You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/04/11 09:28:37 UTC

[GitHub] [beam] lukemin89 commented on a change in pull request #11397: [BEAM-9743] Fix TFRecordCodec to try harder to read/write

lukemin89 commented on a change in pull request #11397: [BEAM-9743] Fix TFRecordCodec to try harder to read/write
URL: https://github.com/apache/beam/pull/11397#discussion_r407041130
 
 

 ##########
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java
 ##########
 @@ -669,34 +670,31 @@ public int recordLength(byte[] data) {
 
     public @Nullable byte[] read(ReadableByteChannel inChannel) throws IOException {
       header.clear();
-      int headerBytes = inChannel.read(header);
-      if (headerBytes <= 0) {
+      int firstRead = read(inChannel, header);
+      if (firstRead == 0) {
         return null;
       }
-      checkState(headerBytes == HEADER_LEN, "Not a valid TFRecord. Fewer than 12 bytes.");
 
       header.rewind();
-      long length = header.getLong();
-      long lengthHash = hashLong(length);
+      long length64 = header.getLong();
+      long lengthHash = hashLong(length64);
       int maskedCrc32OfLength = header.getInt();
       if (lengthHash != maskedCrc32OfLength) {
         throw new IOException(
             String.format(
                 "Mismatch of length mask when reading a record. Expected %d but received %d.",
                 maskedCrc32OfLength, lengthHash));
       }
-
-      ByteBuffer data = ByteBuffer.allocate((int) length);
-      while (data.hasRemaining() && inChannel.read(data) >= 0) {}
-      if (data.hasRemaining()) {
-        throw new IOException(
-            String.format(
-                "EOF while reading record of length %d. Read only %d bytes. Input might be truncated.",
-                length, data.position()));
+      int length = (int) length64;
+      if (length != length64) {
+        throw new IOException(String.format("length overflow %d", length64));
       }
 
 Review comment:
   minor, yet required check for integer overflow. I'm not sure if this is viable, but if I remember correctly, other languages can make/write more than 2GB. (Maybe java can do it with a direct buffer?)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services