Posted to dev@parquet.apache.org by GitBox <gi...@apache.org> on 2022/11/14 17:41:51 UTC

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

theosib-amazon commented on code in PR #960:
URL: https://github.com/apache/parquet-mr/pull/960#discussion_r1021870509


##########
parquet-common/src/main/java/org/apache/parquet/bytes/MultiBufferInputStream.java:
##########
@@ -238,8 +257,31 @@ public int read(byte[] bytes, int off, int len) {
   }
 
   @Override
-  public int read(byte[] bytes) {
-    return read(bytes, 0, bytes.length);
+  public void readFully(byte[] bytes, int off, int len) throws IOException {
+    if (len <= 0) {
+      if (len < 0) {
+        throw new IndexOutOfBoundsException("Read length must be greater than 0: " + len);
+      }
+      
+      return;
+    }
+
+    if (current == null || len > length) {
+      throw new EOFException();
+    }
+
+    int bytesRead = 0;
+    while (bytesRead < len) {

Review Comment:
   I did test a lot of tradeoffs, but I don't think I tested this one directly. It has also been quite a while since I did that work, so I doubt I could figure out which spreadsheets have the relevant data.


