You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/02 09:54:33 UTC

[GitHub] [arrow] liyafan82 opened a new pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

liyafan82 opened a new pull request #7326:
URL: https://github.com/apache/arrow/pull/7326


   This is the first sub-work item of ARROW-8672 (
   [Java] Implement RecordBatch IPC buffer compression from ARROW-300). However, it does not involve any concrete compression algorithms. The purpose of this PR is to establish basic interfaces for data compression, and make changes to the IPC framework so that different compression algorithms can be plug-in smoothly.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#issuecomment-640584596


   Thanks a lot for your good comments. Please see my reply in line. 
   
   > Looks good, few minor comments. It would be good to see this in a test (if that is reasonable/possible) w/ a NoOp compressor or something as its hard for me to envisoin how the compression actually will function.
   
   Sounds good. In fact, I have added some tests for NoOp compressor when preparing the PR. However, the changes are reverted before submitting the PR, because it would involve changes to the Message.fbs file. IMO, we should be careful when changing this file, as it involves protocol changes, and may break the code for other languages. 
   
   > 
   > Also, excuse my ignorance of the compression scheme but is there any negotiation of the compresson scheme or an opt-out for older clients? A client recieving a compressed buffer without taking it into account will likely crash hard.
   
   I agree with you that compression scheme negotiation is a useful and widely used feature. Maybe we will support it in the futrue, but not this this PR. According to the discussion in the ML, this issue only deals with some standard compression schemes, and these schemes are encoded in the protocol and should be supported universally. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield closed pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
emkornfield closed pull request #7326:
URL: https://github.com/apache/arrow/pull/7326


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r464024560



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.memory.ArrowBuf;
+
+/**
+ * The codec for compression/decompression.
+ */
+public interface CompressionCodec {
+
+  /**
+   * Compress a buffer.
+   * @param input the buffer to compress.
+   * @return the compressed buffer.
+   */
+  ArrowBuf compress(ArrowBuf input);

Review comment:
       Should allocator be part of this interface?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r436654884



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionUtility.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.flatbuf.BodyCompressionMethod;
+import org.apache.arrow.flatbuf.CompressionType;
+import org.apache.arrow.vector.ipc.message.ArrowBodyCompression;
+
+/**
+ * Utilities for data compression/decompression.
+ */
+public class CompressionUtility {
+
+  private CompressionUtility() {
+  }
+
+  /**
+   * Creates the {@link ArrowBodyCompression} object, given the {@link CompressionCodec}.
+   */
+  public static ArrowBodyCompression createBodyCompression(CompressionCodec codec) {
+    if (codec == null) {
+      return null;
+    }
+    for (int i = 0; i < CompressionType.names.length; i++) {
+      if (CompressionType.names[i].equals(codec.getCodecName())) {

Review comment:
       Thanks for your comment. I agree with you. 
   The problem is that class `org.apache.arrow.flatbuf.CompressionType` is automatically generated by flatbuf, and it is not implemented by an enum. Instead, it has separate fields for ordinals and names. In addition, it only provides function to convert ordinals to names, but not vice versa. 

##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java
##########
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.memory.ArrowBuf;
+
+/**
+ * The codec for compression/decompression.
+ */
+public interface CompressionCodec {
+
+  /**
+   * Given a buffer, estimate the compressed size.
+   * Please note this operation is optional, and some compression methods may not support it.
+   *
+   * @param input the input buffer to be estimated.
+   * @return the estimated size of the compressed data.
+   */
+  long estimateCompressedSize(ArrowBuf input);

Review comment:
       Simply compress/decompress would be OK, of course. 
   
   The estimateCompressedSize/estimateDecompressedSize APIs are mainly provided for performance benefits. In particular, if we can estimate the sizes with accuracy, we will
   1. Avoid allocating memory spaces larger than necessary.
   2. Avoid reallocating memory space during compression/decompression
   3. Avoid data copy caused by reallocations. 
   
   Due to the above benefits, I would prefer to add the APIs to the interface, but I am open to others' opinons and willing to change my mind if necessary. 

##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
##########
@@ -300,6 +307,20 @@ public static long writeBatchBuffers(WriteChannel out, ArrowRecordBatch batch) t
     return out.getCurrentPosition() - bufferStart;
   }
 
+  /**
+   * Serialize the compression body.
+   */
+  public static long writeCompressionBody(
+      WriteChannel out, ArrowBodyCompression bodyCompression) throws IOException {
+    long bufferStart = out.getCurrentPosition();
+    ByteBuffer buf = ByteBuffer.allocate(2);

Review comment:
       I have revised the implementation of `ArrowBodyCompression ` so that it is based on byte[]. Thanks a lot for the good suggestion. 

##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java
##########
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.memory.ArrowBuf;
+
+/**
+ * The codec for compression/decompression.
+ */
+public interface CompressionCodec {
+
+  /**
+   * Given a buffer, estimate the compressed size.
+   * Please note this operation is optional, and some compression methods may not support it.
+   *
+   * @param input the input buffer to be estimated.
+   * @return the estimated size of the compressed data.
+   */
+  long estimateCompressedSize(ArrowBuf input);

Review comment:
       Maybe you mean we should remove the estimateCompressedSize/estimateDecompressedSize APIs from the CompressionCodec interface, so that the sizes will be estimated internally when calling the compress/decompress APIs. 
   
   The problem is that for some scenarios, the size should be enforced from the outside, rather than being computed internally. For example, in the decompression phase, if we know the buffer type (e.g. validity buffer) and the vector length, we can estimate the decompressed buffer size with absolute certainty. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r447441667



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionUtility.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.flatbuf.BodyCompressionMethod;
+import org.apache.arrow.flatbuf.CompressionType;
+import org.apache.arrow.vector.ipc.message.ArrowBodyCompression;
+
+/**
+ * Utilities for data compression/decompression.
+ */
+public class CompressionUtility {
+
+  private CompressionUtility() {
+  }
+
+  /**
+   * Creates the {@link ArrowBodyCompression} object, given the {@link CompressionCodec}.
+   */
+  public static ArrowBodyCompression createBodyCompression(CompressionCodec codec) {
+    if (codec == null) {
+      return null;
+    }
+    for (int i = 0; i < CompressionType.names.length; i++) {
+      if (CompressionType.names[i].equals(codec.getCodecName())) {

Review comment:
       Thanks for the suggestion. I have revised the code to implement it through a switch statement. Please check if it looks good.
   
   It looks clearer. However, we may need to change the code whenever we support a new compression method. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r465622358



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/ArrowBodyCompression.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.ipc.message;
+
+import org.apache.arrow.flatbuf.BodyCompression;
+
+import com.google.flatbuffers.FlatBufferBuilder;
+
+/**
+ * Compression information about data written to a channel.
+ */
+public class ArrowBodyCompression implements FBSerializable {
+
+  /**
+   * Length of the serialized object.
+   */
+  public static final long BODY_COMPRESSION_LENGTH = 2L;
+
+  final byte[] data = new byte[(int) BODY_COMPRESSION_LENGTH];

Review comment:
       We have restored to separate named variables in the revised PR. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r464025134



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
##########
@@ -266,8 +266,15 @@ public static ArrowBlock serialize(WriteChannel out, ArrowRecordBatch batch, Ipc
     long bufferLength = writeBatchBuffers(out, batch);
     Preconditions.checkArgument(bufferLength % 8 == 0, "out is not aligned");
 
+    long compLength = 0L;
+    if (batch.getBodyCompression() != null) {
+      compLength = writeCompressionBody(out, batch.getBodyCompression());
+      Preconditions.checkArgument(compLength == ArrowBodyCompression.BODY_COMPRESSION_LENGTH,

Review comment:
       Could you add a comment here on why we need to verify this value here?.  Also, it looks like align() is called in this method, so couldn't that cause these values to be unequal

##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
##########
@@ -266,8 +266,15 @@ public static ArrowBlock serialize(WriteChannel out, ArrowRecordBatch batch, Ipc
     long bufferLength = writeBatchBuffers(out, batch);
     Preconditions.checkArgument(bufferLength % 8 == 0, "out is not aligned");
 
+    long compLength = 0L;
+    if (batch.getBodyCompression() != null) {
+      compLength = writeCompressionBody(out, batch.getBodyCompression());
+      Preconditions.checkArgument(compLength == ArrowBodyCompression.BODY_COMPRESSION_LENGTH,
+          "deserialized compression body length not equal to ArrowBodyCompression#BODY_COMPRESSION_LENGTH");

Review comment:
       included sizes here would be useful.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r464025676



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.memory.ArrowBuf;
+
+/**
+ * The codec for compression/decompression.
+ */
+public interface CompressionCodec {
+
+  /**
+   * Compress a buffer.
+   * @param input the buffer to compress.
+   * @return the compressed buffer.
+   */
+  ArrowBuf compress(ArrowBuf input);

Review comment:
       would uncompressedBuffer be a better name for the parameter?  Similar question for the one below?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r465621351



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
##########
@@ -403,16 +421,29 @@ public static ArrowRecordBatch deserializeRecordBatch(RecordBatch recordBatchFB,
       nodes.add(new ArrowFieldNode(node.length(), node.nullCount()));
     }
     List<ArrowBuf> buffers = new ArrayList<>();
+    long curOffset = 0L;
     for (int i = 0; i < recordBatchFB.buffersLength(); ++i) {
       Buffer bufferFB = recordBatchFB.buffers(i);
       ArrowBuf vectorBuffer = body.slice(bufferFB.offset(), bufferFB.length());
+      curOffset = bufferFB.offset() + bufferFB.length();
       buffers.add(vectorBuffer);
     }
+
+    if (curOffset % 8 != 0) {
+      curOffset += 8 - curOffset % 8;
+    }
+    ArrowBodyCompression bodyCompression = null;

Review comment:
       In the revised PR, we no longer parse the buffer stream.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r492506544



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+
+/**
+ * The codec for compression/decompression.
+ */
+public interface CompressionCodec {
+
+  /**
+   * Compress a buffer.
+   * @param allocator the allocator for allocating memory for compressed buffer.
+   * @param unCompressedBuffer the buffer to compress.
+   *                           Implementation of this method should take care of releasing this buffer.
+   * @return the compressed buffer.
+   */
+  ArrowBuf compress(BufferAllocator allocator, ArrowBuf unCompressedBuffer);

Review comment:
       @emkornfield Do you think we can merge this now?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r447442441



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java
##########
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.memory.ArrowBuf;
+
+/**
+ * The codec for compression/decompression.
+ */
+public interface CompressionCodec {
+
+  /**
+   * Given a buffer, estimate the compressed size.
+   * Please note this operation is optional, and some compression methods may not support it.
+   *
+   * @param input the input buffer to be estimated.
+   * @return the estimated size of the compressed data.
+   */
+  long estimateCompressedSize(ArrowBuf input);

Review comment:
       Sounds reasonable. Thank you.
   I have revised the code to simplify the interface. Maybe we can add the optimization in the future, if we belive it is necessary. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rymurr commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r435803219



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java
##########
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.memory.ArrowBuf;
+
+/**
+ * The codec for compression/decompression.
+ */
+public interface CompressionCodec {
+
+  /**
+   * Given a buffer, estimate the compressed size.
+   * Please note this operation is optional, and some compression methods may not support it.
+   *
+   * @param input the input buffer to be estimated.
+   * @return the estimated size of the compressed data.
+   */
+  long estimateCompressedSize(ArrowBuf input);

Review comment:
       In the vector loader/unloader the estimated size is only needed by the compress/decompress methods. Is there a further use case where this would need to be externalised? Would simply compress/decompress be ok?

##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionUtility.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.flatbuf.BodyCompressionMethod;
+import org.apache.arrow.flatbuf.CompressionType;
+import org.apache.arrow.vector.ipc.message.ArrowBodyCompression;
+
+/**
+ * Utilities for data compression/decompression.
+ */
+public class CompressionUtility {
+
+  private CompressionUtility() {
+  }
+
+  /**
+   * Creates the {@link ArrowBodyCompression} object, given the {@link CompressionCodec}.
+   */
+  public static ArrowBodyCompression createBodyCompression(CompressionCodec codec) {
+    if (codec == null) {
+      return null;
+    }
+    for (int i = 0; i < CompressionType.names.length; i++) {
+      if (CompressionType.names[i].equals(codec.getCodecName())) {

Review comment:
       I feel like a switch statement and/or enum (similar to Type/ArrowType conversions would be better here than a tight loop of string comparisons. Not a huge deal as its a small array but I think its a bit easier to read.

##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
##########
@@ -300,6 +307,20 @@ public static long writeBatchBuffers(WriteChannel out, ArrowRecordBatch batch) t
     return out.getCurrentPosition() - bufferStart;
   }
 
+  /**
+   * Serialize the compression body.
+   */
+  public static long writeCompressionBody(
+      WriteChannel out, ArrowBodyCompression bodyCompression) throws IOException {
+    long bufferStart = out.getCurrentPosition();
+    ByteBuffer buf = ByteBuffer.allocate(2);

Review comment:
       I wonder if rather than allocating a new ByteBuffer it might be easier to just use a byte[] and the `write(byte[])` method. Potentially you could then use a byte[] to store the data in `ArrowBodyCompression` so that you don't have to allocate anything while writing the message.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r479017596



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/ArrowRecordBatch.java
##########
@@ -194,12 +190,17 @@ public int writeTo(FlatBufferBuilder builder) {
     int nodesOffset = FBSerializables.writeAllStructsToVector(builder, nodes);
     RecordBatch.startBuffersVector(builder, buffers.size());
     int buffersOffset = FBSerializables.writeAllStructsToVector(builder, buffersLayout);
-    int compressOffset = bodyCompression.writeTo(builder);
+    int compressOffset = 0;
+    if (bodyCompression != null) {

Review comment:
       @emkornfield Thanks for your good idea. I have updated the code accordingly. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r464025839



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/ArrowBodyCompression.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.ipc.message;
+
+import org.apache.arrow.flatbuf.BodyCompression;
+
+import com.google.flatbuffers.FlatBufferBuilder;
+
+/**
+ * Compression information about data written to a channel.
+ */
+public class ArrowBodyCompression implements FBSerializable {
+
+  /**
+   * Length of the serialized object.
+   */
+  public static final long BODY_COMPRESSION_LENGTH = 2L;
+
+  final byte[] data = new byte[(int) BODY_COMPRESSION_LENGTH];

Review comment:
       Is there a reason to not have separately named variables here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#issuecomment-674618715


   > > The conda integration test is failing, because the default no compression option is not supported by the specification. Maybe we need to start a discussion in the ML.
   > 
   > I think we should handle this in the implementation. If default codec is pass through the writer (or at the appropriate level), we shouldn't populate the flatbuffer with it.
   
   @emkornfield Sounds reasonable. Thanks for your feedback. I will revise the PR accordingly. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r468987274



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/VectorUnloader.java
##########
@@ -76,6 +97,10 @@ private void appendNodes(FieldVector vector, List<ArrowFieldNode> nodes, List<Ar
           "wrong number of buffers for field %s in vector %s. found: %s",
           vector.getField(), vector.getClass().getSimpleName(), fieldBuffers));
     }
+
+    fieldBuffers = fieldBuffers.stream().map(

Review comment:
       Agreed. I have changed it to a for loop. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#issuecomment-669111978


   > At the very least I think this needs more comments as to why it appears that manual serializion of flatbuffer data is being used.
   > 
   > It would also be nice to avoid introducing (if (x == null)) checks if at all possible, instead having default implementations would be better (unless there is a real performance reason for doing this). Avoid writing out compression settings when there are none seems like it could handled in the reverse direction (by checking for things that are non compressed and avoiding writing it out.
   
   @emkornfield Thanks a lot for your valuable feedback.
   As you have indicated, many of the problems are caused by 1) manual serilization and 2) no default compression codec. 
   I have revised the PR to the fix above problems, so some other problems also go away.  


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rymurr commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r440909024



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionUtility.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.flatbuf.BodyCompressionMethod;
+import org.apache.arrow.flatbuf.CompressionType;
+import org.apache.arrow.vector.ipc.message.ArrowBodyCompression;
+
+/**
+ * Utilities for data compression/decompression.
+ */
+public class CompressionUtility {
+
+  private CompressionUtility() {
+  }
+
+  /**
+   * Creates the {@link ArrowBodyCompression} object, given the {@link CompressionCodec}.
+   */
+  public static ArrowBodyCompression createBodyCompression(CompressionCodec codec) {
+    if (codec == null) {
+      return null;
+    }
+    for (int i = 0; i < CompressionType.names.length; i++) {
+      if (CompressionType.names[i].equals(codec.getCodecName())) {

Review comment:
       Agreed, I thought a pattern similar to `Types.java` (and an added enum) or a switch statement on the byte value might be more clear. Not a blocker for me though




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r464024951



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/ArrowRecordBatch.java
##########
@@ -232,6 +252,12 @@ public long computeBodyLength() {
       // round up size to the next multiple of 8
       size = DataSizeRoundingUtil.roundUpTo8Multiple(size);
     }
+
+    if (bodyCompression != null) {

Review comment:
       Did you consider making bodyCompression never null (have a static instance that represents zero compression) and making a instance method on it like 'updateMetadataSize(long size)` instead of putting this check here?  Also I'm not familiar with this code but its strange to have to perform this calculation outside of serializing flatbuffers?  Is there something analogous in the C++ code?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rymurr commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r440915137



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java
##########
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.memory.ArrowBuf;
+
+/**
+ * The codec for compression/decompression.
+ */
+public interface CompressionCodec {
+
+  /**
+   * Given a buffer, estimate the compressed size.
+   * Please note this operation is optional, and some compression methods may not support it.
+   *
+   * @param input the input buffer to be estimated.
+   * @return the estimated size of the compressed data.
+   */
+  long estimateCompressedSize(ArrowBuf input);

Review comment:
       Yeah my thinking was that estimation could be done internally. I am not sure where the estimated value is useful externally. I *think* that the compression codec will be allocating its own buffers so should know exact/estimated sizes and be able to enforce/validate those numbers. In what scenario does the external world need to be informed of sizes?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#issuecomment-673213128


   The conda integration test is failing, because the default no compression option is not supported by the specification. Maybe we need to start a discussion in the ML. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r464025855



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/ArrowBodyCompression.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.ipc.message;
+
+import org.apache.arrow.flatbuf.BodyCompression;
+
+import com.google.flatbuffers.FlatBufferBuilder;
+
+/**
+ * Compression information about data written to a channel.
+ */
+public class ArrowBodyCompression implements FBSerializable {
+
+  /**
+   * Length of the serialized object.
+   */
+  public static final long BODY_COMPRESSION_LENGTH = 2L;

Review comment:
       this seems like it would break if a field as ever added to the table?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r479868414



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+
+/**
+ * The codec for compression/decompression.
+ */
+public interface CompressionCodec {
+
+  /**
+   * Compress a buffer.
+   * @param allocator the allocator for allocating memory for compressed buffer.
+   * @param unCompressedBuffer the buffer to compress.
+   *                           Implementation of this method should take care of releasing this buffer.
+   * @return the compressed buffer.
+   */
+  ArrowBuf compress(BufferAllocator allocator, ArrowBuf unCompressedBuffer);

Review comment:
       In thinking through this interface, something occurred to me.  Given the way the current Message serialization code works is it forces a copy from an underlying channel.  Given this, it seem like we would now have two copy like operations:
   1.  Copy from channel to compressed buffer
   2.  Uncompress the buffer when loading VectorSchemaRoot.
   
   I was wondering what you thought about deferring compression until writing out the  batch and doing decompression eagerly when reading in the batch to avoid one of these copies.  I think one thing that would potentially help us decide on the course to take would be look at the possible libraries we would use for compression/decompression and whether they support off-heap, InputStream or some other interface for accomplishing the compression
   
   @liyafan82 have you done more research in this area?
   
   @rymurr do you have anything thoughts on this?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r465618946



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/ArrowRecordBatch.java
##########
@@ -232,6 +252,12 @@ public long computeBodyLength() {
       // round up size to the next multiple of 8
       size = DataSizeRoundingUtil.roundUpTo8Multiple(size);
     }
+
+    if (bodyCompression != null) {

Review comment:
       Sounds reasonable. @rymurr had a similar suggestion.
   
   To make the bodyCompression non-null, we need a default compression codec that makes no compression/decompression. However, our specification does not support such an option now (Please see https://github.com/apache/arrow/blob/master/format/Message.fbs#L45-L53). Providing one would make the implementation not aligning with the specification.
   
   However, we have revised the PR to provide one, because it seems the C++ implementation is not aligning with the specification either (pls see https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/compression.h#L33)
   
   After that, we can guarantee that the bodyCompression is never null, and some other problems also go away. 
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r465621351



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
##########
@@ -403,16 +421,29 @@ public static ArrowRecordBatch deserializeRecordBatch(RecordBatch recordBatchFB,
       nodes.add(new ArrowFieldNode(node.length(), node.nullCount()));
     }
     List<ArrowBuf> buffers = new ArrayList<>();
+    long curOffset = 0L;
     for (int i = 0; i < recordBatchFB.buffersLength(); ++i) {
       Buffer bufferFB = recordBatchFB.buffers(i);
       ArrowBuf vectorBuffer = body.slice(bufferFB.offset(), bufferFB.length());
+      curOffset = bufferFB.offset() + bufferFB.length();
       buffers.add(vectorBuffer);
     }
+
+    if (curOffset % 8 != 0) {
+      curOffset += 8 - curOffset % 8;
+    }
+    ArrowBodyCompression bodyCompression = null;

Review comment:
       In the revised PR, we no longer parse the buffer stream. Thanks for the suggestion. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#issuecomment-679458247


   > > The conda integration test is failing, because the default no compression option is not supported by the specification. Maybe we need to start a discussion in the ML.
   > 
   > I think we should handle this in the implementation. If default codec is pass through the writer (or at the appropriate level), we shouldn't populate the flatbuffer with it.
   
   @emkornfield Thanks a lot for your feedback. I have revised the code accordingly. Sorry for my late response. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r465613604



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.memory.ArrowBuf;
+
+/**
+ * The codec for compression/decompression.
+ */
+public interface CompressionCodec {
+
+  /**
+   * Compress a buffer.
+   * @param input the buffer to compress.
+   * @return the compressed buffer.
+   */
+  ArrowBuf compress(ArrowBuf input);

Review comment:
       Sounds reasonable. I have revised the code to add allocator to the interface. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r468987159



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionUtil.java
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.flatbuf.BodyCompressionMethod;
+import org.apache.arrow.flatbuf.CompressionType;
+import org.apache.arrow.vector.ipc.message.ArrowBodyCompression;
+
+/**
+ * Utilities for data compression/decompression.
+ */
+public class CompressionUtil {
+
+  private CompressionUtil() {
+  }
+
+  /**
+   * Creates the {@link ArrowBodyCompression} object, given the {@link CompressionCodec}.
+   * The implementation of this method should depend on the values of {@link CompressionType#names}.
+   */
+  public static ArrowBodyCompression createBodyCompression(CompressionCodec codec) {
+    switch (codec.getCodecName()) {
+      case "default":
+        return DefaultCompressionCodec.DEFAULT_BODY_COMPRESSION;
+      case "LZ4_FRAME":
+        return new ArrowBodyCompression((byte) 0, BodyCompressionMethod.BUFFER);

Review comment:
       Changed to CompressionType.LZ4_FRAME and CompressionType.ZSTD, respectively. Thanks for your kind reminder. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r468986440



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/DefaultCompressionCodec.java
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.flatbuf.BodyCompressionMethod;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.vector.ipc.message.ArrowBodyCompression;
+
+/**
+ * The default compression codec that does no compression.
+ */
+public class DefaultCompressionCodec implements CompressionCodec {
+
+  public static final DefaultCompressionCodec INSTANCE = new DefaultCompressionCodec();
+
+  public static final byte COMPRESSION_TYPE = -1;
+
+  public static final ArrowBodyCompression DEFAULT_BODY_COMPRESSION =
+      new ArrowBodyCompression(COMPRESSION_TYPE, BodyCompressionMethod.BUFFER);
+
+  private DefaultCompressionCodec() {
+  }
+
+  @Override
+  public ArrowBuf compress(BufferAllocator allocator, ArrowBuf unCompressedBuffer) {

Review comment:
       I agree with your point. This makes it easier to incorporate the codec into the framework. 
   Added comments in the JavaDoc. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#issuecomment-637434482


   https://issues.apache.org/jira/browse/ARROW-9010


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#issuecomment-680574395


   thanks @liyafan82 I added a comment I think the change puts back more places then necessary to check for null?  I might be missing something though.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r492506544



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+
+/**
+ * The codec for compression/decompression.
+ */
+public interface CompressionCodec {
+
+  /**
+   * Compress a buffer.
+   * @param allocator the allocator for allocating memory for compressed buffer.
+   * @param unCompressedBuffer the buffer to compress.
+   *                           Implementation of this method should take care of releasing this buffer.
+   * @return the compressed buffer.
+   */
+  ArrowBuf compress(BufferAllocator allocator, ArrowBuf unCompressedBuffer);

Review comment:
       @emkornfield Do you think we can merge this now?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r465619212



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
##########
@@ -266,8 +266,15 @@ public static ArrowBlock serialize(WriteChannel out, ArrowRecordBatch batch, Ipc
     long bufferLength = writeBatchBuffers(out, batch);
     Preconditions.checkArgument(bufferLength % 8 == 0, "out is not aligned");
 
+    long compLength = 0L;
+    if (batch.getBodyCompression() != null) {
+      compLength = writeCompressionBody(out, batch.getBodyCompression());
+      Preconditions.checkArgument(compLength == ArrowBodyCompression.BODY_COMPRESSION_LENGTH,

Review comment:
       You are right. This check is removed. Thanks. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r480613932



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+
+/**
+ * The codec for compression/decompression.
+ */
+public interface CompressionCodec {
+
+  /**
+   * Compress a buffer.
+   * @param allocator the allocator for allocating memory for compressed buffer.
+   * @param unCompressedBuffer the buffer to compress.
+   *                           Implementation of this method should take care of releasing this buffer.
+   * @return the compressed buffer.
+   */
+  ArrowBuf compress(BufferAllocator allocator, ArrowBuf unCompressedBuffer);

Review comment:
       @emkornfield Thank you for starting this discussion and sharing your good ideas. 
   Your reasoning makes sense to me. 
   
   I guess I was looking at the problem from a different perspective. 
   
   IMO, the bottleneck of a compressing codec is the CPU resource, and the main purpose of compressing is to reduce memory/network bandwidth consumption.
   
   Given the above assumptions, we should try to do the compression as early as possible. The earliest possible place should be in the `getFieldBuffers` method. In this PR, we do it in `VectorUnLoader`, which is not the best, but close enough to the best. Similarly, we should try to do the decompression as late as possible. In this PR, we do it in `VectorLoader`, which is close to the optimal.
   
   Admittedly, we have additional copies after introducing the compression framework. However, both additional copies are based on the compressed data, with reduced data size, so the overhead should be small.
   
   The above reasoning is based on the assumption that the compression codec could effectively reduce the data size, which is not always true in practice. So I think we can make the decision based on the specific compression codec, and real benchmark data?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r467356218



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionUtil.java
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.flatbuf.BodyCompressionMethod;
+import org.apache.arrow.flatbuf.CompressionType;
+import org.apache.arrow.vector.ipc.message.ArrowBodyCompression;
+
+/**
+ * Utilities for data compression/decompression.
+ */
+public class CompressionUtil {
+
+  private CompressionUtil() {
+  }
+
+  /**
+   * Creates the {@link ArrowBodyCompression} object, given the {@link CompressionCodec}.
+   * The implementation of this method should depend on the values of {@link CompressionType#names}.
+   */
+  public static ArrowBodyCompression createBodyCompression(CompressionCodec codec) {
+    switch (codec.getCodecName()) {
+      case "default":
+        return DefaultCompressionCodec.DEFAULT_BODY_COMPRESSION;
+      case "LZ4_FRAME":
+        return new ArrowBodyCompression((byte) 0, BodyCompressionMethod.BUFFER);

Review comment:
       these isn't an enum of 0 and 1 below?

##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/DefaultCompressionCodec.java
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.flatbuf.BodyCompressionMethod;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.vector.ipc.message.ArrowBodyCompression;
+
+/**
+ * The default compression codec that does no compression.
+ */
+public class DefaultCompressionCodec implements CompressionCodec {

Review comment:
       nit: I think NoCompressionCodec or something similar to indicate that no compression/decompression happens might this easier to understand for future readers.

##########
File path: java/vector/src/main/java/org/apache/arrow/vector/VectorUnloader.java
##########
@@ -76,6 +97,10 @@ private void appendNodes(FieldVector vector, List<ArrowFieldNode> nodes, List<Ar
           "wrong number of buffers for field %s in vector %s. found: %s",
           vector.getField(), vector.getClass().getSimpleName(), fieldBuffers));
     }
+
+    fieldBuffers = fieldBuffers.stream().map(

Review comment:
       small nit: if we aren't using streams elsewhere we should avoid using them here as well.  they can add some unncessary overhead for serialization/deserialization path.

##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/DefaultCompressionCodec.java
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.flatbuf.BodyCompressionMethod;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.vector.ipc.message.ArrowBodyCompression;
+
+/**
+ * The default compression codec that does no compression.
+ */
+public class DefaultCompressionCodec implements CompressionCodec {
+
+  public static final DefaultCompressionCodec INSTANCE = new DefaultCompressionCodec();
+
+  public static final byte COMPRESSION_TYPE = -1;
+
+  public static final ArrowBodyCompression DEFAULT_BODY_COMPRESSION =
+      new ArrowBodyCompression(COMPRESSION_TYPE, BodyCompressionMethod.BUFFER);
+
+  private DefaultCompressionCodec() {
+  }
+
+  @Override
+  public ArrowBuf compress(BufferAllocator allocator, ArrowBuf unCompressedBuffer) {

Review comment:
       we should clarify in the contract of the interface that implementations take ownership of unCompressedBuffer (i.e. I think in order for this to work properly for real codecs is to free uncompressedBuffer here).  Does that make sense or were you imagining something else?

##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
##########
@@ -408,11 +408,15 @@ public static ArrowRecordBatch deserializeRecordBatch(RecordBatch recordBatchFB,
       ArrowBuf vectorBuffer = body.slice(bufferFB.offset(), bufferFB.length());
       buffers.add(vectorBuffer);
     }
+
+    ArrowBodyCompression bodyCompression =
+        new ArrowBodyCompression(recordBatchFB.compression().codec(), recordBatchFB.compression().method());

Review comment:
       does compression need to be checked for nullness?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r465613822



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionUtility.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.flatbuf.BodyCompressionMethod;
+import org.apache.arrow.flatbuf.CompressionType;
+import org.apache.arrow.vector.ipc.message.ArrowBodyCompression;
+
+/**
+ * Utilities for data compression/decompression.
+ */
+public class CompressionUtility {

Review comment:
       Agreed. I have changed the name to CompressionUtil.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#issuecomment-673873146


   > The conda integration test is failing, because the default no compression option is not supported by the specification. Maybe we need to start a discussion in the ML.
   
   I think we should handle this in the implementation.  If default codec is pass through the writer (or at the appropriate level), we shouldn't populate the flatbuffer with it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r465619410



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
##########
@@ -266,8 +266,15 @@ public static ArrowBlock serialize(WriteChannel out, ArrowRecordBatch batch, Ipc
     long bufferLength = writeBatchBuffers(out, batch);
     Preconditions.checkArgument(bufferLength % 8 == 0, "out is not aligned");
 
+    long compLength = 0L;
+    if (batch.getBodyCompression() != null) {
+      compLength = writeCompressionBody(out, batch.getBodyCompression());
+      Preconditions.checkArgument(compLength == ArrowBodyCompression.BODY_COMPRESSION_LENGTH,
+          "deserialized compression body length not equal to ArrowBodyCompression#BODY_COMPRESSION_LENGTH");

Review comment:
       This check has been removed too. Thanks. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r464024594



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionUtility.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.flatbuf.BodyCompressionMethod;
+import org.apache.arrow.flatbuf.CompressionType;
+import org.apache.arrow.vector.ipc.message.ArrowBodyCompression;
+
+/**
+ * Utilities for data compression/decompression.
+ */
+public class CompressionUtility {

Review comment:
       I don't think Utility matches naming conventions that is used elsewhere in the code base for util classes?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r480704140



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+
+/**
+ * The codec for compression/decompression.
+ */
+public interface CompressionCodec {
+
+  /**
+   * Compress a buffer.
+   * @param allocator the allocator for allocating memory for compressed buffer.
+   * @param unCompressedBuffer the buffer to compress.
+   *                           Implementation of this method should take care of releasing this buffer.
+   * @return the compressed buffer.
+   */
+  ArrowBuf compress(BufferAllocator allocator, ArrowBuf unCompressedBuffer);

Review comment:
       @liyafan82 good points.  I think the other use-case I didn't consider is Flight which I think will already be in off-heap memory. 
   
   In terms of copies, the should be smaller in general, but it would be good to to understand if there are any limitations.  It looks like at least [airlift supports off-heap decompression](https://github.com/airlift/aircompressor/blob/master/src/main/java/io/airlift/compress/lz4/Lz4RawDecompressor.java), so I think this API is probably OK from that perspective.
   
   @rymurr did you have any other comments otherwise I can take another look and then I think we can merge.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r477024511



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/ArrowRecordBatch.java
##########
@@ -194,12 +190,17 @@ public int writeTo(FlatBufferBuilder builder) {
     int nodesOffset = FBSerializables.writeAllStructsToVector(builder, nodes);
     RecordBatch.startBuffersVector(builder, buffers.size());
     int buffersOffset = FBSerializables.writeAllStructsToVector(builder, buffersLayout);
-    int compressOffset = bodyCompression.writeTo(builder);
+    int compressOffset = 0;
+    if (bodyCompression != null) {

Review comment:
       I would think the only change really necessary here would have been to compare bodyCompression == NO_COMPRESSION_CODEC that way you can keep things non-null everyplace else within the code?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r464025727



##########
File path: java/flight/flight-core/src/main/java/org/apache/arrow/flight/ArrowMessage.java
##########
@@ -353,6 +361,23 @@ private InputStream asInputStream(BufferAllocator allocator) {
         // the reference count
         b.getReferenceManager().retain();
       }
+
+      // add compression info, if any
+      if (bodyCompression != null) {
+        ArrowBuf compBuf = allocator.buffer(ArrowBodyCompression.BODY_COMPRESSION_LENGTH);
+        compBuf.setByte(0, bodyCompression.getCodec());

Review comment:
       it looks like this is manually serializing the table, shouldn't this be delegated to flatbuffers?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r465621861



##########
File path: java/flight/flight-core/src/main/java/org/apache/arrow/flight/ArrowMessage.java
##########
@@ -353,6 +361,23 @@ private InputStream asInputStream(BufferAllocator allocator) {
         // the reference count
         b.getReferenceManager().retain();
       }
+
+      // add compression info, if any
+      if (bodyCompression != null) {
+        ArrowBuf compBuf = allocator.buffer(ArrowBodyCompression.BODY_COMPRESSION_LENGTH);
+        compBuf.setByte(0, bodyCompression.getCodec());

Review comment:
       In the reivsed PR, we no longer manually serilize/deserialize the table. Thanks. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#issuecomment-667620948


   > Hi, I have a question... probably not related to what this PR focus on.
   What if the compressor / decompressor for the codec will have JNI call for compression / decomperssion? If so, it will have JNI call on each buffer. Can we support a JNI bridge to the RecordBatch level depending on the codec?
   
   I think parquet is considering moving to [Airlift](https://github.com/airlift/aircompressor) which doesn't use JNI.  We should discuss specific library choices on the mailing list. Also it might be nice but not a hard requirement to follow an SPI pattern (similar to what we do for allocators) to avoid a hard dependency on any particular allocator.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r468986032



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/DefaultCompressionCodec.java
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.flatbuf.BodyCompressionMethod;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.vector.ipc.message.ArrowBodyCompression;
+
+/**
+ * The default compression codec that does no compression.
+ */
+public class DefaultCompressionCodec implements CompressionCodec {

Review comment:
       Agreed and accepted. NoCompressionCodec is a better name. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#issuecomment-644563536


   > Hi, I have a question... probably not related to what this PR focus on.
   > What if the compressor / decompressor for the codec will have JNI call for compression / decomperssion? If so, it will have JNI call on each buffer. Can we support a JNI bridge to the RecordBatch level depending on the codec?
   
   @rongma1997 I am not sure, as some implementation issues are not determined yet. It seems there are some Java libraries for basic compression algorithms (e.g. zstd-jni and lz4-java), and we may prefer them. If feel strong about the RecordBatch level JNI bridge, maybe you can open an issue/discussion for it. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r465622563



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/ArrowBodyCompression.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.ipc.message;
+
+import org.apache.arrow.flatbuf.BodyCompression;
+
+import com.google.flatbuffers.FlatBufferBuilder;
+
+/**
+ * Compression information about data written to a channel.
+ */
+public class ArrowBodyCompression implements FBSerializable {
+
+  /**
+   * Length of the serialized object.
+   */
+  public static final long BODY_COMPRESSION_LENGTH = 2L;

Review comment:
       Sounds reasonable. We have removed it in the revised PR. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r464025600



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
##########
@@ -403,16 +421,29 @@ public static ArrowRecordBatch deserializeRecordBatch(RecordBatch recordBatchFB,
       nodes.add(new ArrowFieldNode(node.length(), node.nullCount()));
     }
     List<ArrowBuf> buffers = new ArrayList<>();
+    long curOffset = 0L;
     for (int i = 0; i < recordBatchFB.buffersLength(); ++i) {
       Buffer bufferFB = recordBatchFB.buffers(i);
       ArrowBuf vectorBuffer = body.slice(bufferFB.offset(), bufferFB.length());
+      curOffset = bufferFB.offset() + bufferFB.length();
       buffers.add(vectorBuffer);
     }
+
+    if (curOffset % 8 != 0) {
+      curOffset += 8 - curOffset % 8;
+    }
+    ArrowBodyCompression bodyCompression = null;

Review comment:
       It looks like this is manually parsing the buffer stream?  I would have expected to directly reference A body compression object directly




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r468986947



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
##########
@@ -408,11 +408,15 @@ public static ArrowRecordBatch deserializeRecordBatch(RecordBatch recordBatchFB,
       ArrowBuf vectorBuffer = body.slice(bufferFB.offset(), bufferFB.length());
       buffers.add(vectorBuffer);
     }
+
+    ArrowBodyCompression bodyCompression =
+        new ArrowBodyCompression(recordBatchFB.compression().codec(), recordBatchFB.compression().method());

Review comment:
       According to our current implementation. The compression object cannot be null.
   For the sake of safety, we added check here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r493187630



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+
+/**
+ * The codec for compression/decompression.
+ */
+public interface CompressionCodec {
+
+  /**
+   * Compress a buffer.
+   * @param allocator the allocator for allocating memory for compressed buffer.
+   * @param unCompressedBuffer the buffer to compress.
+   *                           Implementation of this method should take care of releasing this buffer.
+   * @return the compressed buffer.
+   */
+  ArrowBuf compress(BufferAllocator allocator, ArrowBuf unCompressedBuffer);

Review comment:
       yes, I think so.  i'll merge




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r465621532



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.memory.ArrowBuf;
+
+/**
+ * The codec for compression/decompression.
+ */
+public interface CompressionCodec {
+
+  /**
+   * Compress a buffer.
+   * @param input the buffer to compress.
+   * @return the compressed buffer.
+   */
+  ArrowBuf compress(ArrowBuf input);

Review comment:
       Sure. The suggested name is better. Thank you.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rongma1997 commented on pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

Posted by GitBox <gi...@apache.org>.
rongma1997 commented on pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#issuecomment-644536052


   Hi, I have a question... probably not related to what this PR focus on. 
   What if the compressor / decompressor for the codec will have JNI call for compression / decomperssion? If so, it will have JNI call on each buffer. Can we support a JNI bridge to the RecordBatch level depending on the codec?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org