Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2021/09/24 08:23:48 UTC

[GitHub] [drill] dzamo opened a new pull request #2321: DRILL-7969: Read and write Parquet with brotli, lzo, lz4, zstd codecs

dzamo opened a new pull request #2321:
URL: https://github.com/apache/drill/pull/2321


   # [DRILL-7969](https://issues.apache.org/jira/browse/DRILL-7969): Read and write Parquet with brotli, lzo, lz4, zstd codecs
   
   ## Description
   
   Adds support for all of the standardised Parquet compression codecs beyond GZip
   and Snappy by making use of the airlift/aircompressor library, with a fallback
   to parquet-mr for codecs that aircompressor does not implement.
   
   A new, delegating CompressionCodecFactory implementation is included.  It routes
   (de)compression to the correct library while having minimal impact on the calling
   code in the Parquet reading and writing parts of the Drill codebase.
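
   A minimal sketch of how that delegation works (the `AIRCOMPRESSOR_CODECS` and `airCompressors` field names follow the new `DrillCompressionCodecFactory` quoted in the review comments below; the sketch class itself, the `innerFactory` fallback field and the method body are illustrative assumptions, not the exact code in this PR):

   ```java
   import java.util.EnumMap;
   import java.util.EnumSet;
   import java.util.Map;
   import java.util.Set;

   import org.apache.drill.exec.store.parquet.compression.AirliftBytesInputCompressor;
   import org.apache.parquet.bytes.ByteBufferAllocator;
   import org.apache.parquet.compression.CompressionCodecFactory;
   import org.apache.parquet.hadoop.metadata.CompressionCodecName;

   // Hypothetical sketch class -- not the real DrillCompressionCodecFactory.
   class DelegationSketch {
     // codecs routed to aircompressor (mirrors AIRCOMPRESSOR_CODECS in the PR)
     private static final Set<CompressionCodecName> AIRCOMPRESSOR_CODECS = EnumSet.of(
         CompressionCodecName.LZ4, CompressionCodecName.LZO,
         CompressionCodecName.SNAPPY, CompressionCodecName.ZSTD);

     // pooled aircompressor-backed shims, keyed by codec
     private final Map<CompressionCodecName, AirliftBytesInputCompressor> airCompressors =
         new EnumMap<>(CompressionCodecName.class);

     private final CompressionCodecFactory innerFactory; // parquet-mr fallback (assumed name)
     private final ByteBufferAllocator allocator;

     DelegationSketch(CompressionCodecFactory innerFactory, ByteBufferAllocator allocator) {
       this.innerFactory = innerFactory;
       this.allocator = allocator;
     }

     public CompressionCodecFactory.BytesInputCompressor getCompressor(CompressionCodecName codec) {
       if (AIRCOMPRESSOR_CODECS.contains(codec)) {
         // pure-Java aircompressor path, one reusable shim per codec
         return airCompressors.computeIfAbsent(codec,
             c -> new AirliftBytesInputCompressor(c, allocator));
       }
       // e.g. GZIP and BROTLI stay on the parquet-mr / Hadoop codec path
       return innerFactory.getCompressor(codec);
     }
   }
   ```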
   
   ## Documentation
   
   New codec options are available for selection by users in `store.parquet.compression`.  I'll look at the Drill docs to see if there are any pages discussing Parquet compression codecs that can be updated.
   
   ## Testing
   
   New unit tests for each codec in exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
   
   Test CTAS then SELECT with each new codec in drill-embedded.
   
   Use pyarrow to read the output Parquet metadata and check that the reported codec is correct.
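
   A rough sketch of the general shape of one such per-codec round-trip test, assuming BaseTestQuery-style `test()` and `testBuilder()` helpers; the codec value, table name and assertions here are illustrative, not the PR's actual test code:

   ```java
   @Test
   public void testCtasWithZstdCompression() throws Exception {
     // illustrative only: codec value and table name are assumptions
     test("ALTER SESSION SET `store.parquet.compression` = 'zstd'");
     try {
       // write a Parquet table with the selected codec...
       test("CREATE TABLE dfs.tmp.`zstd_employee` AS SELECT * FROM cp.`employee.json`");
       // ...then read it back through Drill and compare against the source data
       testBuilder()
           .sqlQuery("SELECT * FROM dfs.tmp.`zstd_employee`")
           .unOrdered()
           .sqlBaselineQuery("SELECT * FROM cp.`employee.json`")
           .go();
     } finally {
       test("DROP TABLE IF EXISTS dfs.tmp.`zstd_employee`");
       test("ALTER SESSION RESET `store.parquet.compression`");
     }
   }
   ```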


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] dzamo edited a comment on pull request #2321: DRILL-7969: Read and write Parquet with brotli, lzo, lz4, zstd codecs

Posted by GitBox <gi...@apache.org>.
dzamo edited a comment on pull request #2321:
URL: https://github.com/apache/drill/pull/2321#issuecomment-927536124


   Thanks.  I can see now that com.rdblue.brotlicodec packages its own native libbrotli.so, and it is only compiled for Linux and Mac on amd64.  Others have already discovered the same thing:
   
   https://github.com/rdblue/brotli-codec/pull/1
   https://issues.apache.org/jira/browse/PARQUET-1975
   
   I guess we have to decide how we want to deal with this...


[GitHub] [drill] vvysotskyi commented on a change in pull request #2321: DRILL-7969: Read and write Parquet with brotli, lzo, lz4, zstd codecs

Posted by GitBox <gi...@apache.org>.
vvysotskyi commented on a change in pull request #2321:
URL: https://github.com/apache/drill/pull/2321#discussion_r717798313



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/compression/DrillCompressionCodecFactory.java
##########
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.compression;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.apache.parquet.compression.CompressionCodecFactory;
+import org.apache.parquet.hadoop.CodecFactory;
+import org.apache.parquet.hadoop.metadata.CompressionCodecName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A delegating compression codec factory that returns (de)compressors based on
+ * https://github.com/airlift/aircompressor when possible and falls back to
+ * parquet-mr otherwise.  The aircompressor lib was introduced into Drill
+ * because of difficulties encountered with the JNI-based implementations of
+ * lzo, lz4 and zstd in parquet-mr.
+ *
+ * By modifying the constant AIRCOMPRESSOR_CODECS it is possible to choose
+ * which codecs should be routed to which lib.  In addition, this class
+ * implements parquet-mr's CompressionCodecFactory interface meaning that
+ * swapping this factory for e.g. one in parquet-mr will have minimal impact
+ * on code in Drill relying on a CompressCodecFactory.
+ *
+ */
+public class DrillCompressionCodecFactory implements CompressionCodecFactory {
+  private static final Logger logger = LoggerFactory.getLogger(DrillCompressionCodecFactory.class);
+
+  // The set of codecs to be handled by aircompressor
+  private static final Set<CompressionCodecName> AIRCOMPRESSOR_CODECS = new HashSet<>(
+      Arrays.asList(CompressionCodecName.LZ4, CompressionCodecName.LZO,
+          CompressionCodecName.SNAPPY, CompressionCodecName.ZSTD));
+
+  // pool of reused aircompressor compressors (parquet-mr's factory has its own)
+  private final Map airCompressors = new HashMap();

Review comment:
       If `AirliftBytesInputCompressor` is specified as the map's value type, no cast is required.




[GitHub] [drill] dzamo commented on a change in pull request #2321: DRILL-7969: Read and write Parquet with brotli, lzo, lz4, zstd codecs

Posted by GitBox <gi...@apache.org>.
dzamo commented on a change in pull request #2321:
URL: https://github.com/apache/drill/pull/2321#discussion_r717769319



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/compression/AirliftBytesInputCompressor.java
##########
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.compression;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Stack;
+
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.apache.parquet.bytes.BytesInput;
+import org.apache.parquet.compression.CompressionCodecFactory;
+import org.apache.parquet.hadoop.metadata.CompressionCodecName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import io.airlift.compress.Compressor;
+import io.airlift.compress.Decompressor;
+import io.airlift.compress.lz4.Lz4Compressor;
+import io.airlift.compress.lz4.Lz4Decompressor;
+import io.airlift.compress.lzo.LzoCompressor;
+import io.airlift.compress.lzo.LzoDecompressor;
+import io.airlift.compress.snappy.SnappyCompressor;
+import io.airlift.compress.snappy.SnappyDecompressor;
+import io.airlift.compress.zstd.ZstdCompressor;
+import io.airlift.compress.zstd.ZstdDecompressor;
+
+/**
+ * A shim making an aircompressor (de)compressor available through the BytesInputCompressor
+ * and BytesInputDecompressor interfaces.
+ */
+public class AirliftBytesInputCompressor implements CompressionCodecFactory.BytesInputCompressor, CompressionCodecFactory.BytesInputDecompressor {
+  private static final Logger logger = LoggerFactory.getLogger(AirliftBytesInputCompressor.class);
+
+  // name of the codec provided by this compressor
+  private CompressionCodecName codecName;
+
+  // backing aircompressor compressor
+  private Compressor airComp = null;
+
+  // backing aircompressor decompressor
+  private Decompressor airDecomp = null;
+
+  // the direct memory allocator to be used for (de)compression outputs
+  private ByteBufferAllocator allocator;
+
+  // all the direct memory buffers we've allocated, and must release
+  private Stack<ByteBuffer> allocatedBuffers;
+
+  public AirliftBytesInputCompressor(CompressionCodecName codecName, ByteBufferAllocator allocator) {
+    this.codecName = codecName;
+
+    switch (codecName) {
+    case LZ4:
+      airComp = new Lz4Compressor();
+      airDecomp = new Lz4Decompressor();
+      break;
+    case LZO:
+      airComp = new LzoCompressor();
+      airDecomp = new LzoDecompressor();
+      break;
+    case SNAPPY:
+      airComp = new SnappyCompressor();
+      airDecomp = new SnappyDecompressor();
+      break;
+    case ZSTD:
+      airComp = new ZstdCompressor();
+      airDecomp = new ZstdDecompressor();
+      break;
+    default:
+      throw new UnsupportedOperationException("Parquet compression codec is not supported: " + codecName);
+    }
+
+    this.allocator = allocator;
+    this.allocatedBuffers = new Stack<>();
+
+    logger.debug(String.format(
+        "constructed a %s using a backing compressor of %s",
+        getClass().getName(),
+        airComp.getClass().getName()
+    ));

Review comment:
       Clever!

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/compression/DrillCompressionCodecFactory.java
##########
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.compression;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.apache.parquet.compression.CompressionCodecFactory;
+import org.apache.parquet.hadoop.CodecFactory;
+import org.apache.parquet.hadoop.metadata.CompressionCodecName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A delegating compression codec factory that returns (de)compressors based on
+ * https://github.com/airlift/aircompressor when possible and falls back to
+ * parquet-mr otherwise.  The aircompressor lib was introduced into Drill
+ * because of difficulties encountered with the JNI-based implementations of
+ * lzo, lz4 and zstd in parquet-mr.
+ *
+ * By modifying the constant AIRCOMPRESSOR_CODECS it is possible to choose
+ * which codecs should be routed to which lib.  In addition, this class
+ * implements parquet-mr's CompressionCodecFactory interface meaning that
+ * swapping this factory for e.g. one in parquet-mr will have minimal impact
+ * on code in Drill relying on a CompressCodecFactory.
+ *
+ */
+public class DrillCompressionCodecFactory implements CompressionCodecFactory {
+  private static final Logger logger = LoggerFactory.getLogger(DrillCompressionCodecFactory.class);
+
+  // The set of codecs to be handled by aircompressor
+  private static final Set<CompressionCodecName> AIRCOMPRESSOR_CODECS = new HashSet<>(
+      Arrays.asList(CompressionCodecName.LZ4, CompressionCodecName.LZO,
+          CompressionCodecName.SNAPPY, CompressionCodecName.ZSTD));
+
+  // pool of reused aircompressor compressors (parquet-mr's factory has its own)
+  private final Map airCompressors = new HashMap();

Review comment:
       Just one cast remained necessary, because I have a single class implementing both BytesInputCompressor and BytesInputDecompressor 




[GitHub] [drill] vvysotskyi commented on a change in pull request #2321: DRILL-7969: Read and write Parquet with brotli, lzo, lz4, zstd codecs

Posted by GitBox <gi...@apache.org>.
vvysotskyi commented on a change in pull request #2321:
URL: https://github.com/apache/drill/pull/2321#discussion_r717727768



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/compression/AirliftBytesInputCompressor.java
##########
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.compression;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Stack;
+
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.apache.parquet.bytes.BytesInput;
+import org.apache.parquet.compression.CompressionCodecFactory;
+import org.apache.parquet.hadoop.metadata.CompressionCodecName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import io.airlift.compress.Compressor;
+import io.airlift.compress.Decompressor;
+import io.airlift.compress.lz4.Lz4Compressor;
+import io.airlift.compress.lz4.Lz4Decompressor;
+import io.airlift.compress.lzo.LzoCompressor;
+import io.airlift.compress.lzo.LzoDecompressor;
+import io.airlift.compress.snappy.SnappyCompressor;
+import io.airlift.compress.snappy.SnappyDecompressor;
+import io.airlift.compress.zstd.ZstdCompressor;
+import io.airlift.compress.zstd.ZstdDecompressor;
+
+/**
+ * A shim making an aircompressor (de)compressor available through the BytesInputCompressor
+ * and BytesInputDecompressor interfaces.
+ */
+public class AirliftBytesInputCompressor implements CompressionCodecFactory.BytesInputCompressor, CompressionCodecFactory.BytesInputDecompressor {
+  private static final Logger logger = LoggerFactory.getLogger(AirliftBytesInputCompressor.class);
+
+  // name of the codec provided by this compressor
+  private CompressionCodecName codecName;
+
+  // backing aircompressor compressor
+  private Compressor airComp = null;
+
+  // backing aircompressor decompressor
+  private Decompressor airDecomp = null;
+
+  // the direct memory allocator to be used for (de)compression outputs
+  private ByteBufferAllocator allocator;
+
+  // all the direct memory buffers we've allocated, and must release
+  private Stack<ByteBuffer> allocatedBuffers;
+
+  public AirliftBytesInputCompressor(CompressionCodecName codecName, ByteBufferAllocator allocator) {
+    this.codecName = codecName;
+
+    switch (codecName) {
+    case LZ4:
+      airComp = new Lz4Compressor();
+      airDecomp = new Lz4Decompressor();
+      break;
+    case LZO:
+      airComp = new LzoCompressor();
+      airDecomp = new LzoDecompressor();
+      break;
+    case SNAPPY:
+      airComp = new SnappyCompressor();
+      airDecomp = new SnappyDecompressor();
+      break;
+    case ZSTD:
+      airComp = new ZstdCompressor();
+      airDecomp = new ZstdDecompressor();
+      break;
+    default:
+      throw new UnsupportedOperationException("Parquet compression codec is not supported: " + codecName);
+    }
+
+    this.allocator = allocator;
+    this.allocatedBuffers = new Stack<>();
+
+    logger.debug(String.format(
+        "constructed a %s using a backing compressor of %s",
+        getClass().getName(),
+        airComp.getClass().getName()
+    ));

Review comment:
       Please use logger formatting instead of `String.format()`, so that the message is not constructed when the log level is higher than DEBUG.
   ```suggestion
       logger.debug(
           "constructed a {} using a backing compressor of {}",
           getClass().getName(),
           airComp.getClass().getName()
       );
   ```

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/compression/AirliftBytesInputCompressor.java
##########
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.compression;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Stack;
+
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.apache.parquet.bytes.BytesInput;
+import org.apache.parquet.compression.CompressionCodecFactory;
+import org.apache.parquet.hadoop.metadata.CompressionCodecName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import io.airlift.compress.Compressor;
+import io.airlift.compress.Decompressor;
+import io.airlift.compress.lz4.Lz4Compressor;
+import io.airlift.compress.lz4.Lz4Decompressor;
+import io.airlift.compress.lzo.LzoCompressor;
+import io.airlift.compress.lzo.LzoDecompressor;
+import io.airlift.compress.snappy.SnappyCompressor;
+import io.airlift.compress.snappy.SnappyDecompressor;
+import io.airlift.compress.zstd.ZstdCompressor;
+import io.airlift.compress.zstd.ZstdDecompressor;
+
+/**
+ * A shim making an aircompressor (de)compressor available through the BytesInputCompressor
+ * and BytesInputDecompressor interfaces.
+ */
+public class AirliftBytesInputCompressor implements CompressionCodecFactory.BytesInputCompressor, CompressionCodecFactory.BytesInputDecompressor {
+  private static final Logger logger = LoggerFactory.getLogger(AirliftBytesInputCompressor.class);
+
+  // name of the codec provided by this compressor
+  private CompressionCodecName codecName;
+
+  // backing aircompressor compressor
+  private Compressor airComp = null;

Review comment:
       No need to initialize fields with null values.

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/compression/DrillCompressionCodecFactory.java
##########
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.compression;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.apache.parquet.compression.CompressionCodecFactory;
+import org.apache.parquet.hadoop.CodecFactory;
+import org.apache.parquet.hadoop.metadata.CompressionCodecName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A delegating compression codec factory that returns (de)compressors based on
+ * https://github.com/airlift/aircompressor when possible and falls back to
+ * parquet-mr otherwise.  The aircompressor lib was introduced into Drill
+ * because of difficulties encountered with the JNI-based implementations of
+ * lzo, lz4 and zstd in parquet-mr.
+ *
+ * By modifying the constant AIRCOMPRESSOR_CODECS it is possible to choose
+ * which codecs should be routed to which lib.  In addition, this class
+ * implements parquet-mr's CompressionCodecFactory interface meaning that
+ * swapping this factory for e.g. one in parquet-mr will have minimal impact
+ * on code in Drill relying on a CompressCodecFactory.
+ *
+ */
+public class DrillCompressionCodecFactory implements CompressionCodecFactory {
+  private static final Logger logger = LoggerFactory.getLogger(DrillCompressionCodecFactory.class);
+
+  // The set of codecs to be handled by aircompressor
+  private static final Set<CompressionCodecName> AIRCOMPRESSOR_CODECS = new HashSet<>(
+      Arrays.asList(CompressionCodecName.LZ4, CompressionCodecName.LZO,
+          CompressionCodecName.SNAPPY, CompressionCodecName.ZSTD));
+
+  // pool of reused aircompressor compressors (parquet-mr's factory has its own)
+  private final Map airCompressors = new HashMap();

Review comment:
       Please use generics and remove unnecessary casts below:
   ```suggestion
     private final Map<CompressionCodecName, AirliftBytesInputCompressor> airCompressors = new HashMap<>();
   ```
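
       With the typed map, the aircompressor lookups also need no cast, since `AirliftBytesInputCompressor` implements both `BytesInputCompressor` and `BytesInputDecompressor`.  A hypothetical fragment (the `allocator` and `parqCodecFactory` fields are assumed names, not code from this PR):
   ```java
     @Override
     public BytesInputDecompressor getDecompressor(CompressionCodecName codecName) {
       if (AIRCOMPRESSOR_CODECS.contains(codecName)) {
         // the pooled shim can be returned directly through either interface
         return airCompressors.computeIfAbsent(codecName,
             c -> new AirliftBytesInputCompressor(c, allocator));
       }
       return parqCodecFactory.getDecompressor(codecName);  // parquet-mr fallback (assumed name)
     }
   ```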

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/compression/DrillCompressionCodecFactory.java
##########
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.compression;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.apache.parquet.compression.CompressionCodecFactory;
+import org.apache.parquet.hadoop.CodecFactory;
+import org.apache.parquet.hadoop.metadata.CompressionCodecName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A delegating compression codec factory that returns (de)compressors based on
+ * https://github.com/airlift/aircompressor when possible and falls back to
+ * parquet-mr otherwise.  The aircompressor lib was introduced into Drill
+ * because of difficulties encountered with the JNI-based implementations of
+ * lzo, lz4 and zstd in parquet-mr.
+ *
+ * By modifying the constant AIRCOMPRESSOR_CODECS it is possible to choose
+ * which codecs should be routed to which lib.  In addition, this class
+ * implements parquet-mr's CompressionCodecFactory interface meaning that
+ * swapping this factory for e.g. one in parquet-mr will have minimal impact
+ * on code in Drill relying on a CompressCodecFactory.
+ *
+ */
+public class DrillCompressionCodecFactory implements CompressionCodecFactory {
+  private static final Logger logger = LoggerFactory.getLogger(DrillCompressionCodecFactory.class);
+
+  // The set of codecs to be handled by aircompressor
+  private static final Set<CompressionCodecName> AIRCOMPRESSOR_CODECS = new HashSet<>(
+      Arrays.asList(CompressionCodecName.LZ4, CompressionCodecName.LZO,
+          CompressionCodecName.SNAPPY, CompressionCodecName.ZSTD));
+
+  // pool of reused aircompressor compressors (parquet-mr's factory has its own)
+  private final Map airCompressors = new HashMap();

Review comment:
       If `AirliftBytesInputCompressor` is specified as the map's value type, no cast is required.




[GitHub] [drill] dzamo commented on a change in pull request #2321: DRILL-7969: Read and write Parquet with brotli, lzo, lz4, zstd codecs

Posted by GitBox <gi...@apache.org>.
dzamo commented on a change in pull request #2321:
URL: https://github.com/apache/drill/pull/2321#discussion_r718151511



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/compression/DrillCompressionCodecFactory.java
##########
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.compression;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.apache.parquet.compression.CompressionCodecFactory;
+import org.apache.parquet.hadoop.CodecFactory;
+import org.apache.parquet.hadoop.metadata.CompressionCodecName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A delegating compression codec factory that returns (de)compressors based on
+ * https://github.com/airlift/aircompressor when possible and falls back to
+ * parquet-mr otherwise.  The aircompressor lib was introduced into Drill
+ * because of difficulties encountered with the JNI-based implementations of
+ * lzo, lz4 and zstd in parquet-mr.
+ *
+ * By modifying the constant AIRCOMPRESSOR_CODECS it is possible to choose
+ * which codecs should be routed to which lib.  In addition, this class
+ * implements parquet-mr's CompressionCodecFactory interface meaning that
+ * swapping this factory for e.g. one in parquet-mr will have minimal impact
+ * on code in Drill relying on a CompressCodecFactory.
+ *
+ */
+public class DrillCompressionCodecFactory implements CompressionCodecFactory {
+  private static final Logger logger = LoggerFactory.getLogger(DrillCompressionCodecFactory.class);
+
+  // The set of codecs to be handled by aircompressor
+  private static final Set<CompressionCodecName> AIRCOMPRESSOR_CODECS = new HashSet<>(
+      Arrays.asList(CompressionCodecName.LZ4, CompressionCodecName.LZO,
+          CompressionCodecName.SNAPPY, CompressionCodecName.ZSTD));
+
+  // pool of reused aircompressor compressors (parquet-mr's factory has its own)
+  private final Map airCompressors = new HashMap();

Review comment:
       Of course 🤦.




[GitHub] [drill] dzamo commented on pull request #2321: DRILL-7969: Read and write Parquet with brotli, lzo, lz4, zstd codecs

Posted by GitBox <gi...@apache.org>.
dzamo commented on pull request #2321:
URL: https://github.com/apache/drill/pull/2321#issuecomment-927269529


   @vvysotskyi There are two test errors in the ARM64 build that look like they would be resolved by adding `libbrotli1` to the `apt: packages` list in the Travis config.
   ```
   [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.902 s <<< FAILURE! - in org.apache.drill.TestAggNullable
   
   [ERROR] org.apache.drill.TestAggNullable  Time elapsed: 1.902 s  <<< ERROR!
   
   java.util.ServiceConfigurationError: org.apache.hadoop.io.compress.CompressionCodec: Provider org.apache.hadoop.io.compress.BrotliCodec could not be instantiated
   
   Caused by: java.lang.UnsatisfiedLinkError: Couldn't load native library 'brotli'. [LoaderResult: os.name="Linux", os.arch="aarch64", os.version="5.4.0-1021-aws", java.vm.name="OpenJDK 64-Bit Server VM", java.vm.version="25.292-b10", java.vm.vendor="Private Build", alreadyLoaded="null", loadedFromSystemLibraryPath="false", nativeLibName="libbrotli.so", temporaryLibFile="/home/travis/build/apache/drill/exec/java-exec/target/brotli9222326058524323719/libbrotli.so", libNameWithinClasspath="/lib/linux-aarch64/libbrotli.so", usedThisClassloader="false", usedSystemClassloader="false", java.library.path="/usr/java/packages/lib/aarch64:/usr/lib/aarch64-linux-gnu/jni:/lib/aarch64-linux-gnu:/usr/lib/aarch64-linux-gnu:/usr/lib/jni:/lib:/usr/lib"]
   ```
   Do you know how to do that?  Even if that fixes it, I must admit I am not able to see why the two tests which fail on ARM, `TestAggNullable` and `CredentialsProviderSerDeTest`, even look for this Hadoop compression codec when they clearly never did before this PR.  Neither test has anything to do with Parquet, which is the only area I worked in.  Perhaps it's buried somewhere in the test class inheritance hierarchy.
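
   The stack trace hints at why unrelated tests trip over this: Hadoop discovers `CompressionCodec` implementations via `java.util.ServiceLoader`, so every codec registered on the classpath gets instantiated when the provider list is iterated, whether or not the test touches Parquet.  A minimal, standalone illustration of that mechanism (not Drill code):
   ```java
   import java.util.ServiceLoader;

   import org.apache.hadoop.io.compress.CompressionCodec;

   public class CodecDiscoveryDemo {
     public static void main(String[] args) {
       // Iterating the ServiceLoader instantiates every registered provider; a
       // provider whose native library cannot be loaded (e.g. BrotliCodec on
       // linux-aarch64) surfaces here as a ServiceConfigurationError.
       for (CompressionCodec codec : ServiceLoader.load(CompressionCodec.class)) {
         System.out.println(codec.getClass().getName());
       }
     }
   }
   ```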


[GitHub] [drill] dzamo merged pull request #2321: DRILL-7969: Read and write Parquet with brotli, lzo, lz4, zstd codecs

Posted by GitBox <gi...@apache.org>.
dzamo merged pull request #2321:
URL: https://github.com/apache/drill/pull/2321


   


[GitHub] [drill] dzamo commented on pull request #2321: DRILL-7969: Read and write Parquet with brotli, lzo, lz4, zstd codecs

Posted by GitBox <gi...@apache.org>.
dzamo commented on pull request #2321:
URL: https://github.com/apache/drill/pull/2321#issuecomment-927317214


   From what I can see (from mobile), that last push got a clean bill of health, apart from a 100-minute timeout in the JDK 11 build.


[GitHub] [drill] vvysotskyi commented on pull request #2321: DRILL-7969: Read and write Parquet with brotli, lzo, lz4, zstd codecs

Posted by GitBox <gi...@apache.org>.
vvysotskyi commented on pull request #2321:
URL: https://github.com/apache/drill/pull/2321#issuecomment-927319908


   @dzamo, TravisCI is still failing with the same error:
   ```
   [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.923 s <<< FAILURE! - in org.apache.drill.TestAggNullable
   [ERROR] org.apache.drill.TestAggNullable  Time elapsed: 1.923 s  <<< ERROR!
   java.util.ServiceConfigurationError: org.apache.hadoop.io.compress.CompressionCodec: Provider org.apache.hadoop.io.compress.BrotliCodec could not be instantiated
   Caused by: java.lang.UnsatisfiedLinkError: Couldn't load native library 'brotli'. [LoaderResult: os.name="Linux", os.arch="aarch64", os.version="5.4.0-1021-aws", java.vm.name="OpenJDK 64-Bit Server VM", java.vm.version="25.292-b10", java.vm.vendor="Private Build", alreadyLoaded="null", loadedFromSystemLibraryPath="false", nativeLibName="libbrotli.so", temporaryLibFile="/home/travis/build/apache/drill/exec/java-exec/target/brotli3793353066156053952/libbrotli.so", libNameWithinClasspath="/lib/linux-aarch64/libbrotli.so", usedThisClassloader="false", usedSystemClassloader="false", java.library.path="/usr/java/packages/lib/aarch64:/usr/lib/aarch64-linux-gnu/jni:/lib/aarch64-linux-gnu:/usr/lib/aarch64-linux-gnu:/usr/lib/jni:/lib:/usr/lib"]
   ```


[GitHub] [drill] vvysotskyi commented on pull request #2321: DRILL-7969: Read and write Parquet with brotli, lzo, lz4, zstd codecs

Posted by GitBox <gi...@apache.org>.
vvysotskyi commented on pull request #2321:
URL: https://github.com/apache/drill/pull/2321#issuecomment-927277174


   @dzamo, you can add commands for installing specific packages for Travis into the `.travis.yml` file:
   https://github.com/apache/drill/blob/master/.travis.yml#L51


[GitHub] [drill] dzamo commented on a change in pull request #2321: DRILL-7969: Read and write Parquet with brotli, lzo, lz4, zstd codecs

Posted by GitBox <gi...@apache.org>.
dzamo commented on a change in pull request #2321:
URL: https://github.com/apache/drill/pull/2321#discussion_r717771588



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/compression/DrillCompressionCodecFactory.java
##########
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.compression;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.apache.parquet.compression.CompressionCodecFactory;
+import org.apache.parquet.hadoop.CodecFactory;
+import org.apache.parquet.hadoop.metadata.CompressionCodecName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A delegating compression codec factory that returns (de)compressors based on
+ * https://github.com/airlift/aircompressor when possible and falls back to
+ * parquet-mr otherwise.  The aircompressor lib was introduced into Drill
+ * because of difficulties encountered with the JNI-based implementations of
+ * lzo, lz4 and zstd in parquet-mr.
+ *
+ * By modifying the constant AIRCOMPRESSOR_CODECS it is possible to choose
+ * which codecs should be routed to which lib.  In addition, this class
+ * implements parquet-mr's CompressionCodecFactory interface meaning that
+ * swapping this factory for e.g. one in parquet-mr will have minimal impact
+ * on code in Drill relying on a CompressCodecFactory.
+ *
+ */
+public class DrillCompressionCodecFactory implements CompressionCodecFactory {
+  private static final Logger logger = LoggerFactory.getLogger(DrillCompressionCodecFactory.class);
+
+  // The set of codecs to be handled by aircompressor
+  private static final Set<CompressionCodecName> AIRCOMPRESSOR_CODECS = new HashSet<>(
+      Arrays.asList(CompressionCodecName.LZ4, CompressionCodecName.LZO,
+          CompressionCodecName.SNAPPY, CompressionCodecName.ZSTD));
+
+  // pool of reused aircompressor compressors (parquet-mr's factory has its own)
+  private final Map airCompressors = new HashMap();

Review comment:
       Just one cast remained necessary, because I have a single class implementing both BytesInputCompressor and BytesInputDecompressor 
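   For reference, a minimal sketch of that shape (simplified, with assumed names, and not the exact PR code):

   ```
   import java.util.HashMap;
   import java.util.Map;

   import org.apache.parquet.compression.CompressionCodecFactory.BytesInputCompressor;
   import org.apache.parquet.compression.CompressionCodecFactory.BytesInputDecompressor;
   import org.apache.parquet.hadoop.metadata.CompressionCodecName;

   // Sketch only: one pool typed to the compressor side backs both getters,
   // because every pooled value is an AirliftBytesInputCompressor, which
   // implements both interfaces.
   public class DualRolePoolSketch {

     private final Map<CompressionCodecName, BytesInputCompressor> airCompressors = new HashMap<>();

     public BytesInputDecompressor getDecompressor(CompressionCodecName codecName) {
       // the single remaining cast
       return (BytesInputDecompressor) airCompressors.get(codecName);
     }
   }
   ```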




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] dzamo commented on pull request #2321: DRILL-7969: Read and write Parquet with brotli, lzo, lz4, zstd codecs

Posted by GitBox <gi...@apache.org>.
dzamo commented on pull request #2321:
URL: https://github.com/apache/drill/pull/2321#issuecomment-926445605


   Hi @vdiravka, @vvysotskyi.  The only new thing I have to add to the discussion of this PR is a request about the Maven dependencies.  Could you please look at what I've done in the `pom.xml` files and see whether it can be done better?  I struggled for a long time to get aircompressor-0.1 out of drill-hive-exec-shaded-1.20.0-SNAPSHOT.jar.  In the end the only thing that worked was shading it to a new namespace with `<relocate>`.  Maven just seemed to ignore my `<exclusion>` attempts, whether under `<dependency>` or under the maven-shade-plugin config.  I'm still confused about this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vvysotskyi commented on a change in pull request #2321: DRILL-7969: Read and write Parquet with brotli, lzo, lz4, zstd codecs

Posted by GitBox <gi...@apache.org>.
vvysotskyi commented on a change in pull request #2321:
URL: https://github.com/apache/drill/pull/2321#discussion_r715425598



##########
File path: contrib/storage-hive/hive-exec-shade/pom.xml
##########
@@ -120,6 +120,10 @@
           <groupId>org.codehaus.jackson</groupId>
           <artifactId>jackson-xc</artifactId>
         </exclusion>
+        <exclusion>
+          <groupId>io.airlift</groupId>
+          <artifactId>aircompressor</artifactId>
+        </exclusion>

Review comment:
       Is there any reason for excluding `aircompressor` here if you have relocated it below?

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/compression/AirliftBytesInputCompressor.java
##########
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.compression;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Stack;
+
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.apache.parquet.bytes.BytesInput;
+import org.apache.parquet.compression.CompressionCodecFactory;
+import org.apache.parquet.hadoop.metadata.CompressionCodecName;
+
+import io.airlift.compress.Compressor;
+
+/**
+ * A shim making an aircompressor compressor available through the BytesInputCompressor
+ * interface.
+ */
+public class AirliftBytesInputCompressor implements CompressionCodecFactory.BytesInputCompressor {
+
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(AirliftBytesInputCompressor.class);

Review comment:
       Please either use imports instead of specifying the whole package name here, or use Lombok.
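   For example, a sketch of the import-based option (the Lombok alternative would be `@Slf4j` on the class, which generates an equivalent `log` field):

   ```
   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;

   // Sketch only; the compressor interface methods of the real class are omitted here.
   public class AirliftBytesInputCompressor {

     // declared via imported types rather than fully qualified org.slf4j names
     private static final Logger logger = LoggerFactory.getLogger(AirliftBytesInputCompressor.class);
   }
   ```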

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/compression/DrillCompressionCodecFactory.java
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.compression;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.apache.parquet.compression.CompressionCodecFactory;
+import org.apache.parquet.hadoop.CodecFactory;
+import org.apache.parquet.hadoop.metadata.CompressionCodecName;
+
+/**
+ * A delegating compression codec factory that returns (de)compressors based on
+ * https://github.com/airlift/aircompressor when possible and falls back to
+ * parquet-mr otherwise.  The aircompressor lib was introduced into Drill
+ * because of difficulties encountered with the JNI-based implementations of
+ * lzo, lz4 and zstd in parquet-mr.
+ *
+ * By modifying the constant AIRCOMPRESSOR_CODECS it is possible to choose
+ * which codecs should be routed to which lib.  In addition, this class
+ * implements parquet-mr's CompressionCodecFactory interface meaning that
+ * swapping this factory for e.g. one in parquet-mr will have minimal impact
+ * on the calling code in Parquet reading and writing parts of the Drill code
+ * base.
+ *
+ */
+public class DrillCompressionCodecFactory implements CompressionCodecFactory {
+
+  // The set of codecs to be handled by aircompressor
+  private static final Set<CompressionCodecName> AIRCOMPRESSOR_CODECS = new HashSet<>(
+      Arrays.asList(new CompressionCodecName[] { CompressionCodecName.LZ4, CompressionCodecName.LZO,

Review comment:
       `new CompressionCodecName[] {}` is redundant here; please remove it and pass the codecs directly to the `Arrays.asList()` method.
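   I.e. a sketch of the suggested form (using the imports already present in the diff above):

   ```
   private static final Set<CompressionCodecName> AIRCOMPRESSOR_CODECS = new HashSet<>(
       Arrays.asList(CompressionCodecName.LZ4, CompressionCodecName.LZO,
           CompressionCodecName.SNAPPY, CompressionCodecName.ZSTD));
   ```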

##########
File path: exec/jdbc-all/pom.xml
##########
@@ -548,7 +548,7 @@
                   This is likely due to you adding new dependencies to a java-exec and not updating the excludes in this module. This is important as it minimizes the size of the dependency of Drill application users.
 
                   </message>
-                  <maxsize>46600000</maxsize>
+                  <maxsize>47200000</maxsize>

Review comment:
       Are you sure that these newly added libraries should be included in the JDBC jar? I think it would be better to exclude them here...

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/compression/DrillCompressionCodecFactory.java
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.compression;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.apache.parquet.compression.CompressionCodecFactory;
+import org.apache.parquet.hadoop.CodecFactory;
+import org.apache.parquet.hadoop.metadata.CompressionCodecName;
+
+/**
+ * A delegating compression codec factory that returns (de)compressors based on
+ * https://github.com/airlift/aircompressor when possible and falls back to
+ * parquet-mr otherwise.  The aircompressor lib was introduced into Drill
+ * because of difficulties encountered with the JNI-based implementations of
+ * lzo, lz4 and zstd in parquet-mr.
+ *
+ * By modifying the constant AIRCOMPRESSOR_CODECS it is possible to choose
+ * which codecs should be routed to which lib.  In addition, this class
+ * implements parquet-mr's CompressionCodecFactory interface meaning that
+ * swapping this factory for e.g. one in parquet-mr will have minimal impact
+ * on the calling code in Parquet reading and writing parts of the Drill code
+ * base.
+ *
+ */
+public class DrillCompressionCodecFactory implements CompressionCodecFactory {
+
+  // The set of codecs to be handled by aircompressor
+  private static final Set<CompressionCodecName> AIRCOMPRESSOR_CODECS = new HashSet<>(
+      Arrays.asList(new CompressionCodecName[] { CompressionCodecName.LZ4, CompressionCodecName.LZO,
+          CompressionCodecName.SNAPPY, CompressionCodecName.ZSTD }));
+
+  // pool of reused aircompressor compressors (parquet-mr's factory has its own)
+  private final Map<CompressionCodecName, BytesInputCompressor> compressors = new HashMap<>();
+
+  // pool of reused aircompressor decompressors (parquet-mr's factory has its own)
+  private final Map<CompressionCodecName, BytesInputDecompressor> decompressors = new HashMap<>();
+
+  // fallback parquet-mr compression codec factory
+  private CompressionCodecFactory parqCodecFactory;
+
+  // direct memory allocator to be used during (de)compression
+  private ByteBufferAllocator allocator;
+
+  // static builder method, solely to mimick the parquet-mr API as closely as possible
+  public static CompressionCodecFactory createDirectCodecFactory(Configuration config, ByteBufferAllocator allocator,
+      int pageSize) {
+    return new DrillCompressionCodecFactory(config, allocator, pageSize);
+  }
+
+  public DrillCompressionCodecFactory(Configuration config, ByteBufferAllocator allocator, int pageSize) {
+    this.allocator = allocator;
+    this.parqCodecFactory = CodecFactory.createDirectCodecFactory(config, allocator, pageSize);
+  }
+
+  @Override
+  public BytesInputCompressor getCompressor(CompressionCodecName codecName) {
+    if (AIRCOMPRESSOR_CODECS.contains(codecName)) {
+      BytesInputCompressor comp = compressors.get(codecName);
+      if (comp == null) {
+        comp = new AirliftBytesInputCompressor(codecName, allocator);
+        compressors.put(codecName, comp);
+      }
+      return comp;

Review comment:
       Please use the `Map.computeIfAbsent()` method here and below.
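   A sketch of what that could look like for the compressor side (the parquet-mr fallback at the end is assumed from the `parqCodecFactory` field above):

   ```
   @Override
   public BytesInputCompressor getCompressor(CompressionCodecName codecName) {
     if (AIRCOMPRESSOR_CODECS.contains(codecName)) {
       // creates and caches the aircompressor shim on first use, then reuses it
       return compressors.computeIfAbsent(codecName,
           c -> new AirliftBytesInputCompressor(c, allocator));
     }
     return parqCodecFactory.getCompressor(codecName);
   }
   ```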

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/compression/AirliftBytesInputCompressor.java
##########
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.compression;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Stack;
+
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.apache.parquet.bytes.BytesInput;
+import org.apache.parquet.compression.CompressionCodecFactory;
+import org.apache.parquet.hadoop.metadata.CompressionCodecName;
+
+import io.airlift.compress.Compressor;
+
+/**
+ * A shim making an aircompressor compressor available through the BytesInputCompressor
+ * interface.
+ */
+public class AirliftBytesInputCompressor implements CompressionCodecFactory.BytesInputCompressor {
+
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(AirliftBytesInputCompressor.class);
+
+  // the codec used by this compressor
+  private CompressionCodecName codecName;
+
+  // the relevant aircompressor compressor
+  private Compressor airComp = null;
+
+  // the direct memory allocator to be used during compression
+  private ByteBufferAllocator allocator;
+
+  // stack tracking all direct memory buffers we allocated, and must release
+  private Stack<ByteBuffer> ourAllocations;
+
+  public AirliftBytesInputCompressor(CompressionCodecName codecName, ByteBufferAllocator allocator) {
+    this.codecName = codecName;
+
+    switch (codecName) {
+    case LZ4:
+      airComp = new io.airlift.compress.lz4.Lz4Compressor();

Review comment:
       Please use imports here and below too.
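   e.g. a sketch (the `Lz4Compressor` import matches the class referenced above; the other codecs are assumed to follow the same pattern in aircompressor):

   ```
   import io.airlift.compress.lz4.Lz4Compressor;
   import io.airlift.compress.zstd.ZstdCompressor;

   // inside the constructor
   switch (codecName) {
   case LZ4:
     airComp = new Lz4Compressor();
     break;
   case ZSTD:
     airComp = new ZstdCompressor();
     break;
   // ... remaining codecs follow the same pattern
   default:
     throw new DrillRuntimeException("Unsupported aircompressor codec: " + codecName);
   }
   ```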

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/PageReader.java
##########
@@ -17,18 +17,24 @@
  */
 package org.apache.drill.exec.store.parquet.columnreaders;
 
+import static org.apache.parquet.column.Encoding.valueOf;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.TimeUnit;

Review comment:
       Please revert the changes that rearrange imports until we have determined the correct import ordering for the whole project.

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/compression/AirliftBytesInputCompressor.java
##########
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.compression;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Stack;
+
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.apache.parquet.bytes.BytesInput;
+import org.apache.parquet.compression.CompressionCodecFactory;
+import org.apache.parquet.hadoop.metadata.CompressionCodecName;
+
+import io.airlift.compress.Compressor;
+
+/**
+ * A shim making an aircompressor compressor available through the BytesInputCompressor
+ * interface.
+ */
+public class AirliftBytesInputCompressor implements CompressionCodecFactory.BytesInputCompressor {
+
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(AirliftBytesInputCompressor.class);
+
+  // the codec used by this compressor
+  private CompressionCodecName codecName;
+
+  // the relevant aircompressor compressor
+  private Compressor airComp = null;
+
+  // the direct memory allocator to be used during compression
+  private ByteBufferAllocator allocator;
+
+  // stack tracking all direct memory buffers we allocated, and must release
+  private Stack<ByteBuffer> ourAllocations;

Review comment:
       ```suggestion
     private Stack<ByteBuffer> allocatedBuffers;
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vvysotskyi commented on pull request #2321: DRILL-7969: Read and write Parquet with brotli, lzo, lz4, zstd codecs

Posted by GitBox <gi...@apache.org>.
vvysotskyi commented on pull request #2321:
URL: https://github.com/apache/drill/pull/2321#issuecomment-926463079


   @dzamo, the build for this PR is failing, please take a look.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org