You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "wgtmac (via GitHub)" <gi...@apache.org> on 2023/06/05 09:52:11 UTC

[GitHub] [arrow] wgtmac commented on a diff in pull request #35886: GH-35287: [C++][Parquet] Add CodecOptions to customize the compression parameter

wgtmac commented on code in PR #35886:
URL: https://github.com/apache/arrow/pull/35886#discussion_r1217818957


##########
cpp/src/arrow/util/compression.h:
##########
@@ -124,7 +168,9 @@ class ARROW_EXPORT Codec {
 
   /// \brief Create a codec for the given compression algorithm
   static Result<std::unique_ptr<Codec>> Create(
-      Compression::type codec, int compression_level = kUseDefaultCompressionLevel);
+      Compression::type codec,

Review Comment:
   This is a breaking change. It would be better to add a new overload. Then implement this one based on the new overload and label it as deprecated.



##########
cpp/src/parquet/types.h:
##########
@@ -488,7 +488,8 @@ PARQUET_EXPORT
 std::unique_ptr<Codec> GetCodec(Compression::type codec);
 
 PARQUET_EXPORT
-std::unique_ptr<Codec> GetCodec(Compression::type codec, int compression_level);
+std::unique_ptr<Codec> GetCodec(Compression::type codec,

Review Comment:
   Another breaking change here.



##########
cpp/src/parquet/column_writer.cc:
##########
@@ -683,20 +683,21 @@ class BufferedPageWriter : public PageWriter {
 
 std::unique_ptr<PageWriter> PageWriter::Open(
     std::shared_ptr<ArrowOutputStream> sink, Compression::type codec,
-    int compression_level, ColumnChunkMetaDataBuilder* metadata,
-    int16_t row_group_ordinal, int16_t column_chunk_ordinal, MemoryPool* pool,
-    bool buffered_row_group, std::shared_ptr<Encryptor> meta_encryptor,
-    std::shared_ptr<Encryptor> data_encryptor, bool page_write_checksum_enabled,
-    ColumnIndexBuilder* column_index_builder, OffsetIndexBuilder* offset_index_builder) {
+    const std::shared_ptr<CodecOptions>& codec_options,
+    ColumnChunkMetaDataBuilder* metadata, int16_t row_group_ordinal,
+    int16_t column_chunk_ordinal, MemoryPool* pool, bool buffered_row_group,
+    std::shared_ptr<Encryptor> meta_encryptor, std::shared_ptr<Encryptor> data_encryptor,
+    bool page_write_checksum_enabled, ColumnIndexBuilder* column_index_builder,
+    OffsetIndexBuilder* offset_index_builder) {
   if (buffered_row_group) {
     return std::unique_ptr<PageWriter>(new BufferedPageWriter(
-        std::move(sink), codec, compression_level, metadata, row_group_ordinal,
+        std::move(sink), codec, codec_options, metadata, row_group_ordinal,

Review Comment:
   I am not sure if we are able to move `codec` into `codec_options` as well. I don't have a preference here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org