Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2021/10/17 00:30:12 UTC

[GitHub] [flink] SteNicholas opened a new pull request #17500: [FLINK-24564][formats] Change the default compression to snappy for parquet, orc, avro in table

SteNicholas opened a new pull request #17500:
URL: https://github.com/apache/flink/pull/17500


   ## What is the purpose of the change
   
   *Based on experience from other frameworks, snappy compression is a recommended default because it reduces file size. This does not affect reading, because these formats automatically decompress a file according to the codec recorded in the file's own metadata. The Parquet, ORC and Avro formats can therefore use snappy as their default compression codec.*
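
As a rough illustration of why readers need no extra configuration, the toy sketch below stores a codec tag next to the data and lets the reader pick the decompressor from that tag. It is not the Parquet, ORC or Avro layout: the class `SelfDescribingBlock`, the one-byte tag, and the use of `java.util.zip` deflate (standing in for snappy, which is not part of the JDK) are all assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Toy container (hypothetical): one codec tag byte followed by the payload. */
final class SelfDescribingBlock {
    static final byte CODEC_NONE = 0;
    static final byte CODEC_DEFLATE = 1; // stand-in for snappy in this sketch

    private SelfDescribingBlock() {}

    /** Writes the codec tag, then the payload (compressed if the tag says so). */
    static byte[] write(byte[] payload, byte codec) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(codec);
        if (codec == CODEC_DEFLATE) {
            Deflater deflater = new Deflater();
            deflater.setInput(payload);
            deflater.finish();
            byte[] buffer = new byte[256];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            deflater.end();
        } else if (codec == CODEC_NONE) {
            out.write(payload, 0, payload.length);
        } else {
            throw new IllegalArgumentException("unknown codec tag: " + codec);
        }
        return out.toByteArray();
    }

    /** Reads the codec tag from the block itself; the caller never names a codec. */
    static byte[] read(byte[] block) {
        byte codec = block[0];
        if (codec == CODEC_NONE) {
            return Arrays.copyOfRange(block, 1, block.length);
        }
        if (codec != CODEC_DEFLATE) {
            throw new IllegalArgumentException("unknown codec tag: " + codec);
        }
        Inflater inflater = new Inflater();
        inflater.setInput(block, 1, block.length - 1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && inflater.needsInput()) {
                    throw new IllegalStateException("truncated block");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] block = write(payload, CODEC_DEFLATE);
        // The reader is given only the bytes, not the codec that produced them.
        System.out.println(new String(read(block), StandardCharsets.UTF_8));
    }
}
```

Because the reader dispatches on the stored tag, changing the writer's default codec leaves existing files and existing readers untouched, which is the property the description relies on.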
   
   ## Brief change log
   
     - *Update `AvroFormatOptions`, `OrcBulkWriterFactory` and `ParquetRowDataBuilder` to change the default compression to snappy.*
   
   ## Verifying this change
   
     - *Update `AvroFilesystemITCase` and `OrcFileSystemITCase` to verify that the default compression is now snappy.*
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
     - The serializers: (yes / **no** / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't know)
     - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (yes / **no**)
     - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17500: [FLINK-24564][formats] Change the default compression to snappy for parquet, orc, avro in table

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17500:
URL: https://github.com/apache/flink/pull/17500#issuecomment-944278328


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0ee65abb6798b7a2c68231140226bec9967ce07b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25106",
       "triggerID" : "0ee65abb6798b7a2c68231140226bec9967ce07b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f326685880c91dfc439431829658b0eeda7a7bd4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25153",
       "triggerID" : "f326685880c91dfc439431829658b0eeda7a7bd4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f326685880c91dfc439431829658b0eeda7a7bd4 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25153) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] JingsongLi commented on a change in pull request #17500: [FLINK-24564][formats] Change the default compression to snappy for parquet, orc, avro in table

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on a change in pull request #17500:
URL: https://github.com/apache/flink/pull/17500#discussion_r730531888



##########
File path: flink-formats/flink-orc/src/main/java/org/apache/flink/orc/writer/OrcBulkWriterFactory.java
##########
@@ -113,6 +115,10 @@ public OrcBulkWriterFactory(
             for (Map.Entry<String, String> entry : confMap.entrySet()) {
                 conf.set(entry.getKey(), entry.getValue());
             }
+            writerProperties.setProperty(
+                    OrcConf.COMPRESS.getAttribute(),

Review comment:
       It seems the default compression of ORC is zlib. We cannot change it.

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/row/ParquetRowDataBuilder.java
##########
@@ -122,7 +122,11 @@ public FlinkParquetBuilder(RowType rowType, Configuration conf, boolean utcTimes
         public ParquetWriter<RowData> createWriter(OutputFile out) throws IOException {
             Configuration conf = configuration.conf();
             return new ParquetRowDataBuilder(out, rowType, utcTimestamp)
-                    .withCompressionCodec(getParquetCompressionCodec(conf))
+                    .withCompressionCodec(
+                            CompressionCodecName.fromConf(

Review comment:
       `CompressionCodecName.fromConf(configuration.get(ParquetOutputFormat.COMPRESSION, CompressionCodecName.SNAPPY.name()))`
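
The suggestion above amounts to "use the configured codec if there is one, otherwise snappy". A minimal, dependency-free sketch of that fallback follows. `CodecDefaults`, `resolveCodec` and the plain `Map` standing in for the Hadoop `Configuration` are hypothetical names for illustration, and the key string is assumed to match `ParquetOutputFormat.COMPRESSION`.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, not part of Flink or Parquet. */
final class CodecDefaults {
    /** Assumed key; the PR itself reads ParquetOutputFormat.COMPRESSION. */
    static final String COMPRESSION_KEY = "parquet.compression";
    /** Fallback used when the user has not configured a codec. */
    static final String DEFAULT_CODEC = "SNAPPY";

    private CodecDefaults() {}

    /** Returns the configured codec name, or the snappy default if none is set. */
    static String resolveCodec(Map<String, String> conf) {
        String configured = conf.get(COMPRESSION_KEY);
        if (configured == null || configured.trim().isEmpty()) {
            return DEFAULT_CODEC;
        }
        return configured.trim().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolveCodec(conf)); // nothing configured, falls back

        conf.put(COMPRESSION_KEY, "gzip");
        System.out.println(resolveCodec(conf)); // explicit setting wins
    }
}
```

Keeping the default in the lookup call, rather than writing it into the configuration, means an explicit user setting is never overwritten.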







[GitHub] [flink] flinkbot commented on pull request #17500: [FLINK-24564][formats] Change the default compression to snappy for parquet, orc, avro in table

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #17500:
URL: https://github.com/apache/flink/pull/17500#issuecomment-944278328









[GitHub] [flink] SteNicholas commented on pull request #17500: [FLINK-24564][formats] Change the default compression to snappy for parquet, orc, avro in table

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on pull request #17500:
URL: https://github.com/apache/flink/pull/17500#issuecomment-946324910


   @JingsongLi, sorry for missing the update to `ParquetFileSystemITCase`. I have fixed its `testNonPartition` test case. Please take another look.





[GitHub] [flink] JingsongLi commented on a change in pull request #17500: [FLINK-24564][formats] Change the default compression to snappy for parquet, orc, avro in table

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on a change in pull request #17500:
URL: https://github.com/apache/flink/pull/17500#discussion_r730531888



##########
File path: flink-formats/flink-orc/src/main/java/org/apache/flink/orc/writer/OrcBulkWriterFactory.java
##########
@@ -113,6 +115,10 @@ public OrcBulkWriterFactory(
             for (Map.Entry<String, String> entry : confMap.entrySet()) {
                 conf.set(entry.getKey(), entry.getValue());
             }
+            writerProperties.setProperty(
+                    OrcConf.COMPRESS.getAttribute(),

Review comment:
       It seems the default compression of orc is zlib. We don't have to change it.







[GitHub] [flink] flinkbot edited a comment on pull request #17500: [FLINK-24564][formats] Change the default compression to snappy for parquet, orc, avro in table

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17500:
URL: https://github.com/apache/flink/pull/17500#issuecomment-944278328


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0ee65abb6798b7a2c68231140226bec9967ce07b",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25106",
       "triggerID" : "0ee65abb6798b7a2c68231140226bec9967ce07b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f326685880c91dfc439431829658b0eeda7a7bd4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f326685880c91dfc439431829658b0eeda7a7bd4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0ee65abb6798b7a2c68231140226bec9967ce07b Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25106) 
   * f326685880c91dfc439431829658b0eeda7a7bd4 UNKNOWN
   





[GitHub] [flink] flinkbot edited a comment on pull request #17500: [FLINK-24564][formats] Change the default compression to snappy for parquet, orc, avro in table

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17500:
URL: https://github.com/apache/flink/pull/17500#issuecomment-944278328


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0ee65abb6798b7a2c68231140226bec9967ce07b",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25106",
       "triggerID" : "0ee65abb6798b7a2c68231140226bec9967ce07b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f326685880c91dfc439431829658b0eeda7a7bd4",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25153",
       "triggerID" : "f326685880c91dfc439431829658b0eeda7a7bd4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0ee65abb6798b7a2c68231140226bec9967ce07b Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25106) 
   * f326685880c91dfc439431829658b0eeda7a7bd4 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25153) 
   





[GitHub] [flink] flinkbot edited a comment on pull request #17500: [FLINK-24564][formats] Change the default compression to snappy for parquet, orc, avro in table

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17500:
URL: https://github.com/apache/flink/pull/17500#issuecomment-944278328


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0ee65abb6798b7a2c68231140226bec9967ce07b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25106",
       "triggerID" : "0ee65abb6798b7a2c68231140226bec9967ce07b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f326685880c91dfc439431829658b0eeda7a7bd4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25153",
       "triggerID" : "f326685880c91dfc439431829658b0eeda7a7bd4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0a53a4e8ac2ded109be29135846f25a1a251631c",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25196",
       "triggerID" : "0a53a4e8ac2ded109be29135846f25a1a251631c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0a53a4e8ac2ded109be29135846f25a1a251631c Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25196) 
   





[GitHub] [flink] flinkbot edited a comment on pull request #17500: [FLINK-24564][formats] Change the default compression to snappy for parquet, orc, avro in table

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17500:
URL: https://github.com/apache/flink/pull/17500#issuecomment-944278328


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0ee65abb6798b7a2c68231140226bec9967ce07b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25106",
       "triggerID" : "0ee65abb6798b7a2c68231140226bec9967ce07b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f326685880c91dfc439431829658b0eeda7a7bd4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25153",
       "triggerID" : "f326685880c91dfc439431829658b0eeda7a7bd4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0a53a4e8ac2ded109be29135846f25a1a251631c",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25196",
       "triggerID" : "0a53a4e8ac2ded109be29135846f25a1a251631c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f326685880c91dfc439431829658b0eeda7a7bd4 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25153) 
   * 0a53a4e8ac2ded109be29135846f25a1a251631c Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25196) 
   





[GitHub] [flink] flinkbot edited a comment on pull request #17500: [FLINK-24564][formats] Change the default compression to snappy for parquet, orc, avro in table

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17500:
URL: https://github.com/apache/flink/pull/17500#issuecomment-944278328


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0ee65abb6798b7a2c68231140226bec9967ce07b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25106",
       "triggerID" : "0ee65abb6798b7a2c68231140226bec9967ce07b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f326685880c91dfc439431829658b0eeda7a7bd4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25153",
       "triggerID" : "f326685880c91dfc439431829658b0eeda7a7bd4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0a53a4e8ac2ded109be29135846f25a1a251631c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "0a53a4e8ac2ded109be29135846f25a1a251631c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f326685880c91dfc439431829658b0eeda7a7bd4 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25153) 
   * 0a53a4e8ac2ded109be29135846f25a1a251631c UNKNOWN
   





[GitHub] [flink] flinkbot edited a comment on pull request #17500: [FLINK-24564][formats] Change the default compression to snappy for parquet, orc, avro in table

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17500:
URL: https://github.com/apache/flink/pull/17500#issuecomment-944278328









[GitHub] [flink] JingsongLi merged pull request #17500: [FLINK-24564][formats] Change the default compression to snappy for parquet, orc, avro in table

Posted by GitBox <gi...@apache.org>.
JingsongLi merged pull request #17500:
URL: https://github.com/apache/flink/pull/17500


   





[GitHub] [flink] JingsongLi commented on pull request #17500: [FLINK-24564][formats] Change the default compression to snappy for parquet, orc, avro in table

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on pull request #17500:
URL: https://github.com/apache/flink/pull/17500#issuecomment-945307781


   @SteNicholas Thanks for the contribution. Could you comment on the JIRA issue first? (I cannot find your name there.)

