Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/01/18 07:14:55 UTC

[GitHub] [spark] wangyum opened a new pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

wangyum opened a new pull request #26804:
URL: https://github.com/apache/spark/pull/26804


   ### What changes were proposed in this pull request?
   
   This PR upgrades Parquet to 1.11.1.
   
   Parquet 1.11.1 new features:
   
   - [PARQUET-1201](https://issues.apache.org/jira/browse/PARQUET-1201) - Column indexes
   - [PARQUET-1253](https://issues.apache.org/jira/browse/PARQUET-1253) - Support for new logical type representation
   - [PARQUET-1388](https://issues.apache.org/jira/browse/PARQUET-1388) - Nanosecond precision time and timestamp - parquet-mr
   
   More details:
   https://github.com/apache/parquet-mr/blob/master/CHANGES.md
   
   
   ### Why are the changes needed?
   Support column indexes to improve query performance.
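
As a rough illustration of why column indexes can help: a column index keeps per-page min/max statistics, so a reader can skip pages whose value range cannot match a filter. The sketch below models that idea in plain Scala. `PageStats` and `pagesToRead` are hypothetical names used only for illustration; they are not parquet-mr APIs, and the real index format is not modeled here.

```scala
// Hypothetical model of column-index page skipping; not the parquet-mr API.
final case class PageStats(pageId: Int, min: Long, max: Long)

object ColumnIndexSketch {
  // Keep only the pages whose [min, max] range can contain `value`.
  def pagesToRead(index: Seq[PageStats], value: Long): Seq[Int] =
    index.filter(p => p.min <= value && value <= p.max).map(_.pageId)
}
```

With three pages covering 1 to 100, 101 to 200, and 201 to 300, an equality filter on 150 would only need the second page under this model.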
   
   
   ### Does this PR introduce any user-facing change?
   No.
   
   
   ### How was this patch tested?
   Existing tests.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764201684








[GitHub] [spark] iemejia commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
iemejia commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-769671092


   @wangyum :clap: great work!




[GitHub] [spark] dbtsai commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
dbtsai commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630438863


   Jenkins, retest it again.




[GitHub] [spark] wangyum commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r559844780



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
##########
@@ -759,7 +759,7 @@ class ParquetSchemaSuite extends ParquetSchemaTest {
         nullable = true))),
     """message root {
       |  optional group f1 (MAP) {
-      |    repeated group map (MAP_KEY_VALUE) {
+      |    repeated group key_value (MAP_KEY_VALUE) {

Review comment:
   1. `key_value` was introduced by this fix: https://issues.apache.org/jira/browse/PARQUET-1879
   2. I also used Parquet 1.11.1 to read this file: https://issues.apache.org/jira/browse/SPARK-32639
   ![image](https://user-images.githubusercontent.com/5399861/104973076-ab11d480-5a2e-11eb-9f1c-968ec63bce58.png)
   
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764201684








[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764334636


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38887/
   




[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762420859


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134204/
   




[GitHub] [spark] wangyum commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r564112793



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
       conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
     }
 
+    // PARQUET-1746: Disables page-level CRC checksums by default.
+    conf.setBooleanIfUnset(ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED, false)

Review comment:
   1. It is disabled to fix this regression: https://github.com/apache/spark/pull/26804#pullrequestreview-572328921.
   2. Writing out checksums has minimal performance impact.
   3. Do we really need this feature? I haven't seen Spark SQL users request it. This change only disables it by default; users can still enable it.
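
To make the "disabled by default, but users can still enable it" point concrete, here is a minimal sketch of set-if-unset semantics using a plain `Map` instead of a Hadoop `Configuration`. The helper `setIfUnset` is hypothetical, and the key string is an assumption about the value behind `ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED`.

```scala
object IfUnsetSketch {
  // Assumed key behind ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED.
  val ChecksumKey = "parquet.page.write-checksum.enabled"

  // Model of set-if-unset: a value already supplied by the user wins
  // over the default that the caller tries to install.
  def setIfUnset(conf: Map[String, String], key: String, value: String): Map[String, String] =
    if (conf.contains(key)) conf else conf + (key -> value)
}
```

Under this model, a job that never mentions the key ends up with `false`, while a job that explicitly sets it to `true` keeps `true`.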






[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-767415007


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134487/
   




[GitHub] [spark] iemejia commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
iemejia commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630917979


   @dongjoon-hyun You are absolutely right that there is no Hive with Avro 1.9, and that's the REAL problem. I don't think creating a PR that passes all UTs (including the Hive 1.2/2.3 profiles) for Spark with Avro 1.9 is possible, because Hive is leaking older versions of Avro that are not API compatible.
   
   I don't know how to deal with this. I tried to patch Hive ([HIVE-21737](https://issues.apache.org/jira/browse/HIVE-21737)) for this but was blocked by testing issues there. Even if it is merged, we would still need them to backport the fix to version 2.x (Hive master is already at version 4.x). Note that the Avro upgrade also addresses various security issues in its dependencies that are still leaking into Spark (yes, Jackson among others).
   
   I really want this to happen to get Avro 1.9.x downstream, but it feels like we are locked in because of Hive. If you or anyone else can suggest how to do this, I will be more than glad to help where I can. Also, if someone knows anyone in the Hive project who cares about this, that would be another big help.
   
   CC: @kgyrtkirk for any comments/suggestions, since he tried to help me on the Hive side.
   
   




[GitHub] [spark] SparkQA removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762573083


   **[Test build #134211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134211/testReport)** for PR 26804 at commit [`4efac50`](https://github.com/apache/spark/commit/4efac50dc441838fe5521d4b94a2a4870ad456c5).




[GitHub] [spark] sunchao commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561447322



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
##########
@@ -759,7 +759,7 @@ class ParquetSchemaSuite extends ParquetSchemaTest {
         nullable = true))),
     """message root {
       |  optional group f1 (MAP) {
-      |    repeated group map (MAP_KEY_VALUE) {
+      |    repeated group key_value (MAP_KEY_VALUE) {

Review comment:
       Looking at the original PR, I think the change should be backward-compatible (the `map` annotation can still be handled on the read path).
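
One way a reader could stay compatible with both spellings is to locate the repeated key/value group by its shape rather than by its name. The sketch below is only an illustration of that idea under a made-up mini-schema; `Group` and `keyValueGroup` are hypothetical and are not claimed to be how Spark or parquet-mr implement the read path.

```scala
// Hypothetical mini-schema; not parquet-mr's Type classes.
final case class Group(name: String, repeated: Boolean, children: Seq[String])

object MapCompatSketch {
  // Accept the single repeated child of a MAP group regardless of its name,
  // so both the legacy `map` and the newer `key_value` are handled.
  def keyValueGroup(mapChildren: Seq[Group]): Option[Group] =
    mapChildren match {
      case Seq(g) if g.repeated && g.children.nonEmpty => Some(g)
      case _ => None
    }
}
```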






[GitHub] [spark] wangyum commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561444625



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
       conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
     }
 
+    // PARQUET-1580: Disables page-level CRC checksums by default.

Review comment:
       It will change the data order, please see https://github.com/apache/spark/pull/26804#discussion_r561044576.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
##########
@@ -225,7 +226,9 @@ class StreamSuite extends StreamTest {
 
     val df = spark.readStream.format(classOf[FakeDefaultSource].getName).load()
     Seq("", "parquet").foreach { useV1Source =>
-      withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source) {
+      withSQLConf(
+        SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source,
+        ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED -> "false") {

Review comment:
       Thank you @gszadovszky. The file size differs when CRC writing is enabled:
   ```
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-0ae44ddf-40bb-4ba5-84af-ec8cec037847-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-166cdfbf-b19d-4d55-b4ea-fbad6bcac9df-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  462 Jan 21 22:20 part-00001-23a0376d-1c51-480d-b7c6-a2d9a07de0e3-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-3c52b355-290a-4dd4-aad3-4bb2960ba3b8-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-4486173e-d650-4548-8da4-b95ae0305d8c-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-4c3786f4-2702-4f58-9604-c3deed68bc86-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-71bf8d51-95b0-43a8-969b-c28630f90066-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-78776231-370d-45c6-8520-67b94c33c697-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-a2a811aa-a495-4439-9daf-8c4b2cb258d5-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-bf745883-0b3a-4383-8669-7464833bfea8-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-d3a65d34-cd1a-434d-a86c-8ee0203b3bac-c000.snappy.parquet
   yumwang@LM-SHC-16508156 1611238822602 % parquet-tools cat part-00001-23a0376d-1c51-480d-b7c6-a2d9a07de0e3-c000.snappy.parquet
   a = 2
   ```
   and we order the files by size:
   https://github.com/apache/spark/blob/8ed23ed499ec7745a8e9bdc4c4fb3200fdb6c3c8/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L609
   
   Not sure whether this is caused by an int overflow:
   https://github.com/apache/parquet-mr/pull/647#discussion_r561914480
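
To show how a one-byte difference can change the read order, here is a small sketch that sorts files by size, largest first. It assumes the scan orders files by descending length as at the line linked above; `FileEntry` and `bySizeDesc` are hypothetical names and this is not Spark's actual code.

```scala
final case class FileEntry(path: String, length: Long)

object SizeOrderSketch {
  // Largest files first; sortBy is stable, so files of equal size
  // keep their original relative order.
  def bySizeDesc(files: Seq[FileEntry]): Seq[FileEntry] =
    files.sortBy(_.length)(Ordering[Long].reverse)
}
```

If every file is 463 bytes, the input order is preserved. As soon as one file is 462 bytes, it moves to the end, so any test that relies on the original order of the rows can start failing.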






[GitHub] [spark] SparkQA removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764206843


   **[Test build #134301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134301/testReport)** for PR 26804 at commit [`a89c61d`](https://github.com/apache/spark/commit/a89c61d90cc145cea7e5c3df1200fb3ec1d7a3db).




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-765888093


   **[Test build #134400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134400/testReport)** for PR 26804 at commit [`eb1c95e`](https://github.com/apache/spark/commit/eb1c95ee59464167cb50591b0110e7f3f19864a8).




[GitHub] [spark] wangyum commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-768793621


   > Could you create a 3.2.0 blocker JIRA?
   
   OK, https://issues.apache.org/jira/browse/SPARK-34276.




[GitHub] [spark] heuermh commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
heuermh commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-714729078


   @sunchao Until `spark.driver.userClassPathFirst`/`spark.executor.userClassPathFirst` are no longer marked (Experimental), or dependencies in Spark are shaded properly, having Avro 1.8.x on the Spark runtime classpath will cause runtime compatibility exceptions for downstream apps that use `parquet-avro` 1.11.x or newer, which depend on Avro 1.9.x. Or are you suggesting that downstream apps use `parquet-avro` version 1.10.1 at the same time as Spark depends on `parquet-avro` 1.11.x or newer? I don't know if that is possible.




[GitHub] [spark] heuermh edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
heuermh edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-714729078


   @sunchao Until `spark.driver.userClassPathFirst`/`spark.executor.userClassPathFirst` are no longer marked (Experimental), or dependencies in Spark are shaded properly, having Avro 1.8.x on the Spark runtime classpath will cause runtime compatibility exceptions for downstream apps that use `parquet-avro` 1.11.x or newer, which depend on Avro 1.9.x.
   
   Or are you suggesting that downstream apps use `parquet-avro` version 1.10.1 at the same time as Spark depends on e.g. `parquet-[column,common,encoding,format-structures,hadoop,jackson]` 1.11.x or newer? I don't know if that is possible.




[GitHub] [spark] bbraams commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
bbraams commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r564097098



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
       conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
     }
 
+    // PARQUET-1746: Disables page-level CRC checksums by default.
+    conf.setBooleanIfUnset(ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED, false)

Review comment:
       @wangyum Any chance you could elaborate on this a bit more? Are we convinced that the issue you pointed out in https://github.com/apache/spark/pull/26804#discussion_r561044576 is actually a regression caused by parquet and not a problem with the test itself (e.g. caused by any non-trivial assumptions made w.r.t. the output files)? Considering the benefit of having checksums enabled by default (e.g. much improved visibility into hard to debug data corruption issues), I'd propose further investigation before disabling the feature entirely and having Spark diverge from the `parquet-mr` defaults.
   
   Regarding the defaults, support for checksums was added back in [PARQUET-1580](https://github.com/apache/parquet-mr/pull/647). These changes were included and released with `parquet-mr` 1.11.0 (see [CHANGES](https://github.com/apache/parquet-mr/blob/apache-parquet-1.11.0/CHANGES.md#version-1110)), and writing out checksums has been enabled by default since the release, see `ParquetProperties.java` in:
   * [master](https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L61)
   * [apache-parquet-1.11.0](https://github.com/apache/parquet-mr/blob/apache-parquet-1.11.0/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L54)
   * [apache-parquet-1.11.1](https://github.com/apache/parquet-mr/blob/apache-parquet-1.11.1/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L54)
   
   I also noticed that [PARQUET-1746](https://issues.apache.org/jira/browse/PARQUET-1746) was raised and [a PR](https://github.com/apache/parquet-mr/pull/857) was opened for it to set the default to `false`, but that the issue has already been marked as resolved and the PR closed without merging the changes. 
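
For readers unfamiliar with the mechanism, the following is a minimal sketch of how a page checksum can surface silent corruption, using the JDK's `java.util.zip.CRC32`. `PageCrcSketch` is a hypothetical helper; exactly which bytes `parquet-mr` covers and where it stores the checksum are not modeled here.

```scala
import java.util.zip.CRC32

object PageCrcSketch {
  // CRC32 over the page bytes, as a writer could store alongside the page.
  def checksum(page: Array[Byte]): Long = {
    val crc = new CRC32()
    crc.update(page)
    crc.getValue
  }

  // A reader recomputes the checksum and compares it with the stored one.
  def verify(page: Array[Byte], stored: Long): Boolean =
    checksum(page) == stored
}
```

A single flipped bit in the page makes `verify` return `false`, which is the kind of early signal that is otherwise hard to get when data is corrupted on disk or in transit.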






[GitHub] [spark] SparkQA removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762338175


   **[Test build #134204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134204/testReport)** for PR 26804 at commit [`4e257c4`](https://github.com/apache/spark/commit/4e257c43895d36f0d5630cc735fb56642470b26d).




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764394945


   **[Test build #134301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134301/testReport)** for PR 26804 at commit [`a89c61d`](https://github.com/apache/spark/commit/a89c61d90cc145cea7e5c3df1200fb3ec1d7a3db).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762601527


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38796/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762420859


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134204/
   




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762361789


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38789/
   




[GitHub] [spark] heuermh commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
heuermh commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630453306


   -1 (non-binding). Please do not merge this into master; it breaks downstream applications due to Avro 1.8.x vs 1.9.x transitive dependencies.




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-769488412


   Thank you, @wangyum and @gatorsmile !




[GitHub] [spark] wangyum commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763402025


   Benchmark code and results:
   ```scala
   /*
    * Licensed to the Apache Software Foundation (ASF) under one or more
    * contributor license agreements.  See the NOTICE file distributed with
    * this work for additional information regarding copyright ownership.
    * The ASF licenses this file to You under the Apache License, Version 2.0
    * (the "License"); you may not use this file except in compliance with
    * the License.  You may obtain a copy of the License at
    *
    *    http://www.apache.org/licenses/LICENSE-2.0
    *
    * Unless required by applicable law or agreed to in writing, software
    * distributed under the License is distributed on an "AS IS" BASIS,
    * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    * See the License for the specific language governing permissions and
    * limitations under the License.
    */
   
   package org.apache.spark.sql.execution.benchmark
   
   import java.io.File
   
   import scala.util.Random
   
   import org.apache.spark.SparkConf
   import org.apache.spark.benchmark.Benchmark
   import org.apache.spark.sql.{DataFrame, SparkSession}
   import org.apache.spark.sql.functions.{monotonically_increasing_id, timestamp_seconds}
   import org.apache.spark.sql.internal.SQLConf
   import org.apache.spark.sql.internal.SQLConf.ParquetOutputTimestampType
   import org.apache.spark.sql.types.{ByteType, Decimal, DecimalType}
   
   object ParquetFilterPushdownBenchmark extends SqlBasedBenchmark {
   
     override def getSparkSession: SparkSession = {
       val conf = new SparkConf()
         .setAppName(this.getClass.getSimpleName)
         // Since `spark.master` always exists, overrides this value
         .set("spark.master", "local[1]")
         .setIfMissing("spark.driver.memory", "3g")
         .setIfMissing("spark.executor.memory", "3g")
         .setIfMissing("orc.compression", "snappy")
         .setIfMissing("spark.sql.parquet.compression.codec", "snappy")
   
       SparkSession.builder().config(conf).getOrCreate()
     }
   
     private val numRows = 1024 * 1024 * 15
     private val width = 5
     private val mid = numRows / 2
   
     def withTempTable(tableNames: String*)(f: => Unit): Unit = {
       try f finally tableNames.foreach(spark.catalog.dropTempView)
     }
   
     private def prepareTable(
         dir: File, numRows: Int, width: Int, useStringForValue: Boolean): Unit = {
       import spark.implicits._
       val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i")
       val valueCol = if (useStringForValue) {
         monotonically_increasing_id().cast("string")
       } else {
         monotonically_increasing_id()
       }
       val df = spark.range(numRows).map(_ => Random.nextLong).selectExpr(selectExpr: _*)
         .withColumn("value", valueCol)
         .sort("value")
   
       saveAsTable(df, dir)
     }
   
     private def prepareStringDictTable(
         dir: File, numRows: Int, numDistinctValues: Int, width: Int): Unit = {
       val selectExpr = (0 to width).map {
         case 0 => s"CAST(id % $numDistinctValues AS STRING) AS value"
         case i => s"CAST(rand() AS STRING) c$i"
       }
       val df = spark.range(numRows).selectExpr(selectExpr: _*).sort("value")
   
       saveAsTable(df, dir)
     }
   
     private def saveAsTable(df: DataFrame, dir: File): Unit = {
       val parquetPath = dir.getCanonicalPath + "/parquet"
       df.write.mode("overwrite").parquet(parquetPath)
       spark.read.parquet(parquetPath).createOrReplaceTempView("parquetTable")
     }
   
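     // Registers two cases for the same query, one with Parquet filter pushdown disabled
     // and one with it enabled, so both appear in a single result table.
     def filterPushDownBenchmark(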
        values: Int,
        title: String,
        whereExpr: String,
        selectExpr: String = "*"): Unit = {
       val benchmark = new Benchmark(title, values, minNumIters = 5, output = output)
   
       Seq(false, true).foreach { pushDownEnabled =>
         val name = s"Parquet Vectorized ${if (pushDownEnabled) "(Pushdown)" else ""}"
         benchmark.addCase(name) { _ =>
           withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> s"$pushDownEnabled") {
             spark.sql(s"SELECT $selectExpr FROM parquetTable WHERE $whereExpr").noop()
           }
         }
       }
   
       benchmark.run()
     }
   
     private def runIntBenchmark(numRows: Int, width: Int, mid: Int): Unit = {
       Seq("value IS NULL", s"$mid < value AND value < $mid").foreach { whereExpr =>
         val title = s"Select 0 int row ($whereExpr)".replace("value AND value", "value")
         filterPushDownBenchmark(numRows, title, whereExpr)
       }
   
       Seq(
         s"value = $mid",
         s"value <=> $mid",
         s"$mid <= value AND value <= $mid",
         s"${mid - 1} < value AND value < ${mid + 1}"
       ).foreach { whereExpr =>
         val title = s"Select 1 int row ($whereExpr)".replace("value AND value", "value")
         filterPushDownBenchmark(numRows, title, whereExpr)
       }
   
       val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
   
       Seq(10, 50, 90).foreach { percent =>
         filterPushDownBenchmark(
           numRows,
           s"Select $percent% int rows (value < ${numRows * percent / 100})",
           s"value < ${numRows * percent / 100}",
           selectExpr
         )
       }
   
       Seq("value IS NOT NULL", "value > -1", "value != -1").foreach { whereExpr =>
         filterPushDownBenchmark(
           numRows,
           s"Select all int rows ($whereExpr)",
           whereExpr,
           selectExpr)
       }
     }
   
     private def runStringBenchmark(
         numRows: Int, width: Int, searchValue: Int, colType: String): Unit = {
       Seq("value IS NULL", s"'$searchValue' < value AND value < '$searchValue'")
         .foreach { whereExpr =>
           val title = s"Select 0 $colType row ($whereExpr)".replace("value AND value", "value")
           filterPushDownBenchmark(numRows, title, whereExpr)
         }
   
       Seq(
         s"value = '$searchValue'",
         s"value <=> '$searchValue'",
         s"'$searchValue' <= value AND value <= '$searchValue'"
       ).foreach { whereExpr =>
         val title = s"Select 1 $colType row ($whereExpr)".replace("value AND value", "value")
         filterPushDownBenchmark(numRows, title, whereExpr)
       }
   
       val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
   
       Seq("value IS NOT NULL").foreach { whereExpr =>
         filterPushDownBenchmark(
           numRows,
           s"Select all $colType rows ($whereExpr)",
           whereExpr,
           selectExpr)
       }
     }
   
     override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
       runBenchmark("Pushdown for many distinct value case") {
         withTempPath { dir =>
           withTempTable("parquetTable") {
             Seq(true, false).foreach { useStringForValue =>
               prepareTable(dir, numRows, width, useStringForValue)
               if (useStringForValue) {
                 runStringBenchmark(numRows, width, mid, "string")
               } else {
                 runIntBenchmark(numRows, width, mid)
               }
             }
           }
         }
       }
   
       runBenchmark("Pushdown for few distinct value case (use dictionary encoding)") {
         withTempPath { dir =>
           val numDistinctValues = 200
   
           withTempTable("parquetTable") {
             prepareStringDictTable(dir, numRows, numDistinctValues, width)
             runStringBenchmark(numRows, width, numDistinctValues / 2, "distinct string")
           }
         }
       }
   
       runBenchmark("Pushdown benchmark for StringStartsWith") {
         withTempPath { dir =>
           withTempTable("parquetTable") {
             prepareTable(dir, numRows, width, true)
             Seq(
               "value like '10%'",
               "value like '1000%'",
               s"value like '${mid.toString.substring(0, mid.toString.length - 1)}%'"
             ).foreach { whereExpr =>
               val title = s"StringStartsWith filter: ($whereExpr)"
               filterPushDownBenchmark(numRows, title, whereExpr)
             }
           }
         }
       }
   
       runBenchmark(s"Pushdown benchmark for ${DecimalType.simpleString}") {
         withTempPath { dir =>
           Seq(
             s"decimal(${Decimal.MAX_INT_DIGITS}, 2)",
             s"decimal(${Decimal.MAX_LONG_DIGITS}, 2)",
             s"decimal(${DecimalType.MAX_PRECISION}, 2)"
           ).foreach { dt =>
             val columns = (1 to width).map(i => s"CAST(id AS string) c$i")
             val valueCol = if (dt.equalsIgnoreCase(s"decimal(${Decimal.MAX_INT_DIGITS}, 2)")) {
               monotonically_increasing_id() % 9999999
             } else {
               monotonically_increasing_id()
             }
             val df = spark.range(numRows)
               .selectExpr(columns: _*).withColumn("value", valueCol.cast(dt))
             withTempTable("parquetTable") {
               saveAsTable(df, dir)
   
               Seq(s"value = $mid").foreach { whereExpr =>
                 val title = s"Select 1 $dt row ($whereExpr)".replace("value AND value", "value")
                 filterPushDownBenchmark(numRows, title, whereExpr)
               }
   
               val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
               Seq(10, 50, 90).foreach { percent =>
                 filterPushDownBenchmark(
                   numRows,
                   s"Select $percent% $dt rows (value < ${numRows * percent / 100})",
                   s"value < ${numRows * percent / 100}",
                   selectExpr
                 )
               }
             }
           }
         }
       }
   
       runBenchmark("Pushdown benchmark for InSet -> InFilters") {
         withTempPath { dir =>
           withTempTable("parquetTable") {
             prepareTable(dir, numRows, width, false)
             Seq(5, 10, 50, 100).foreach { count =>
               Seq(10, 50, 90).foreach { distribution =>
                 val filter =
                   Range(0, count).map(r => scala.util.Random.nextInt(numRows * distribution / 100))
                 val whereExpr = s"value in(${filter.mkString(",")})"
                 val title = s"InSet -> InFilters (values count: $count, distribution: $distribution)"
                 filterPushDownBenchmark(numRows, title, whereExpr)
               }
             }
           }
         }
       }
   
       runBenchmark(s"Pushdown benchmark for ${ByteType.simpleString}") {
         withTempPath { dir =>
           val columns = (1 to width).map(i => s"CAST(id AS string) c$i")
           val df = spark.range(numRows).selectExpr(columns: _*)
             .withColumn("value", (monotonically_increasing_id() % Byte.MaxValue).cast(ByteType))
             .orderBy("value")
           withTempTable("parquetTable") {
             saveAsTable(df, dir)
   
             Seq(s"value = CAST(${Byte.MaxValue / 2} AS ${ByteType.simpleString})")
               .foreach { whereExpr =>
                 val title = s"Select 1 ${ByteType.simpleString} row ($whereExpr)"
                   .replace("value AND value", "value")
                 filterPushDownBenchmark(numRows, title, whereExpr)
               }
   
             val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
             Seq(10, 50, 90).foreach { percent =>
               filterPushDownBenchmark(
                 numRows,
                 s"Select $percent% ${ByteType.simpleString} rows " +
                   s"(value < CAST(${Byte.MaxValue * percent / 100} AS ${ByteType.simpleString}))",
                 s"value < CAST(${Byte.MaxValue * percent / 100} AS ${ByteType.simpleString})",
                 selectExpr
               )
             }
           }
         }
       }
   
       runBenchmark("Pushdown benchmark for Timestamp") {
         withTempPath { dir =>
           withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_TIMESTAMP_ENABLED.key -> true.toString) {
             ParquetOutputTimestampType.values.toSeq.map(_.toString).foreach { fileType =>
               withSQLConf(SQLConf.PARQUET_OUTPUT_TIMESTAMP_TYPE.key -> fileType) {
                 val columns = (1 to width).map(i => s"CAST(id AS string) c$i")
                 val df = spark.range(numRows).selectExpr(columns: _*)
                   .withColumn("value", timestamp_seconds(monotonically_increasing_id()))
                 withTempTable("parquetTable") {
                   saveAsTable(df, dir)
   
                   Seq(s"value = timestamp_seconds($mid)").foreach { whereExpr =>
                     val title = s"Select 1 timestamp stored as $fileType row ($whereExpr)"
                       .replace("value AND value", "value")
                     filterPushDownBenchmark(numRows, title, whereExpr)
                   }
   
                   val selectExpr = (1 to width)
                     .map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
                   Seq(10, 50, 90).foreach { percent =>
                     filterPushDownBenchmark(
                       numRows,
                       s"Select $percent% timestamp stored as $fileType rows " +
                         s"(value < timestamp_seconds(${numRows * percent / 100}))",
                       s"value < timestamp_seconds(${numRows * percent / 100})",
                       selectExpr
                     )
                   }
                 }
               }
             }
           }
         }
       }
   
       runBenchmark("Pushdown benchmark with many filters") {
         val numRows = 1
         val width = 500
   
         withTempPath { dir =>
           val columns = (1 to width).map(i => s"id c$i")
           val df = spark.range(1).selectExpr(columns: _*)
           withTempTable("parquetTable") {
             saveAsTable(df, dir)
             Seq(1, 250, 500).foreach { numFilter =>
               val whereExpr = (1 to numFilter).map(i => s"c$i = 0").mkString(" and ")
               // Note: InferFiltersFromConstraints will add more filters to this given filters
               filterPushDownBenchmark(numRows, s"Select 1 row with $numFilter filters", whereExpr)
             }
           }
         }
       }
     }
   }
   
   ```
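
   A note for reading the result tables in this comment: the derived columns appear to follow directly from the best time of each case. The sketch below recomputes them in plain Scala. `BenchmarkColumns` and its methods are hypothetical names introduced only for illustration (they are not part of Spark), and the formulas are an assumption that matches the figures shown. Because the tables print best times rounded to whole milliseconds, recomputed values can differ from the printed ones in the last digit.

   ```scala
   // Hypothetical helper, not part of Spark: recomputes the derived columns
   // of a benchmark result table from the best time of each case.
   object BenchmarkColumns {
     // Rows processed per iteration in the benchmark above: 1024 * 1024 * 15.
     val numRows: Long = 1024L * 1024L * 15L

     /** Millions of rows per second for a case whose best run took `bestMs`. */
     def rateMPerSec(rows: Long, bestMs: Double): Double = rows / (bestMs * 1000.0)

     /** Nanoseconds spent per row for a case whose best run took `bestMs`. */
     def perRowNs(rows: Long, bestMs: Double): Double = bestMs * 1e6 / rows

     /** Speed-up of a case relative to the baseline (first) case. */
     def relative(baselineBestMs: Double, bestMs: Double): Double = baselineBestMs / bestMs

     def main(args: Array[String]): Unit = {
       // Best times (ms) as printed in the "Select 0 string row (value IS NULL)" table.
       val baselineBestMs = 8669.0
       val pushdownBestMs = 87.0

       println(f"Rate(M/s):   ${rateMPerSec(numRows, baselineBestMs)}%.1f")
       println(f"Per Row(ns): ${perRowNs(numRows, baselineBestMs)}%.1f")
       println(f"Relative:    ${relative(baselineBestMs, pushdownBestMs)}%.1fX")
     }
   }
   ```

   For the first table, this gives roughly 1.8 M rows/s and about 551 ns per row for the baseline, and a speed-up of close to 100x for the pushdown case.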
   Parquet 1.10.1:
   ```
   [info] 18:42:20.840 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   [info] Running benchmark: Select 0 string row (value IS NULL)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 43822 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 20 iterations, 2066 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 0 string row (value IS NULL):      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                 8669           8765          94          1.8         551.1       1.0X
   [info] Parquet Vectorized (Pushdown)                        87            103          10        180.0           5.6      99.2X
   [info] Running benchmark: Select 0 string row ('7864320' < value < '7864320')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 44140 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4492 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 0 string row ('7864320' < value < '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -----------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                            8729           8828          88          1.8         555.0       1.0X
   [info] Parquet Vectorized (Pushdown)                                  888            898          12         17.7          56.5       9.8X
   [info] Running benchmark: Select 1 string row (value = '7864320')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 43788 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4415 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 string row (value = '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                 8679           8758          69          1.8         551.8       1.0X
   [info] Parquet Vectorized (Pushdown)                       868            883          13         18.1          55.2      10.0X
   [info] Running benchmark: Select 1 string row (value <=> '7864320')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 43544 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4352 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 string row (value <=> '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                  8648           8709          54          1.8         549.8       1.0X
   [info] Parquet Vectorized (Pushdown)                        861            870           8         18.3          54.7      10.0X
   [info] Running benchmark: Select 1 string row ('7864320' <= value <= '7864320')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 43898 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4415 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 string row ('7864320' <= value <= '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                              8711           8780          94          1.8         553.8       1.0X
   [info] Parquet Vectorized (Pushdown)                                    870            883           8         18.1          55.3      10.0X
   [info] Running benchmark: Select all string rows (value IS NOT NULL)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 85779 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 85130 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select all string rows (value IS NOT NULL):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                  17006          17156         139          0.9        1081.2       1.0X
   [info] Parquet Vectorized (Pushdown)                       16922          17026         112          0.9        1075.9       1.0X
   [info] Running benchmark: Select 0 int row (value IS NULL)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41677 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 26 iterations, 2042 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 0 int row (value IS NULL):         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                 8277           8336          58          1.9         526.2       1.0X
   [info] Parquet Vectorized (Pushdown)                        74             79           5        213.9           4.7     112.5X
   [info] Running benchmark: Select 0 int row (7864320 < value < 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41824 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4201 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 0 int row (7864320 < value < 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                     8274           8365          82          1.9         526.1       1.0X
   [info] Parquet Vectorized (Pushdown)                           813            840          18         19.3          51.7      10.2X
   [info] Running benchmark: Select 1 int row (value = 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41763 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4392 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 int row (value = 7864320):       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                 8218           8353          90          1.9         522.5       1.0X
   [info] Parquet Vectorized (Pushdown)                       857            879          18         18.4          54.5       9.6X
   [info] Running benchmark: Select 1 int row (value <=> 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41937 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4133 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 int row (value <=> 7864320):     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                 8275           8387         112          1.9         526.1       1.0X
   [info] Parquet Vectorized (Pushdown)                       816            827          11         19.3          51.9      10.1X
   [info] Running benchmark: Select 1 int row (7864320 <= value <= 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41648 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4247 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 int row (7864320 <= value <= 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                       8299           8330          26          1.9         527.6       1.0X
   [info] Parquet Vectorized (Pushdown)                             818            849          22         19.2          52.0      10.1X
   [info] Running benchmark: Select 1 int row (7864319 < value < 7864321)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41604 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4159 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 int row (7864319 < value < 7864321):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                     8228           8321          74          1.9         523.1       1.0X
   [info] Parquet Vectorized (Pushdown)                           814            832          11         19.3          51.7      10.1X
   [info] Running benchmark: Select 10% int rows (value < 1572864)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 45888 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 12000 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 10% int rows (value < 1572864):    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                 9131           9178          41          1.7         580.6       1.0X
   [info] Parquet Vectorized (Pushdown)                      2377           2400          17          6.6         151.1       3.8X
   [info] Running benchmark: Select 50% int rows (value < 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 61875 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 42681 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 50% int rows (value < 7864320):    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                12166          12375         226          1.3         773.5       1.0X
   [info] Parquet Vectorized (Pushdown)                      8408           8536         106          1.9         534.6       1.4X
   [info] Running benchmark: Select 90% int rows (value < 14155776)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 76034 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 72997 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 90% int rows (value < 14155776):   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                15098          15207          74          1.0         959.9       1.0X
   [info] Parquet Vectorized (Pushdown)                     14390          14599         127          1.1         914.9       1.0X
   [info] Running benchmark: Select all int rows (value IS NOT NULL)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 80290 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 81014 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select all int rows (value IS NOT NULL):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                15749          16058         199          1.0        1001.3       1.0X
   [info] Parquet Vectorized (Pushdown)                     16147          16203          69          1.0        1026.6       1.0X
   [info] Running benchmark: Select all int rows (value > -1)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 81133 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 81411 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select all int rows (value > -1):         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                16103          16227         111          1.0        1023.8       1.0X
   [info] Parquet Vectorized (Pushdown)                     16125          16282         142          1.0        1025.2       1.0X
   [info] Running benchmark: Select all int rows (value != -1)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 81013 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 80343 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select all int rows (value != -1):        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                16073          16203         117          1.0        1021.9       1.0X
   [info] Parquet Vectorized (Pushdown)                     15942          16069          84          1.0        1013.6       1.0X
   [info] Running benchmark: Select 0 distinct string row (value IS NULL)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 40258 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 31 iterations, 2054 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 0 distinct string row (value IS NULL):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                     7953           8052          84          2.0         505.7       1.0X
   [info] Parquet Vectorized (Pushdown)                            62             66           6        253.5           3.9     128.2X
   [info] Running benchmark: Select 0 distinct string row ('100' < value < '100')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 40734 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4731 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 0 distinct string row ('100' < value < '100'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                             8026           8147          73          2.0         510.3       1.0X
   [info] Parquet Vectorized (Pushdown)                                   939            946           6         16.8          59.7       8.5X
   [info] Running benchmark: Select 1 distinct string row (value = '100')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 40674 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4874 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 distinct string row (value = '100'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                     8034           8135          87          2.0         510.8       1.0X
   [info] Parquet Vectorized (Pushdown)                           957            975          27         16.4          60.9       8.4X
   [info] Running benchmark: Select 1 distinct string row (value <=> '100')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 40781 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4698 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 distinct string row (value <=> '100'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                       8112           8156          39          1.9         515.7       1.0X
   [info] Parquet Vectorized (Pushdown)                             926            940           9         17.0          58.9       8.8X
   [info] Running benchmark: Select 1 distinct string row ('100' <= value <= '100')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41005 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 5174 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 distinct string row ('100' <= value <= '100'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                               8151           8201          42          1.9         518.2       1.0X
   [info] Parquet Vectorized (Pushdown)                                    1014           1035          32         15.5          64.5       8.0X
   [info] Running benchmark: Select all distinct string rows (value IS NOT NULL)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 89835 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 90269 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select all distinct string rows (value IS NOT NULL):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -----------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                           17886          17967          56          0.9        1137.2       1.0X
   [info] Parquet Vectorized (Pushdown)                                17979          18054         100          0.9        1143.0       1.0X
   [info] Running benchmark: StringStartsWith filter: (value like '10%')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 46786 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 5455 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] StringStartsWith filter: (value like '10%'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                    9191           9357         168          1.7         584.4       1.0X
   [info] Parquet Vectorized (Pushdown)                         1075           1091          11         14.6          68.4       8.5X
   [info] Running benchmark: StringStartsWith filter: (value like '1000%')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 45468 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4483 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] StringStartsWith filter: (value like '1000%'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -----------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                      9017           9094         116          1.7         573.3       1.0X
   [info] Parquet Vectorized (Pushdown)                            888            897           7         17.7          56.5      10.1X
   [info] Running benchmark: StringStartsWith filter: (value like '786432%')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 45429 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4428 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] StringStartsWith filter: (value like '786432%'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                        9037           9086          55          1.7         574.6       1.0X
   [info] Parquet Vectorized (Pushdown)                              864            886          17         18.2          55.0      10.5X
   [info] Running benchmark: Select 1 decimal(9, 2) row (value = 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 17614 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 5788 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 decimal(9, 2) row (value = 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                     3488           3523          27          4.5         221.8       1.0X
   [info] Parquet Vectorized (Pushdown)                          1148           1158          11         13.7          73.0       3.0X
   [info] Running benchmark: Select 10% decimal(9, 2) rows (value < 1572864)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 25815 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 25522 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 10% decimal(9, 2) rows (value < 1572864):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                        5117           5163          63          3.1         325.3       1.0X
   [info] Parquet Vectorized (Pushdown)                             5044           5104          55          3.1         320.7       1.0X
   [info] Running benchmark: Select 50% decimal(9, 2) rows (value < 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 52939 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 52691 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 50% decimal(9, 2) rows (value < 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                       10443          10588         116          1.5         663.9       1.0X
   [info] Parquet Vectorized (Pushdown)                            10388          10538         173          1.5         660.5       1.0X
   [info] Running benchmark: Select 90% decimal(9, 2) rows (value < 14155776)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 58989 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 59164 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 90% decimal(9, 2) rows (value < 14155776):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                        11676          11798          96          1.3         742.3       1.0X
   [info] Parquet Vectorized (Pushdown)                             11718          11833         112          1.3         745.0       1.0X
   [info] Running benchmark: Select 1 decimal(18, 2) row (value = 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 18284 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 5992 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 decimal(18, 2) row (value = 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -----------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                      3583           3657          49          4.4         227.8       1.0X
   [info] Parquet Vectorized (Pushdown)                           1187           1198           7         13.2          75.5       3.0X
   [info] Running benchmark: Select 10% decimal(18, 2) rows (value < 1572864)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 23432 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 10519 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 10% decimal(18, 2) rows (value < 1572864):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                         4603           4686          77          3.4         292.7       1.0X
   [info] Parquet Vectorized (Pushdown)                              2058           2104          92          7.6         130.8       2.2X
   [info] Running benchmark: Select 50% decimal(18, 2) rows (value < 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 39380 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 32688 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 50% decimal(18, 2) rows (value < 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                         7805           7876          86          2.0         496.2       1.0X
   [info] Parquet Vectorized (Pushdown)                              6475           6538          68          2.4         411.6       1.2X
   [info] Running benchmark: Select 90% decimal(18, 2) rows (value < 14155776)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 55690 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 54683 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 90% decimal(18, 2) rows (value < 14155776):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                         11000          11138         112          1.4         699.3       1.0X
   [info] Parquet Vectorized (Pushdown)                              10764          10937         125          1.5         684.4       1.0X
   [info] Running benchmark: Select 1 decimal(38, 2) row (value = 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 29479 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 9146 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 decimal(38, 2) row (value = 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -----------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                      5655           5896         242          2.8         359.6       1.0X
   [info] Parquet Vectorized (Pushdown)                           1808           1829          18          8.7         115.0       3.1X
   [info] Running benchmark: Select 10% decimal(38, 2) rows (value < 1572864)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 34809 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 14529 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 10% decimal(38, 2) rows (value < 1572864):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                         6878           6962          62          2.3         437.3       1.0X
   [info] Parquet Vectorized (Pushdown)                              2861           2906          69          5.5         181.9       2.4X
   [info] Running benchmark: Select 50% decimal(38, 2) rows (value < 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 55777 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 44400 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 50% decimal(38, 2) rows (value < 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                        10967          11155         151          1.4         697.3       1.0X
   [info] Parquet Vectorized (Pushdown)                              8769           8880         111          1.8         557.5       1.3X
   [info] Running benchmark: Select 90% decimal(38, 2) rows (value < 14155776)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 75507 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 73697 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 90% decimal(38, 2) rows (value < 14155776):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                         14916          15101         115          1.1         948.3       1.0X
   [info] Parquet Vectorized (Pushdown)                              14623          14740         103          1.1         929.7       1.0X
   [info] Running benchmark: InSet -> InFilters (values count: 5, distribution: 10)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 42201 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4194 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 5, distribution: 10):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                               8280           8440          93          1.9         526.4       1.0X
   [info] Parquet Vectorized (Pushdown)                                     813            839          19         19.4          51.7      10.2X
   [info] Running benchmark: InSet -> InFilters (values count: 5, distribution: 50)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41743 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 15602 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 5, distribution: 50):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                               8263           8349          87          1.9         525.3       1.0X
   [info] Parquet Vectorized (Pushdown)                                    3108           3120          10          5.1         197.6       2.7X
   [info] Running benchmark: InSet -> InFilters (values count: 5, distribution: 90)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 42229 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 15575 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 5, distribution: 90):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                               8356           8446          89          1.9         531.2       1.0X
   [info] Parquet Vectorized (Pushdown)                                    3062           3115          64          5.1         194.7       2.7X
   [info] Running benchmark: InSet -> InFilters (values count: 10, distribution: 10)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 42041 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 8012 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 10, distribution: 10):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                8317           8408          77          1.9         528.7       1.0X
   [info] Parquet Vectorized (Pushdown)                                     1577           1603          21         10.0         100.2       5.3X
   [info] Running benchmark: InSet -> InFilters (values count: 10, distribution: 50)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41870 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 15558 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 10, distribution: 50):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                8321           8374          43          1.9         529.0       1.0X
   [info] Parquet Vectorized (Pushdown)                                     3069           3112          40          5.1         195.1       2.7X
   [info] Running benchmark: InSet -> InFilters (values count: 10, distribution: 90)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 42102 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 19401 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 10, distribution: 90):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                8382           8420          46          1.9         532.9       1.0X
   [info] Parquet Vectorized (Pushdown)                                     3865           3880          17          4.1         245.7       2.2X
   [info] Running benchmark: InSet -> InFilters (values count: 50, distribution: 10)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 43390 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 44089 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 50, distribution: 10):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                8594           8678          85          1.8         546.4       1.0X
   [info] Parquet Vectorized (Pushdown)                                     8710           8818         141          1.8         553.7       1.0X
   [info] Running benchmark: InSet -> InFilters (values count: 50, distribution: 50)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 43434 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 43449 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 50, distribution: 50):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                8643           8687          32          1.8         549.5       1.0X
   [info] Parquet Vectorized (Pushdown)                                     8537           8690         142          1.8         542.8       1.0X
   [info] Running benchmark: InSet -> InFilters (values count: 50, distribution: 90)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 43472 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 43329 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 50, distribution: 90):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                8633           8695          65          1.8         548.9       1.0X
   [info] Parquet Vectorized (Pushdown)                                     8635           8666          29          1.8         549.0       1.0X
   [info] Running benchmark: InSet -> InFilters (values count: 100, distribution: 10)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 42939 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 43868 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 100, distribution: 10):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                 8486           8588          81          1.9         539.6       1.0X
   [info] Parquet Vectorized (Pushdown)                                      8663           8774         175          1.8         550.8       1.0X
   [info] Running benchmark: InSet -> InFilters (values count: 100, distribution: 50)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 43116 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 43589 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 100, distribution: 50):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                 8566           8623          46          1.8         544.6       1.0X
   [info] Parquet Vectorized (Pushdown)                                      8646           8718          84          1.8         549.7       1.0X
   [info] Running benchmark: InSet -> InFilters (values count: 100, distribution: 90)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 43544 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 43485 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 100, distribution: 90):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                 8639           8709          56          1.8         549.3       1.0X
   [info] Parquet Vectorized (Pushdown)                                      8638           8697          53          1.8         549.2       1.0X
   [info] Running benchmark: Select 1 tinyint row (value = CAST(63 AS tinyint))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 19550 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 6223 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 tinyint row (value = CAST(63 AS tinyint)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                           3749           3910         147          4.2         238.3       1.0X
   [info] Parquet Vectorized (Pushdown)                                1184           1245          44         13.3          75.3       3.2X
   [info] Running benchmark: Select 10% tinyint rows (value < CAST(12 AS tinyint))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 23026 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 9723 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 10% tinyint rows (value < CAST(12 AS tinyint)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                              4552           4605          46          3.5         289.4       1.0X
   [info] Parquet Vectorized (Pushdown)                                   1906           1945          52          8.3         121.2       2.4X
   [info] Running benchmark: Select 50% tinyint rows (value < CAST(63 AS tinyint))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 38202 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 30731 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 50% tinyint rows (value < CAST(63 AS tinyint)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                              7511           7641         103          2.1         477.6       1.0X
   [info] Parquet Vectorized (Pushdown)                                   6108           6146          49          2.6         388.3       1.2X
   [info] Running benchmark: Select 90% tinyint rows (value < CAST(114 AS tinyint))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 54038 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 53985 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 90% tinyint rows (value < CAST(114 AS tinyint)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                              10695          10808          92          1.5         680.0       1.0X
   [info] Parquet Vectorized (Pushdown)                                   10648          10797         137          1.5         677.0       1.0X
   [info] Running benchmark: Select 1 timestamp stored as INT96 row (value = timestamp_seconds(7864320))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 21389 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 20900 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 timestamp stored as INT96 row (value = timestamp_seconds(7864320)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -----------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                    4173           4278          82          3.8         265.3       1.0X
   [info] Parquet Vectorized (Pushdown)                                                         4130           4180          37          3.8         262.6       1.0X
   [info] Running benchmark: Select 10% timestamp stored as INT96 rows (value < timestamp_seconds(1572864))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 25237 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 25245 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 10% timestamp stored as INT96 rows (value < timestamp_seconds(1572864)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                       4985           5047          67          3.2         316.9       1.0X
   [info] Parquet Vectorized (Pushdown)                                                            4968           5049          73          3.2         315.9       1.0X
   [info] Running benchmark: Select 50% timestamp stored as INT96 rows (value < timestamp_seconds(7864320))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 40629 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 40929 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 50% timestamp stored as INT96 rows (value < timestamp_seconds(7864320)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                       8005           8126         109          2.0         509.0       1.0X
   [info] Parquet Vectorized (Pushdown)                                                            8087           8186          67          1.9         514.2       1.0X
   [info] Running benchmark: Select 90% timestamp stored as INT96 rows (value < timestamp_seconds(14155776))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 55942 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 56599 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 90% timestamp stored as INT96 rows (value < timestamp_seconds(14155776)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                       10905          11189         178          1.4         693.3       1.0X
   [info] Parquet Vectorized (Pushdown)                                                            11054          11320         203          1.4         702.8       1.0X
   [info] Running benchmark: Select 1 timestamp stored as TIMESTAMP_MICROS row (value = timestamp_seconds(7864320))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 17659 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 5428 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 timestamp stored as TIMESTAMP_MICROS row (value = timestamp_seconds(7864320)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                               3475           3532          43          4.5         220.9       1.0X
   [info] Parquet Vectorized (Pushdown)                                                                    1072           1086           9         14.7          68.2       3.2X
   [info] Running benchmark: Select 10% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(1572864))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 21779 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 9752 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 10% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(1572864)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                                  4344           4356           9          3.6         276.2       1.0X
   [info] Parquet Vectorized (Pushdown)                                                                       1874           1950          87          8.4         119.2       2.3X
   [info] Running benchmark: Select 50% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(7864320))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 37830 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 30583 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 50% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(7864320)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                                  7478           7566         120          2.1         475.5       1.0X
   [info] Parquet Vectorized (Pushdown)                                                                       6034           6117          97          2.6         383.6       1.2X
   [info] Running benchmark: Select 90% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(14155776))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 52857 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 53101 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 90% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(14155776)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                                  10443          10571         129          1.5         664.0       1.0X
   [info] Parquet Vectorized (Pushdown)                                                                       10491          10620         215          1.5         667.0       1.0X
   [info] Running benchmark: Select 1 timestamp stored as TIMESTAMP_MILLIS row (value = timestamp_seconds(7864320))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 18656 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 5916 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 timestamp stored as TIMESTAMP_MILLIS row (value = timestamp_seconds(7864320)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                               3718           3731          24          4.2         236.4       1.0X
   [info] Parquet Vectorized (Pushdown)                                                                    1157           1183          17         13.6          73.6       3.2X
   [info] Running benchmark: Select 10% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(1572864))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 22909 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 10248 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 10% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(1572864)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                                  4568           4582          16          3.4         290.4       1.0X
   [info] Parquet Vectorized (Pushdown)                                                                       2005           2050          51          7.8         127.5       2.3X
   [info] Running benchmark: Select 50% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(7864320))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 38751 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 31321 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 50% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(7864320)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                                  7651           7750          89          2.1         486.4       1.0X
   [info] Parquet Vectorized (Pushdown)                                                                       6198           6264          94          2.5         394.1       1.2X
   [info] Running benchmark: Select 90% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(14155776))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 53723 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 53353 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 90% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(14155776)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                                  10563          10745         117          1.5         671.6       1.0X
   [info] Parquet Vectorized (Pushdown)                                                                       10542          10671         147          1.5         670.2       1.0X
   [info] 20:25:52.074 WARN org.apache.spark.sql.catalyst.util.package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
   [info] Running benchmark: Select 1 row with 1 filters
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 12 iterations, 2085 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 13 iterations, 2161 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 row with 1 filters:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                  163            174          13          0.0   162534801.0       1.0X
   [info] Parquet Vectorized (Pushdown)                       161            166           5          0.0   161189323.0       1.0X
   [info] Running benchmark: Select 1 row with 250 filters
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 4092 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4668 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 row with 250 filters:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                  806            819          19          0.0   806155381.0       1.0X
   [info] Parquet Vectorized (Pushdown)                       910            934          17          0.0   909761809.0       0.9X
   [info] Running benchmark: Select 1 row with 500 filters
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 15143 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 17252 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 row with 500 filters:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                 2994           3029          23          0.0  2993624958.0       1.0X
   [info] Parquet Vectorized (Pushdown)                      3438           3451          12          0.0  3437503212.0       0.9X
   [success] Total time: 6320 s (01:45:20), completed Jan 19, 2021 8:26:57 PM
   ```
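
For readers comparing the runs by hand, here is a minimal sketch of how the `Relative` column can be re-derived from the log. It assumes that `Relative` is the baseline case's best time divided by the given case's best time, and that `Per Row(ns)` is the best time divided by the number of rows; both assumptions are consistent with the figures above but are not taken from the benchmark source. The helper names (`parse_row`, `relative`, `implied_rows`) are hypothetical, and the sample rows are copied from the log with the column spacing condensed.

```python
import re

# Assumption: a result row ends with six numeric columns, the last one
# suffixed with "X" (Best, Avg, Stdev, Rate, Per Row, Relative).
ROW = re.compile(
    r"^\s*(?:\[info\]\s+)?(?P<case>.+?)\s+"
    r"(?P<best>\d+)\s+(?P<avg>\d+)\s+(?P<stdev>\d+)\s+"
    r"(?P<rate>[\d.]+)\s+(?P<per_row>[\d.]+)\s+(?P<relative>[\d.]+)X\s*$"
)


def parse_row(line):
    """Return the fields of one result row, or None for any other log line."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "case": m.group("case"),
        "best_ms": int(m.group("best")),
        "per_row_ns": float(m.group("per_row")),
        "relative": float(m.group("relative")),
    }


def relative(baseline, other):
    """Speedup of `other` over `baseline`, based on best time (assumed formula)."""
    return round(baseline["best_ms"] / other["best_ms"], 1)


def implied_rows(row):
    """Row count implied by best time and per-row cost (assumed relationship)."""
    return row["best_ms"] * 1e6 / row["per_row_ns"]


# Rows from "InSet -> InFilters (values count: 10, distribution: 50)".
base = parse_row("[info] Parquet Vectorized  8321  8374  43  1.9  529.0  1.0X")
push = parse_row("[info] Parquet Vectorized (Pushdown)  3069  3112  40  5.1  195.1  2.7X")

print(relative(base, push))  # 2.7, matching the logged Relative value
```

Under the same assumptions, `implied_rows(base)` comes out at roughly 15.7 million, which gives a quick sanity check that two runs being compared used the same data size.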
   
   Parquet 1.11.1:
   ```
   [info] 22:44:02.552 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   [info] Running benchmark: Select 0 string row (value IS NULL)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 44098 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 20 iterations, 2028 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 0 string row (value IS NULL):      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                 8753           8820          84          1.8         556.5       1.0X
   [info] Parquet Vectorized (Pushdown)                        89            101          10        177.7           5.6      98.9X
   [info] Running benchmark: Select 0 string row ('7864320' < value < '7864320')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 44149 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4627 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 0 string row ('7864320' < value < '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -----------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                            8774           8830          47          1.8         557.8       1.0X
   [info] Parquet Vectorized (Pushdown)                                  906            926          15         17.4          57.6       9.7X
   [info] Running benchmark: Select 1 string row (value = '7864320')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 44520 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4633 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 string row (value = '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                 8780           8904          82          1.8         558.2       1.0X
   [info] Parquet Vectorized (Pushdown)                       901            927          22         17.5          57.3       9.7X
   [info] Running benchmark: Select 1 string row (value <=> '7864320')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 44581 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4554 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 string row (value <=> '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                  8899           8916          10          1.8         565.8       1.0X
   [info] Parquet Vectorized (Pushdown)                        897            911          15         17.5          57.1       9.9X
   [info] Running benchmark: Select 1 string row ('7864320' <= value <= '7864320')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 44143 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4487 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 string row ('7864320' <= value <= '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                              8693           8829          96          1.8         552.7       1.0X
   [info] Parquet Vectorized (Pushdown)                                    885            898          12         17.8          56.3       9.8X
   [info] Running benchmark: Select all string rows (value IS NOT NULL)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 85771 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 85841 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select all string rows (value IS NOT NULL):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                  17097          17154          60          0.9        1087.0       1.0X
   [info] Parquet Vectorized (Pushdown)                       17017          17168         138          0.9        1081.9       1.0X
   [info] Running benchmark: Select 0 int row (value IS NULL)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41273 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 27 iterations, 2061 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 0 int row (value IS NULL):         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                 8239           8255          12          1.9         523.8       1.0X
   [info] Parquet Vectorized (Pushdown)                        70             76           5        224.5           4.5     117.6X
   [info] Running benchmark: Select 0 int row (7864320 < value < 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41954 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4106 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 0 int row (7864320 < value < 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                     8222           8391         122          1.9         522.7       1.0X
   [info] Parquet Vectorized (Pushdown)                           808            821          11         19.5          51.4      10.2X
   [info] Running benchmark: Select 1 int row (value = 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41815 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4120 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 int row (value = 7864320):       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                 8317           8363          61          1.9         528.8       1.0X
   [info] Parquet Vectorized (Pushdown)                       807            824          15         19.5          51.3      10.3X
   [info] Running benchmark: Select 1 int row (value <=> 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 42163 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4088 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 int row (value <=> 7864320):     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                 8351           8433          67          1.9         530.9       1.0X
   [info] Parquet Vectorized (Pushdown)                       804            818          23         19.6          51.1      10.4X
   [info] Running benchmark: Select 1 int row (7864320 <= value <= 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 42349 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4223 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 int row (7864320 <= value <= 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                       8389           8470          63          1.9         533.4       1.0X
   [info] Parquet Vectorized (Pushdown)                             835            845          10         18.8          53.1      10.0X
   [info] Running benchmark: Select 1 int row (7864319 < value < 7864321)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41947 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4084 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 int row (7864319 < value < 7864321):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                     8347           8390          54          1.9         530.7       1.0X
   [info] Parquet Vectorized (Pushdown)                           795            817          19         19.8          50.5      10.5X
   [info] Running benchmark: Select 10% int rows (value < 1572864)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 46948 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 12149 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 10% int rows (value < 1572864):    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                 9248           9390         101          1.7         588.0       1.0X
   [info] Parquet Vectorized (Pushdown)                      2415           2430          15          6.5         153.5       3.8X
   [info] Running benchmark: Select 50% int rows (value < 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 60395 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 41469 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 50% int rows (value < 7864320):    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                11943          12079         187          1.3         759.3       1.0X
   [info] Parquet Vectorized (Pushdown)                      8192           8294          63          1.9         520.8       1.5X
   [info] Running benchmark: Select 90% int rows (value < 14155776)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 75730 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 72593 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 90% int rows (value < 14155776):   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                15026          15146         119          1.0         955.3       1.0X
   [info] Parquet Vectorized (Pushdown)                     14315          14519         212          1.1         910.1       1.0X
   [info] Running benchmark: Select all int rows (value IS NOT NULL)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 79340 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 79510 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select all int rows (value IS NOT NULL):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                15715          15868         128          1.0         999.2       1.0X
   [info] Parquet Vectorized (Pushdown)                     15791          15902          85          1.0        1004.0       1.0X
   [info] Running benchmark: Select all int rows (value > -1)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 79442 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 78576 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select all int rows (value > -1):         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                15760          15889         163          1.0        1002.0       1.0X
   [info] Parquet Vectorized (Pushdown)                     15679          15715          32          1.0         996.8       1.0X
   [info] Running benchmark: Select all int rows (value != -1)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 79189 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 80052 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select all int rows (value != -1):        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                15669          15838         118          1.0         996.2       1.0X
   [info] Parquet Vectorized (Pushdown)                     15710          16010         248          1.0         998.8       1.0X
   [info] Running benchmark: Select 0 distinct string row (value IS NULL)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 39957 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 30 iterations, 2038 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 0 distinct string row (value IS NULL):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                     7899           7991          63          2.0         502.2       1.0X
   [info] Parquet Vectorized (Pushdown)                            62             68           6        255.1           3.9     128.1X
   [info] Running benchmark: Select 0 distinct string row ('100' < value < '100')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 40549 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4740 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 0 distinct string row ('100' < value < '100'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                             8009           8110          82          2.0         509.2       1.0X
   [info] Parquet Vectorized (Pushdown)                                   939            948           7         16.8          59.7       8.5X
   [info] Running benchmark: Select 1 distinct string row (value = '100')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 40421 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4797 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 distinct string row (value = '100'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                     8038           8084          40          2.0         511.1       1.0X
   [info] Parquet Vectorized (Pushdown)                           949            959           9         16.6          60.3       8.5X
   [info] Running benchmark: Select 1 distinct string row (value <=> '100')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41089 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4819 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 distinct string row (value <=> '100'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                       8025           8218         150          2.0         510.2       1.0X
   [info] Parquet Vectorized (Pushdown)                             944            964          16         16.7          60.0       8.5X
   [info] Running benchmark: Select 1 distinct string row ('100' <= value <= '100')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 40887 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4829 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 distinct string row ('100' <= value <= '100'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                               8124           8177          73          1.9         516.5       1.0X
   [info] Parquet Vectorized (Pushdown)                                     952            966          12         16.5          60.5       8.5X
   [info] Running benchmark: Select all distinct string rows (value IS NOT NULL)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 87519 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 87496 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select all distinct string rows (value IS NOT NULL):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -----------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                           17196          17504         206          0.9        1093.3       1.0X
   [info] Parquet Vectorized (Pushdown)                                17342          17499         148          0.9        1102.6       1.0X
   [info] Running benchmark: StringStartsWith filter: (value like '10%')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 45539 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 5401 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] StringStartsWith filter: (value like '10%'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                    9037           9108          62          1.7         574.5       1.0X
   [info] Parquet Vectorized (Pushdown)                         1063           1080          14         14.8          67.6       8.5X
   [info] Running benchmark: StringStartsWith filter: (value like '1000%')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 44501 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4443 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] StringStartsWith filter: (value like '1000%'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -----------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                      8807           8900          78          1.8         560.0       1.0X
   [info] Parquet Vectorized (Pushdown)                            865            889          20         18.2          55.0      10.2X
   [info] Running benchmark: StringStartsWith filter: (value like '786432%')
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 44776 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4388 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] StringStartsWith filter: (value like '786432%'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                        8797           8955         109          1.8         559.3       1.0X
   [info] Parquet Vectorized (Pushdown)                              854            878          20         18.4          54.3      10.3X
   [info] Running benchmark: Select 1 decimal(9, 2) row (value = 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 17622 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 5921 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 decimal(9, 2) row (value = 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                     3475           3525          57          4.5         220.9       1.0X
   [info] Parquet Vectorized (Pushdown)                          1166           1184          19         13.5          74.1       3.0X
   [info] Running benchmark: Select 10% decimal(9, 2) rows (value < 1572864)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 26543 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 25522 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 10% decimal(9, 2) rows (value < 1572864):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                        5075           5309         154          3.1         322.7       1.0X
   [info] Parquet Vectorized (Pushdown)                             4943           5105         121          3.2         314.2       1.0X
   [info] Running benchmark: Select 50% decimal(9, 2) rows (value < 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 51448 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 52535 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 50% decimal(9, 2) rows (value < 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                       10168          10290          94          1.5         646.5       1.0X
   [info] Parquet Vectorized (Pushdown)                            10386          10507          96          1.5         660.3       1.0X
   [info] Running benchmark: Select 90% decimal(9, 2) rows (value < 14155776)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 59845 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 59254 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 90% decimal(9, 2) rows (value < 14155776):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                        11815          11969         240          1.3         751.2       1.0X
   [info] Parquet Vectorized (Pushdown)                             11655          11851         209          1.3         741.0       1.0X
   [info] Running benchmark: Select 1 decimal(18, 2) row (value = 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 18282 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 6164 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 decimal(18, 2) row (value = 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -----------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                      3597           3657          59          4.4         228.7       1.0X
   [info] Parquet Vectorized (Pushdown)                           1219           1233          14         12.9          77.5       3.0X
   [info] Running benchmark: Select 10% decimal(18, 2) rows (value < 1572864)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 22746 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 10375 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 10% decimal(18, 2) rows (value < 1572864):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                         4484           4549          67          3.5         285.1       1.0X
   [info] Parquet Vectorized (Pushdown)                              2023           2075          57          7.8         128.6       2.2X
   [info] Running benchmark: Select 50% decimal(18, 2) rows (value < 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 39274 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 33687 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 50% decimal(18, 2) rows (value < 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                         7792           7855          48          2.0         495.4       1.0X
   [info] Parquet Vectorized (Pushdown)                              6498           6738         150          2.4         413.2       1.2X
   [info] Running benchmark: Select 90% decimal(18, 2) rows (value < 14155776)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 56243 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 55540 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 90% decimal(18, 2) rows (value < 14155776):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                         11127          11249         167          1.4         707.5       1.0X
   [info] Parquet Vectorized (Pushdown)                              10841          11108         225          1.5         689.3       1.0X
   [info] Running benchmark: Select 1 decimal(38, 2) row (value = 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 29521 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 9333 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 decimal(38, 2) row (value = 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -----------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                      5766           5904          94          2.7         366.6       1.0X
   [info] Parquet Vectorized (Pushdown)                           1836           1867          53          8.6         116.8       3.1X
   [info] Running benchmark: Select 10% decimal(38, 2) rows (value < 1572864)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 34386 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 14350 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 10% decimal(38, 2) rows (value < 1572864):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                         6746           6877         122          2.3         428.9       1.0X
   [info] Parquet Vectorized (Pushdown)                              2807           2870          75          5.6         178.5       2.4X
   [info] Running benchmark: Select 50% decimal(38, 2) rows (value < 7864320)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 54192 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 43783 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 50% decimal(38, 2) rows (value < 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                        10681          10839         142          1.5         679.1       1.0X
   [info] Parquet Vectorized (Pushdown)                              8550           8757         162          1.8         543.6       1.2X
   [info] Running benchmark: Select 90% decimal(38, 2) rows (value < 14155776)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 74674 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 72033 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 90% decimal(38, 2) rows (value < 14155776):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                         14675          14935         295          1.1         933.0       1.0X
   [info] Parquet Vectorized (Pushdown)                              14171          14407         158          1.1         901.0       1.0X
   [info] Running benchmark: InSet -> InFilters (values count: 5, distribution: 10)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41729 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4213 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 5, distribution: 10):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                               8288           8346          44          1.9         526.9       1.0X
   [info] Parquet Vectorized (Pushdown)                                     838            843           6         18.8          53.3       9.9X
   [info] Running benchmark: InSet -> InFilters (values count: 5, distribution: 50)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41750 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 15555 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 5, distribution: 50):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                               8273           8350          55          1.9         526.0       1.0X
   [info] Parquet Vectorized (Pushdown)                                    3101           3111          14          5.1         197.1       2.7X
   [info] Running benchmark: InSet -> InFilters (values count: 5, distribution: 90)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41873 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 11725 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 5, distribution: 90):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                               8303           8375          94          1.9         527.9       1.0X
   [info] Parquet Vectorized (Pushdown)                                    2307           2345          24          6.8         146.7       3.6X
   [info] Running benchmark: InSet -> InFilters (values count: 10, distribution: 10)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41760 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 8029 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 10, distribution: 10):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                8307           8352          47          1.9         528.1       1.0X
   [info] Parquet Vectorized (Pushdown)                                     1588           1606          15          9.9         100.9       5.2X
   [info] Running benchmark: InSet -> InFilters (values count: 10, distribution: 50)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41862 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 19294 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 10, distribution: 50):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                8258           8373          77          1.9         525.0       1.0X
   [info] Parquet Vectorized (Pushdown)                                     3814           3859          32          4.1         242.5       2.2X
   [info] Running benchmark: InSet -> InFilters (values count: 10, distribution: 90)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 41883 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 27256 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 10, distribution: 90):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                8270           8377          74          1.9         525.8       1.0X
   [info] Parquet Vectorized (Pushdown)                                     5332           5451         165          3.0         339.0       1.6X
   [info] Running benchmark: InSet -> InFilters (values count: 50, distribution: 10)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 43408 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 43478 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 50, distribution: 10):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                8632           8682          35          1.8         548.8       1.0X
   [info] Parquet Vectorized (Pushdown)                                     8647           8696          48          1.8         549.8       1.0X
   [info] Running benchmark: InSet -> InFilters (values count: 50, distribution: 50)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 43469 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 43325 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 50, distribution: 50):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                8653           8694          28          1.8         550.2       1.0X
   [info] Parquet Vectorized (Pushdown)                                     8627           8665          39          1.8         548.5       1.0X
   [info] Running benchmark: InSet -> InFilters (values count: 50, distribution: 90)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 43451 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 44043 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 50, distribution: 90):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                8622           8690          81          1.8         548.2       1.0X
   [info] Parquet Vectorized (Pushdown)                                     8597           8809         208          1.8         546.6       1.0X
   [info] Running benchmark: InSet -> InFilters (values count: 100, distribution: 10)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 43363 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 43095 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 100, distribution: 10):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                 8579           8673          98          1.8         545.5       1.0X
   [info] Parquet Vectorized (Pushdown)                                      8566           8619          39          1.8         544.6       1.0X
   [info] Running benchmark: InSet -> InFilters (values count: 100, distribution: 50)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 43184 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 43077 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 100, distribution: 50):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                 8582           8637          61          1.8         545.6       1.0X
   [info] Parquet Vectorized (Pushdown)                                      8530           8615          67          1.8         542.3       1.0X
   [info] Running benchmark: InSet -> InFilters (values count: 100, distribution: 90)
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 42947 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 43033 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] InSet -> InFilters (values count: 100, distribution: 90):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                 8487           8590          98          1.9         539.6       1.0X
   [info] Parquet Vectorized (Pushdown)                                      8463           8607         220          1.9         538.1       1.0X
   [info] Running benchmark: Select 1 tinyint row (value = CAST(63 AS tinyint))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 19742 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 5910 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 tinyint row (value = CAST(63 AS tinyint)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                           3891           3949          70          4.0         247.4       1.0X
   [info] Parquet Vectorized (Pushdown)                                1174           1182          16         13.4          74.6       3.3X
   [info] Running benchmark: Select 10% tinyint rows (value < CAST(12 AS tinyint))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 23622 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 9787 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 10% tinyint rows (value < CAST(12 AS tinyint)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                              4615           4724         100          3.4         293.4       1.0X
   [info] Parquet Vectorized (Pushdown)                                   1924           1958          64          8.2         122.3       2.4X
   [info] Running benchmark: Select 50% tinyint rows (value < CAST(63 AS tinyint))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 38379 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 30411 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 50% tinyint rows (value < CAST(63 AS tinyint)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                              7557           7676         108          2.1         480.5       1.0X
   [info] Parquet Vectorized (Pushdown)                                   6011           6082          60          2.6         382.2       1.3X
   [info] Running benchmark: Select 90% tinyint rows (value < CAST(114 AS tinyint))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 54810 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 54362 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 90% tinyint rows (value < CAST(114 AS tinyint)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                              10670          10962         361          1.5         678.4       1.0X
   [info] Parquet Vectorized (Pushdown)                                   10693          10872         224          1.5         679.8       1.0X
   [info] Running benchmark: Select 1 timestamp stored as INT96 row (value = timestamp_seconds(7864320))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 21078 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 21416 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 timestamp stored as INT96 row (value = timestamp_seconds(7864320)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -----------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                    4156           4216          67          3.8         264.2       1.0X
   [info] Parquet Vectorized (Pushdown)                                                         4151           4283          89          3.8         263.9       1.0X
   [info] Running benchmark: Select 10% timestamp stored as INT96 rows (value < timestamp_seconds(1572864))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 25197 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 25234 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 10% timestamp stored as INT96 rows (value < timestamp_seconds(1572864)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                       4931           5039          71          3.2         313.5       1.0X
   [info] Parquet Vectorized (Pushdown)                                                            4923           5047          73          3.2         313.0       1.0X
   [info] Running benchmark: Select 50% timestamp stored as INT96 rows (value < timestamp_seconds(7864320))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 40851 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 40816 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 50% timestamp stored as INT96 rows (value < timestamp_seconds(7864320)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                       7972           8170         127          2.0         506.8       1.0X
   [info] Parquet Vectorized (Pushdown)                                                            8056           8163          92          2.0         512.2       1.0X
   [info] Running benchmark: Select 90% timestamp stored as INT96 rows (value < timestamp_seconds(14155776))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 56489 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 55908 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 90% timestamp stored as INT96 rows (value < timestamp_seconds(14155776)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                       11111          11298         238          1.4         706.4       1.0X
   [info] Parquet Vectorized (Pushdown)                                                            11086          11182          66          1.4         704.8       1.0X
   [info] Running benchmark: Select 1 timestamp stored as TIMESTAMP_MICROS row (value = timestamp_seconds(7864320))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 17925 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 5612 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 timestamp stored as TIMESTAMP_MICROS row (value = timestamp_seconds(7864320)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                               3504           3585          69          4.5         222.8       1.0X
   [info] Parquet Vectorized (Pushdown)                                                                    1119           1123           4         14.1          71.1       3.1X
   [info] Running benchmark: Select 10% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(1572864))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 22303 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 9942 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 10% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(1572864)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                                  4365           4461          84          3.6         277.5       1.0X
   [info] Parquet Vectorized (Pushdown)                                                                       1905           1988         101          8.3         121.1       2.3X
   [info] Running benchmark: Select 50% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(7864320))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 38138 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 30971 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 50% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(7864320)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                                  7534           7628         101          2.1         479.0       1.0X
   [info] Parquet Vectorized (Pushdown)                                                                       6010           6194         189          2.6         382.1       1.3X
   [info] Running benchmark: Select 90% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(14155776))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 54005 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 52469 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 90% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(14155776)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                                  10649          10801         149          1.5         677.0       1.0X
   [info] Parquet Vectorized (Pushdown)                                                                       10307          10494         310          1.5         655.3       1.0X
   [info] Running benchmark: Select 1 timestamp stored as TIMESTAMP_MILLIS row (value = timestamp_seconds(7864320))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 18819 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 6081 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 timestamp stored as TIMESTAMP_MILLIS row (value = timestamp_seconds(7864320)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                               3738           3764          28          4.2         237.6       1.0X
   [info] Parquet Vectorized (Pushdown)                                                                    1190           1216          26         13.2          75.7       3.1X
   [info] Running benchmark: Select 10% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(1572864))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 23198 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 10525 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 10% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(1572864)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                                  4586           4640          52          3.4         291.6       1.0X
   [info] Parquet Vectorized (Pushdown)                                                                       2009           2105          70          7.8         127.7       2.3X
   [info] Running benchmark: Select 50% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(7864320))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 39337 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 33023 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 50% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(7864320)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                                  7766           7868          82          2.0         493.8       1.0X
   [info] Parquet Vectorized (Pushdown)                                                                       6404           6605         187          2.5         407.1       1.2X
   [info] Running benchmark: Select 90% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(14155776))
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 54512 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 53224 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 90% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(14155776)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                                                                  10833          10902          55          1.5         688.8       1.0X
   [info] Parquet Vectorized (Pushdown)                                                                       10499          10645         102          1.5         667.5       1.0X
   [info] 00:27:18.540 WARN org.apache.spark.sql.catalyst.util.package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
   [info] Running benchmark: Select 1 row with 1 filters
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 12 iterations, 2032 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 12 iterations, 2021 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 row with 1 filters:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                  163            169           6          0.0   162707158.0       1.0X
   [info] Parquet Vectorized (Pushdown)                       162            168           5          0.0   162184547.0       1.0X
   [info] Running benchmark: Select 1 row with 250 filters
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 3930 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 4599 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 row with 250 filters:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                  777            786          15          0.0   776511925.0       1.0X
   [info] Parquet Vectorized (Pushdown)                       903            920          23          0.0   902964783.0       0.9X
   [info] Running benchmark: Select 1 row with 500 filters
   [info]   Running case: Parquet Vectorized
   [info]   Stopped after 5 iterations, 14782 ms
   [info]   Running case: Parquet Vectorized (Pushdown)
   [info]   Stopped after 5 iterations, 16974 ms
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
   [info] Intel Core Processor (Broadwell, IBRS)
   [info] Select 1 row with 500 filters:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                 2921           2956          28          0.0  2921416288.0       1.0X
   [info] Parquet Vectorized (Pushdown)                      3383           3395          10          0.0  3382576710.0       0.9X
   [success] Total time: 6276 s (01:44:36), completed Jan 20, 2021 12:28:23 AM
   ```
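
As a reading aid for the `Relative` column: it appears to be the baseline's time divided by each case's time, so a value below 1.0X means the case is slower than `Parquet Vectorized`. The following is a minimal Scala sketch of that arithmetic, using the `Per Row(ns)` figures printed above. The object and method names are illustrative, and the formula is an assumption inferred from the numbers, not taken from the benchmark source.

```scala
object RelativeSpeed {
  // (label, baseline ns per row, pushdown ns per row), copied from the tables above.
  val cases: Seq[(String, Double, Double)] = Seq(
    ("1 filter", 162707158.0, 162184547.0),
    ("250 filters", 776511925.0, 902964783.0),
    ("500 filters", 2921416288.0, 3382576710.0)
  )

  // Assumed definition: baseline time divided by the case's time.
  def relative(baselineNs: Double, caseNs: Double): Double = baselineNs / caseNs

  def main(args: Array[String]): Unit =
    cases.foreach { case (name, baseline, pushdown) =>
      println(f"$name%-12s pushdown relative = ${relative(baseline, pushdown)}%.2fX")
    }
}
```

For the 250-filter and 500-filter cases this gives roughly 0.86, which the table rounds to 0.9X.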




[GitHub] [spark] dongjoon-hyun commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630997965


   > I'm surprised (without criticism!) that this has a seemingly low priority

   @h-vetinari, this is wrong, isn't it? Did anyone (except you) say here that it's low priority? We want this, but it currently looks technically infeasible. Do you think that all infeasible things are low priority?




[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763799915


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38868/
   




[GitHub] [spark] gatorsmile commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
gatorsmile commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-768790620


   LGTM 
   
   The current PR looks good to me. However, based on previous experience, Parquet upgrades always cause various issues. We might revert the upgrade at the last minute. 
   
   @wangyum Could you create a 3.2.0 blocker JIRA? Before the release, we need to double-check the unreleased/unresolved JIRAs/PRs of Parquet 1.11 and then decide whether we should keep or revert the Parquet upgrade. At the same time, we should encourage the whole community to run compatibility and performance tests on their production workloads, covering both the read and write code paths. 
   
   




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561449141



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
       conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
     }
 
+    // PARQUET-1580: Disables page-level CRC checksums by default.

Review comment:
       Wow. Then it's a real bug. Thanks for the confirmation.






[GitHub] [spark] SparkQA removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-767335521


   **[Test build #134487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134487/testReport)** for PR 26804 at commit [`72c52b6`](https://github.com/apache/spark/commit/72c52b64958340835e5a54b24aa68f201f4c15be).




[GitHub] [spark] heuermh edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
heuermh edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630453306


   -1 (non-binding) Please do not merge this into master, it breaks downstream applications due to mixed Avro 1.8.x vs 1.9.x transitive dependencies.
   
   I believe this should be blocked on a dependency upgrade to Avro 1.9.x (which in turn is blocked on other things).




[GitHub] [spark] dongjoon-hyun commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-769488412


   Thank you, @gatorsmile and @wangyum !




[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764181491


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38882/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-767383036


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39073/
   




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-765892788


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38986/
   




[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-767383036


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39073/
   




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764058969








[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762167906


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134187/
   




[GitHub] [spark] wangyum commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561044576



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
##########
@@ -225,7 +226,9 @@ class StreamSuite extends StreamTest {
 
     val df = spark.readStream.format(classOf[FakeDefaultSource].getName).load()
     Seq("", "parquet").foreach { useV1Source =>
-      withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source) {
+      withSQLConf(
+        SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source,
+        ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED -> "false") {

Review comment:
       Disable `parquet.page.write-checksum.enabled`; otherwise the test fails:
   ```
   [info] - DataFrame reuse *** FAILED *** (1 second, 802 milliseconds)
   [info]   Decoded objects do not match expected objects:
   [info]   expected: WrappedArray(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
   [info]   actual:   WrappedArray(0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 2)
   [info]   assertnotnull(upcast(getcolumnbyordinal(0, LongType), LongType, - root class: "scala.Long"))
   [info]   +- upcast(getcolumnbyordinal(0, LongType), LongType, - root class: "scala.Long")
   [info]      +- getcolumnbyordinal(0, LongType) (QueryTest.scala:68)
   ```
   This issue was introduced by [PARQUET-1580](https://issues.apache.org/jira/browse/PARQUET-1580). cc @gszadovszky
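
For readers who want to try the same setting outside the test suite, here is a minimal sketch of turning the page write checksum off for a whole session. It assumes Spark with Parquet 1.11.x on the classpath and relies on Spark copying `spark.hadoop.*` entries into the Hadoop configuration. The object name, app name, and output path are illustrative only.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

object DisablePageChecksumSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("disable-page-checksum-sketch")
      // Entries prefixed with spark.hadoop. are copied into the Hadoop Configuration
      // that the Parquet writer reads.
      .config("spark.hadoop.parquet.page.write-checksum.enabled", "false")
      .getOrCreate()

    // Write to a fresh sub-path of a temp directory; the sub-path must not exist yet.
    val out = Files.createTempDirectory("parquet-sketch").resolve("ids").toString
    spark.range(0, 11).write.parquet(out)

    // Read the data back; the row count is independent of row order.
    println(spark.read.parquet(out).count())

    spark.stop()
  }
}
```

This only shows where the key can be set; it is not a claim about why the checksum setting affected the test above.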






[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-767335521


   **[Test build #134487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134487/testReport)** for PR 26804 at commit [`72c52b6`](https://github.com/apache/spark/commit/72c52b64958340835e5a54b24aa68f201f4c15be).




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763776712


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38868/
   




[GitHub] [spark] sunchao commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-715504346


   @heuermh you're right, I hadn't considered this case before. Even if we shade Avro in Spark, we may still have the Avro jars from the Hive side, which are of an even lower version. I _think_ `parquet-avro` 1.10.1 can work with the other Parquet 1.11.x modules, but maybe this is something we don't want to do anyway, so as not to confuse Spark users.
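
When several Avro jars may be on the classpath (Spark's, Hive's, the application's), it can help to check which jar a class was actually loaded from. The following is a small sketch using only the JDK; the object and method names are hypothetical, and `org.apache.avro.Schema` is simply the class one would most likely probe in this discussion.

```scala
object WhichJar {
  // Returns the location (usually a jar URL) a class was loaded from, or None if the
  // class is not on the classpath or has no code source (for example, JDK classes).
  def locate(className: String): Option[String] =
    try {
      // initialize = false: look the class up without running its static initializers.
      val cls = Class.forName(className, false, getClass.getClassLoader)
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
    } catch {
      case _: ClassNotFoundException => None
    }

  def main(args: Array[String]): Unit = {
    val name = "org.apache.avro.Schema"
    println(s"$name -> ${locate(name).getOrElse("not found on classpath")}")
  }
}
```

Running this on the driver and inside an executor task would show whether both sides resolve Avro from the same jar.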




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763799915


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38868/
   




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764186208


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38884/
   




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763817245


   **[Test build #134282 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134282/testReport)** for PR 26804 at commit [`c9b4792`](https://github.com/apache/spark/commit/c9b479284a220074424a40c07af7ecd27085c5cd).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764172517


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38884/
   




[GitHub] [spark] SparkQA removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762064332


   **[Test build #134187 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134187/testReport)** for PR 26804 at commit [`b5101d2`](https://github.com/apache/spark/commit/b5101d20850b7c3ddc03cece8088a7b34b683084).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762130255


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38772/
   




[GitHub] [spark] heuermh edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
heuermh edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-714729078


   @sunchao Until `spark.driver.userClassPathFirst`/`spark.executor.userClassPathFirst` are no longer marked (Experimental), or dependencies in Spark are shaded properly, having Avro 1.8.x on the Spark runtime classpath will cause runtime compatibility exceptions for downstream apps that use `parquet-avro` 1.11.x or newer, which depends on Avro 1.9.x. Or are you suggesting that downstream apps use `parquet-avro` version 1.10.1 while Spark depends on e.g. `parquet-[column,common,encoding,format-structures,hadoop,jackson]` 1.11.x or newer? I don't know if that is possible.
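
For context, the two settings named above are ordinary Spark configuration keys. The sketch below shows how a downstream app might opt in, assuming `spark-core` is on the classpath; the object name is illustrative. Both keys are documented as experimental, and the driver-side one is documented as applying in cluster mode only.

```scala
import org.apache.spark.SparkConf

object UserClassPathFirstSketch {
  // Builds a conf that asks Spark to prefer user-added jars over its own
  // when loading classes in the driver and in executors.
  def build(): SparkConf =
    new SparkConf(false) // false: do not load spark.* values from system properties
      .set("spark.driver.userClassPathFirst", "true")
      .set("spark.executor.userClassPathFirst", "true")

  def main(args: Array[String]): Unit = {
    val conf = build()
    println(conf.get("spark.driver.userClassPathFirst"))
    println(conf.get("spark.executor.userClassPathFirst"))
  }
}
```

Whether this avoids the Avro conflict for a given app is not established here; it only shows where the knobs are set.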




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762364158


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38789/
   




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-765903407


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38986/
   




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-765902815


   **[Test build #134400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134400/testReport)** for PR 26804 at commit [`eb1c95e`](https://github.com/apache/spark/commit/eb1c95ee59464167cb50591b0110e7f3f19864a8).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class BitwiseGet(left: Expression, right: Expression)`
     * `    new RuntimeException(s\"class `$`




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-767404318


   **[Test build #134487 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134487/testReport)** for PR 26804 at commit [`72c52b6`](https://github.com/apache/spark/commit/72c52b64958340835e5a54b24aa68f201f4c15be).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] iemejia commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
iemejia commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-722984499


   @sunchao I was revisiting this patch with the source-compatibility idea from the recent patch we worked on for Hive, and it seems that Parquet is fully source compatible with Avro 1.8.2-1.11.1, so this upgrade on the Spark side should be less of a problem. The only issue is the dependency leaking you mention above.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763829863


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134282/
   




[GitHub] [spark] dbtsai commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
dbtsai commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630458539


   @heuermh thanks for the info. @dongjoon-hyun @wangyum any thoughts on how to move this forward?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-765907105








[GitHub] [spark] sunchao commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-769495501


   Nice work @wangyum and all! Is there anything else to be done in order to get the full page-skipping feature with column indexes? Looking at [PARQUET-1739](https://issues.apache.org/jira/browse/PARQUET-1739), I was under the impression that the vectorized path needs some more work.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762367843


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38789/
   




[GitHub] [spark] SparkQA removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764058969


   **[Test build #134296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134296/testReport)** for PR 26804 at commit [`802eb36`](https://github.com/apache/spark/commit/802eb369d3cada5a5dbc284febf91c4fc5b8dbcb).




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-767367295


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39073/
   




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762064332


   **[Test build #134187 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134187/testReport)** for PR 26804 at commit [`b5101d2`](https://github.com/apache/spark/commit/b5101d20850b7c3ddc03cece8088a7b34b683084).




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561433379



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
##########
@@ -759,7 +759,7 @@ class ParquetSchemaSuite extends ParquetSchemaTest {
         nullable = true))),
     """message root {
       |  optional group f1 (MAP) {
-      |    repeated group map (MAP_KEY_VALUE) {
+      |    repeated group key_value (MAP_KEY_VALUE) {

Review comment:
       So, are you saying that there is no breaking change, @wangyum?
   @srowen's question is asking why we need this change, isn't it?
   

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
       conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
     }
 
+    // PARQUET-1580: Disables page-level CRC checksums by default.

Review comment:
       Could you add a comment explaining why you disable it? It looks like a workaround to avoid a Parquet-side performance regression.






[GitHub] [spark] heuermh commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
heuermh commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764063666


   >> Sorry if I have missed some conversation, how does this pull request compare to #30517 (which bumps Parquet and Avro) and #31232 (which bumps only Avro)?
   >
   > #30517 is used for testing compatibility.
   
   Thank you, @wangyum!  As #31232 has been merged for Spark 3.2.0, I assume the target for this pull request is also version 3.2.0?




[GitHub] [spark] SparkQA removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763707592


   **[Test build #134282 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134282/testReport)** for PR 26804 at commit [`c9b4792`](https://github.com/apache/spark/commit/c9b479284a220074424a40c07af7ecd27085c5cd).




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762615763


   **[Test build #134211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134211/testReport)** for PR 26804 at commit [`4efac50`](https://github.com/apache/spark/commit/4efac50dc441838fe5521d4b94a2a4870ad456c5).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] gatorsmile commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
gatorsmile commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r562959779



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
       conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
     }
 
+    // PARQUET-1746: Disables page-level CRC checksums by default.
+    conf.setBooleanIfUnset(ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED, false)

Review comment:
       This looks dangerous. Also cc @bbraams 
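
For context on what the added line does: `setBooleanIfUnset` only supplies a default, so a value that a user has already set explicitly should survive. The snippet below is a minimal sketch of that set-if-unset behaviour. `SketchConf` and `ChecksumKey` are hypothetical stand-ins invented for illustration; they are not Hadoop's `Configuration` or the real `ParquetOutputFormat` constant.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a Hadoop-style configuration; not the real class.
final class SketchConf {
  private val settings = mutable.Map.empty[String, String]

  def set(key: String, value: String): Unit = settings.update(key, value)

  // Writes the given default only when no value has been set for the key.
  def setBooleanIfUnset(key: String, value: Boolean): Unit = {
    if (!settings.contains(key)) settings.update(key, value.toString)
  }

  def getBoolean(key: String, default: Boolean): Boolean =
    settings.get(key).map(_.toBoolean).getOrElse(default)
}

object SketchConfDemo {
  // Placeholder key for illustration only.
  val ChecksumKey = "sketch.page.write-checksum.enabled"

  def main(args: Array[String]): Unit = {
    // Nothing set by the user: the "disabled" default is applied.
    val untouched = new SketchConf
    untouched.setBooleanIfUnset(ChecksumKey, false)
    println(untouched.getBoolean(ChecksumKey, true))

    // The user opted in beforehand: the explicit value is kept.
    val userOptIn = new SketchConf
    userOptIn.set(ChecksumKey, "true")
    userOptIn.setBooleanIfUnset(ChecksumKey, false)
    println(userOptIn.getBoolean(ChecksumKey, true))
  }
}
```

Under these assumptions the change would only alter the default for jobs that never set the option, which is still a behaviour change worth documenting.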






[GitHub] [spark] srowen commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r559640730



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
##########
@@ -759,7 +759,7 @@ class ParquetSchemaSuite extends ParquetSchemaTest {
         nullable = true))),
     """message root {
       |  optional group f1 (MAP) {
-      |    repeated group map (MAP_KEY_VALUE) {
+      |    repeated group key_value (MAP_KEY_VALUE) {

Review comment:
       Would this possibly be a breaking change for files written as Parquet? May be a dumb question.
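
One way to frame the question: whether older files stay readable depends on whether a reader finds the repeated key/value group by name or by position. The snippet below is a minimal sketch of the positional approach, using a hypothetical, simplified schema model (`SketchType`, `SketchGroup`, `SketchPrimitive`), not Parquet's or Spark's real classes. It assumes the reader takes the first child of the MAP group, whatever it is called.

```scala
// Hypothetical, simplified schema model; these are not Parquet's real classes.
sealed trait SketchType { def name: String }
final case class SketchPrimitive(name: String) extends SketchType
final case class SketchGroup(name: String, fields: List[SketchType]) extends SketchType

object MapSchemaSketch {
  // Returns the (key, value) field names of a MAP-annotated group by looking at
  // its first child, ignoring what that repeated group happens to be called.
  def keyValueNames(mapGroup: SketchGroup): Option[(String, String)] =
    mapGroup.fields.headOption.collect {
      case SketchGroup(_, List(key, value)) => (key.name, value.name)
    }

  def main(args: Array[String]): Unit = {
    val kv = List(SketchPrimitive("key"), SketchPrimitive("value"))
    val oldStyle = SketchGroup("f1", List(SketchGroup("map", kv)))
    val newStyle = SketchGroup("f1", List(SketchGroup("key_value", kv)))
    // Both layouts resolve to the same key/value pair under positional lookup.
    println(keyValueNames(oldStyle) == keyValueNames(newStyle))
  }
}
```

A reader that instead matched on the literal group name would treat the two layouts differently, which is the case worth checking before merging.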






[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764058969


   **[Test build #134296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134296/testReport)** for PR 26804 at commit [`802eb36`](https://github.com/apache/spark/commit/802eb369d3cada5a5dbc284febf91c4fc5b8dbcb).




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764206843


   **[Test build #134301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134301/testReport)** for PR 26804 at commit [`a89c61d`](https://github.com/apache/spark/commit/a89c61d90cc145cea7e5c3df1200fb3ec1d7a3db).




[GitHub] [spark] heuermh commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
heuermh commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763697839


   Sorry if I have missed some conversation, how does this pull request compare to #30517 (which bumps Parquet and Avro) and #31232 (which bumps only Avro)?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764181491


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38882/
   




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764187076


   **[Test build #134296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134296/testReport)** for PR 26804 at commit [`802eb36`](https://github.com/apache/spark/commit/802eb369d3cada5a5dbc284febf91c4fc5b8dbcb).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] wangyum closed pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
wangyum closed pull request #26804:
URL: https://github.com/apache/spark/pull/26804


   




[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764139405


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38883/
   




[GitHub] [spark] wangyum commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561444625



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
       conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
     }
 
+    // PARQUET-1580: Disables page-level CRC checksums by default.

Review comment:
       It will change the data order; please see https://github.com/apache/spark/pull/26804#discussion_r561044576.






[GitHub] [spark] dongjoon-hyun edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630904663


   @h-vetinari . Parquet is the de facto standard in Apache Spark and is related to all the other modules. That's the reason why Parquet should not break anything in the other Spark modules. It's the same for the other libraries. Apache Spark has used Apache Hadoop 2.7.3/2.7.4 for a long time, and it is still the default Hadoop. Apache Spark has used an unofficial Hive 1.2.1 fork for a long time and still couldn't remove it.
   
   Please feel free to open a working PR. The community will welcome it.
   
   BTW, we are in the Apache Spark community. For issues in other communities, please ping them.




[GitHub] [spark] gszadovszky commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
gszadovszky commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561694102



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
##########
@@ -225,7 +226,9 @@ class StreamSuite extends StreamTest {
 
     val df = spark.readStream.format(classOf[FakeDefaultSource].getName).load()
     Seq("", "parquet").foreach { useV1Source =>
-      withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source) {
+      withSQLConf(
+        SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source,
+        ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED -> "false") {

Review comment:
       @wangyum, I've checked the code change of PARQUET-1580 (again) and still don't understand why it would cause such an issue. By disabling the CRC write, you only avoid writing an optional field in the page headers. It should not impact any kind of ordering. If it really does, it means that this ordering relies on some parameters that it shouldn't. It also means that any other potential change in the file metadata might impact this ordering.
   Maybe I'm overlooking something in our code base, so any comment is welcome, but if not, I would suggest revisiting these unit tests.
   
   Meanwhile, I am not experienced in Spark code so if you are fine with this workaround in a unit test I am not against it.
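
To illustrate what revisiting the unit tests could mean, here is a minimal sketch of an order-insensitive comparison. It assumes the test only cares about which rows come back, not the order they are read in. `OrderInsensitiveCheck` and `sameRows` are hypothetical names for illustration, not helpers from Spark's test suite.

```scala
object OrderInsensitiveCheck {
  // Counts how many times each row occurs.
  private def counts[A](rows: Seq[A]): Map[A, Int] =
    rows.groupBy(identity).map { case (row, same) => row -> same.size }

  // Compares two result sets as multisets: the same rows with the same
  // multiplicities, regardless of the order in which they were read back.
  def sameRows[A](actual: Seq[A], expected: Seq[A]): Boolean =
    counts(actual) == counts(expected)

  def main(args: Array[String]): Unit = {
    val expected = Seq(1, 2, 3, 3)
    val readBack = Seq(3, 1, 3, 2) // same rows, different scan order
    println(readBack == expected)          // order-sensitive comparison
    println(sameRows(readBack, expected))  // order-insensitive comparison
  }
}
```

A check of this shape would not depend on file size or metadata layout, so it would not need the checksum setting to be pinned in the test.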






[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764403535


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134301/
   




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-767355969


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39073/
   




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764312009


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38887/
   




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762088985


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38772/
   




[GitHub] [spark] heuermh edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
heuermh edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630453306


   -1 (non-binding). Please do not merge this into master; it breaks downstream applications because of mixed Avro 1.8.x and 1.9.x transitive dependencies.
   
   I believe this should be blocked on a dependency upgrade to Avro 1.9.x (which in turn is blocked on other things, see pull request https://github.com/apache/spark/pull/27609 which was closed without merging).




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764115800


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38883/
   




[GitHub] [spark] dongjoon-hyun commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630904663


   @h-vetinari . Parquet is a de facto standard in Apache Spark and is related to all the other modules. That is why a Parquet upgrade must not break anything in the other Spark modules. The same holds for the other libraries. Apache Spark has used Apache Hadoop 2.7.3/2.7.4 for a long time, and it is still the default Hadoop version. Apache Spark has also used an unofficial Hive 1.2.1 fork for a long time and still has not been able to remove it.
   
   Please feel free to open a working PR. Then, the community will welcome.




[GitHub] [spark] wangyum commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r559357563



##########
File path: pom.xml
##########
@@ -2318,6 +2318,10 @@
             <groupId>commons-pool</groupId>
             <artifactId>commons-pool</artifactId>
           </exclusion>
+          <exclusion>
+            <groupId>javax.annotation</groupId>
+            <artifactId>javax.annotation-api</artifactId>
+          </exclusion>

Review comment:
       We do not need this; please see PARQUET-1497 for more details.






[GitHub] [spark] h-vetinari commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
h-vetinari commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630366425


   What's the status of this, if I may ask?




[GitHub] [spark] wangyum commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561444625



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
       conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
     }
 
+    // PARQUET-1580: Disables page-level CRC checksums by default.

Review comment:
       It will change the data order; please see https://github.com/apache/spark/pull/26804#discussion_r561044576.






[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762338175


   **[Test build #134204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134204/testReport)** for PR 26804 at commit [`4e257c4`](https://github.com/apache/spark/commit/4e257c43895d36f0d5630cc735fb56642470b26d).




[GitHub] [spark] dongjoon-hyun commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762043802


   Thank you for reopening this, @wangyum .




[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762167906


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134187/
   




[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762367843


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38789/
   




[GitHub] [spark] gszadovszky commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
gszadovszky commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561694102



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
##########
@@ -225,7 +226,9 @@ class StreamSuite extends StreamTest {
 
     val df = spark.readStream.format(classOf[FakeDefaultSource].getName).load()
     Seq("", "parquet").foreach { useV1Source =>
-      withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source) {
+      withSQLConf(
+        SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source,
+        ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED -> "false") {

Review comment:
       @wangyum, I've checked the code change of PARQUET-1580 (again) and still don't understand why it would cause such an issue. Disabling the CRC write only means that an optional field is not written to the page headers. It should not affect any kind of ordering. If it really does, the ordering relies on some parameter that it shouldn't, and any other change in the file metadata might affect the ordering as well.
   Maybe I'm overlooking something in our code base, so any comment is welcome; if not, I would suggest revisiting these unit tests.
   
   Meanwhile, I am not experienced in Spark code so if you are fine with this workaround in a unit test I am not against it.
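   
   As a rough illustration of an order-insensitive check (assuming the failing assertion compares rows in scan order; `sameRows` is a hypothetical helper, not part of Spark's test utilities):
   
   ```scala
   object OrderInsensitiveCheck {
     // Hypothetical helper: compares two result sets as multisets by
     // sorting both sides first, so the order in which files were
     // scanned no longer matters.
     def sameRows[A: Ordering](actual: Seq[A], expected: Seq[A]): Boolean =
       actual.sorted == expected.sorted

     def main(args: Array[String]): Unit = {
       val expected = Seq(1, 2, 3)
       val actual = Seq(2, 1, 3) // same rows, different scan order
       assert(actual != expected)         // a positional comparison fails
       assert(sameRows(actual, expected)) // the sorted comparison passes
     }
   }
   ```
   
   A check of this kind would not depend on file sizes or on any other file metadata.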






[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764334636


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38887/
   




[GitHub] [spark] dbtsai commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
dbtsai commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630439172


   The benchmark from @wangyum shows no regression from upgrading the Parquet version. Since Spark 3.0 is almost released, we should consider merging this into master so people can do more testing and have it as part of Spark 3.1.
   
   I'll merge it into master once a new build is finished.
   
   Thanks,




[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763829863


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134282/
   




[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-765907105








[GitHub] [spark] dongjoon-hyun edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630471244


   @iemejia 's Avro PR (#27609) didn't pass the Apache Spark UTs. And, according to his report, this Parquet PR seems to be blocked by the Avro dependency upgrade. If we have a clean Avro PR that passes all UTs (including the Hive 1.2/2.3 profiles), we may resume reviewing it.
   
   BTW, FYI, there is no Apache Hive release supporting Avro 1.9.x.




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764228690


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38887/
   




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630997965


   @h-vetinari . This is wrong, isn't it? Did anyone (except you) say it's low priority here? We want the new Parquet, but it currently looks technically infeasible. Do you think that all infeasible things are low priority?
   > I'm surprised (without criticism!) that this has a seemingly low priority




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764139405


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38883/
   




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764126022


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38883/
   




[GitHub] [spark] wangyum commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561926496



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
##########
@@ -225,7 +226,9 @@ class StreamSuite extends StreamTest {
 
     val df = spark.readStream.format(classOf[FakeDefaultSource].getName).load()
     Seq("", "parquet").foreach { useV1Source =>
-      withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source) {
+      withSQLConf(
+        SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source,
+        ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED -> "false") {

Review comment:
       Thank you, @gszadovszky. The size is different if the CRC write is enabled:
   ```
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-0ae44ddf-40bb-4ba5-84af-ec8cec037847-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-166cdfbf-b19d-4d55-b4ea-fbad6bcac9df-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  462 Jan 21 22:20 part-00001-23a0376d-1c51-480d-b7c6-a2d9a07de0e3-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-3c52b355-290a-4dd4-aad3-4bb2960ba3b8-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-4486173e-d650-4548-8da4-b95ae0305d8c-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-4c3786f4-2702-4f58-9604-c3deed68bc86-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-71bf8d51-95b0-43a8-969b-c28630f90066-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-78776231-370d-45c6-8520-67b94c33c697-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-a2a811aa-a495-4439-9daf-8c4b2cb258d5-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-bf745883-0b3a-4383-8669-7464833bfea8-c000.snappy.parquet
   -rw-r--r--  1 yumwang  wheel  463 Jan 21 22:20 part-00001-d3a65d34-cd1a-434d-a86c-8ee0203b3bac-c000.snappy.parquet
   yumwang@LM-SHC-16508156 1611238822602 % parquet-tools cat part-00001-23a0376d-1c51-480d-b7c6-a2d9a07de0e3-c000.snappy.parquet
   a = 2
   ```
   and we order the files by size:
   https://github.com/apache/spark/blob/8ed23ed499ec7745a8e9bdc4c4fb3200fdb6c3c8/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L609
   
   Not sure whether it is caused by an int overflow:
   https://github.com/apache/parquet-mr/pull/647#discussion_r561914480
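   
   To illustrate how a one-byte size difference can change the scan order, here is a minimal sketch. `FileSplit` and `scanOrder` are hypothetical stand-ins, not Spark's actual classes, and the largest-first ordering is an assumption made for this sketch:
   
   ```scala
   // Hypothetical stand-in for a file to be scanned; not Spark's class.
   final case class FileSplit(path: String, length: Long)

   object SizeOrderingSketch {
     // Sorts files largest-first. The sort is stable, so files of equal
     // size keep their original relative order.
     def scanOrder(files: Seq[FileSplit]): Seq[FileSplit] =
       files.sortBy(_.length)(Ordering[Long].reverse)

     def main(args: Array[String]): Unit = {
       val files = Seq(
         FileSplit("part-a", 463L),
         FileSplit("part-b", 462L), // one byte smaller than its neighbours
         FileSplit("part-c", 463L))
       // part-b moves behind part-c because it is smaller.
       scanOrder(files).foreach(f => println(s"${f.path} ${f.length}"))
     }
   }
   ```
   
   With equal sizes the listing order is preserved, so a single file that is one byte smaller is enough to change which row is read first.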






[GitHub] [spark] SparkQA removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764058969








[GitHub] [spark] sunchao commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-713879155


   @heuermh could you please clarify how the version change in parquet-avro will affect downstream apps? It's just a test dependency and shouldn't leak Avro, right?




[GitHub] [spark] wangyum commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-769481762








[GitHub] [spark] h-vetinari commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
h-vetinari commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630945817


   > @dongjoon-hyun: Please feel free to open a working PR. Then, the community will welcome.
   
   Sorry if my message came across as demanding. I'm not deeply involved in the community here (yet?), and neither in the respective code bases, but if someone as involved as @iemejia is stuck, I have little hope to make an impact in the current situation. The problem he outlines sounds like a very thorny issue that will need collaboration with other projects (HIVE, AVRO, PARQUET etc), and even knowing how OSS works, this seems like a problem on a scale that will require active maintainer involvement.
   
   So coming back to what I wrote: I'm surprised (without criticism!) that this has a seemingly low priority, and I hope someone can find a way forward.




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561433379



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
##########
@@ -759,7 +759,7 @@ class ParquetSchemaSuite extends ParquetSchemaTest {
         nullable = true))),
     """message root {
       |  optional group f1 (MAP) {
-      |    repeated group map (MAP_KEY_VALUE) {
+      |    repeated group key_value (MAP_KEY_VALUE) {

Review comment:
       So, are you saying that there is no breaking change, @wangyum ?
   @srowen 's question is asking the reason why we need this change, isn't it?
   

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
       conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
     }
 
+    // PARQUET-1580: Disables page-level CRC checksums by default.

Review comment:
       Could you add a comment explaining why you disable it? It looks like a workaround to avoid a Parquet-side performance regression.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
       conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
     }
 
+    // PARQUET-1580: Disables page-level CRC checksums by default.

Review comment:
       Wow. Then, it's a real bug. Thanks for confirmation.






[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764139405








[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762619705


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134211/
   




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764063136


   **[Test build #134297 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134297/testReport)** for PR 26804 at commit [`8a50d56`](https://github.com/apache/spark/commit/8a50d565dd9e2ce38f4b91bdbb2a5e82dc80b80b).




[GitHub] [spark] wangyum commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763424219


   @cloud-fan @gatorsmile @srowen @dongjoon-hyun @HyukjinKwon @rdblue
   This PR does not have the performance regression. Do you have any more comments?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-767415007


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134487/
   




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764150469








[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762601527


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38796/
   




[GitHub] [spark] wangyum closed pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
wangyum closed pull request #26804:
URL: https://github.com/apache/spark/pull/26804


   




[GitHub] [spark] dongjoon-hyun commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630471244


   @iemejia's Avro PR (#27609) didn't pass the Apache Spark UTs, and according to his report, this Parquet PR seems to be blocked by the Avro dependency upgrade. If we have a clean Avro PR that passes all UTs (including the Hive 1.2/2.3 profiles), we may resume reviewing it.




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763707592


   **[Test build #134282 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134282/testReport)** for PR 26804 at commit [`c9b4792`](https://github.com/apache/spark/commit/c9b479284a220074424a40c07af7ecd27085c5cd).




[GitHub] [spark] dongjoon-hyun commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-769489320


   BTW, Apache Parquet 1.12 is also one of the candidates we can choose in the Apache Spark 3.2.0 timeframe.
   The Apache Parquet 1.12.0 RC1 vote has already started.




[GitHub] [spark] gengliangwang commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561677113



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
##########
@@ -759,7 +759,7 @@ class ParquetSchemaSuite extends ParquetSchemaTest {
         nullable = true))),
     """message root {
       |  optional group f1 (MAP) {
-      |    repeated group map (MAP_KEY_VALUE) {
+      |    repeated group key_value (MAP_KEY_VALUE) {

Review comment:
       Seems better to test both.
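
One way to cover both names is to generate the expected schema text once per repeated-group name. The following is only a minimal sketch, not the test in this PR: `MapSchemaVariants` and `mapSchema` are hypothetical helpers, and the `key`/`value` fields are placeholder assumptions because the diff above does not show the real ones.

```scala
object MapSchemaVariants {
  // Builds a Parquet message-type string for a MAP column whose repeated
  // group carries the given name. The key/value fields are hypothetical
  // placeholders; the diff in this thread elides the real ones.
  def mapSchema(repeatedGroupName: String): String =
    s"""message root {
       |  optional group f1 (MAP) {
       |    repeated group $repeatedGroupName (MAP_KEY_VALUE) {
       |      required int32 key;
       |      optional binary value (UTF8);
       |    }
       |  }
       |}""".stripMargin

  // Legacy name first, then the name used after the upgrade in this diff.
  val repeatedGroupNames: Seq[String] = Seq("map", "key_value")

  // One schema string per name, ready to feed into a schema-conversion test.
  val schemas: Seq[String] = repeatedGroupNames.map(mapSchema)

  def main(args: Array[String]): Unit = schemas.foreach(println)
}
```

Each string in `schemas` could then be passed to the same schema-conversion assertion, so the read path is exercised for both the `map` and the `key_value` spelling.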






[GitHub] [spark] heuermh edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
heuermh edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-714729078


   @sunchao Until `spark.driver.userClassPathFirst`/`spark.executor.userClassPathFirst` are no longer marked Experimental, or dependencies in Spark are shaded properly, having Avro 1.8.x on the Spark runtime classpath will cause runtime compatibility exceptions for downstream apps that use `parquet-avro` 1.11.x or newer, which depend on Avro 1.9.x.
   
   Or are you suggesting downstream apps use `parquet-avro` version 1.10.1 at the same time Spark depends on e.g. `parquet-[column,common,encoding,format-structures,hadoop,jackson]` 1.11.x or newer? I don't know if that is possible.




[GitHub] [spark] sunchao commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561447322



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
##########
@@ -759,7 +759,7 @@ class ParquetSchemaSuite extends ParquetSchemaTest {
         nullable = true))),
     """message root {
       |  optional group f1 (MAP) {
-      |    repeated group map (MAP_KEY_VALUE) {
+      |    repeated group key_value (MAP_KEY_VALUE) {

Review comment:
       Looking at the original PR, I think the change should be backward-compatible (the `map` annotation can still be handled on the read path).






[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762573083


   **[Test build #134211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134211/testReport)** for PR 26804 at commit [`4efac50`](https://github.com/apache/spark/commit/4efac50dc441838fe5521d4b94a2a4870ad456c5).




[GitHub] [spark] wangyum commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r564112793



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
       conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
     }
 
+    // PARQUET-1746: Disables page-level CRC checksums by default.
+    conf.setBooleanIfUnset(ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED, false)

Review comment:
       1. Disable it to fix this regression: https://github.com/apache/spark/pull/26804#pullrequestreview-572328921.
       2. Writing out checksums has minimal performance impact.
       3. Do we really need this feature? I haven't seen Spark SQL users request it before. This change just disables it by default; users can still enable it.
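
The "off by default, but still user-configurable" behavior comes from `setBooleanIfUnset`, which only writes a value when the key has none. The sketch below illustrates that semantics under stated assumptions: `MiniConf` is a hypothetical stand-in for Hadoop's `Configuration` (so the example runs without Hadoop on the classpath), and the key string is assumed to be the value of `ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED`.

```scala
import scala.collection.mutable

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration; it models
// only the calls needed for this sketch.
final class MiniConf {
  private val values = mutable.Map.empty[String, String]

  def setBoolean(key: String, value: Boolean): Unit =
    values(key) = value.toString

  // Writes the value only when the key has not been set yet.
  def setBooleanIfUnset(key: String, value: Boolean): Unit =
    if (!values.contains(key)) setBoolean(key, value)

  def getBoolean(key: String, default: Boolean): Boolean =
    values.get(key).map(_.toBoolean).getOrElse(default)
}

object ChecksumDefaultSketch {
  // Assumed string value of ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED.
  val PageWriteChecksumEnabled = "parquet.page.write-checksum.enabled"

  // Mirrors the line added in the diff: apply "false" only if nothing is set.
  def applySparkDefault(conf: MiniConf): MiniConf = {
    conf.setBooleanIfUnset(PageWriteChecksumEnabled, false)
    conf
  }

  def main(args: Array[String]): Unit = {
    // No user setting: the default added by the diff turns checksums off.
    val untouched = applySparkDefault(new MiniConf)
    println(untouched.getBoolean(PageWriteChecksumEnabled, true))

    // User opted in beforehand: the explicit setting is left alone.
    val optedIn = new MiniConf
    optedIn.setBoolean(PageWriteChecksumEnabled, true)
    applySparkDefault(optedIn)
    println(optedIn.getBoolean(PageWriteChecksumEnabled, false))
  }
}
```

With the real `Configuration`, a user who sets the key before the write starts would likewise keep page checksums enabled, since the default is applied only to an unset key.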






[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764403535


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134301/
   




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762151755


   **[Test build #134187 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134187/testReport)** for PR 26804 at commit [`b5101d2`](https://github.com/apache/spark/commit/b5101d20850b7c3ddc03cece8088a7b34b683084).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762619705


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134211/
   




[GitHub] [spark] heuermh edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
heuermh edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630453306


   -1 (non-binding) Please do not merge this into master; it breaks downstream applications due to Avro 1.8.x vs 1.9.x transitive dependencies.
   
   I believe this should be blocked on a dependency upgrade to Avro 1.9.x (which in turn is blocked on other things).




[GitHub] [spark] heuermh commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
heuermh commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764063666


   >> Sorry if I have missed some conversation, how does this pull request compare to #30517 (which bumps Parquet and Avro) and #31232 (which bumps only Avro)?
   >
   > #30517 is used for testing compatibility.
   
   Thank you, @wangyum!  As #31232 has been merged for Spark 3.2.0, I assume the target for this pull request is also version 3.2.0?




[GitHub] [spark] wangyum commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763707557


   > Sorry if I have missed some conversation, how does this pull request compare to #30517 (which bumps Parquet and Avro) and #31232 (which bumps only Avro)?
   
   #30517 is used for testing compatibility.




[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762130255


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38772/
   




[GitHub] [spark] sunchao commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-723233104


   @iemejia Yes, it is the issue @heuermh mentioned above that we need to be careful with, and it is what makes upgrading Hive necessary.




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764106286


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38882/
   




[GitHub] [spark] wangyum commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-769662814


   @sunchao https://github.com/apache/spark/pull/31393





[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763799875


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38868/
   





[GitHub] [spark] bbraams commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
bbraams commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r564511619



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
       conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
     }
 
+    // PARQUET-1746: Disables page-level CRC checksums by default.
+    conf.setBooleanIfUnset(ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED, false)

Review comment:
       I see it's been addressed in https://github.com/apache/spark/pull/26804/commits/72c52b64958340835e5a54b24aa68f201f4c15be, thanks for the quick fix @wangyum! 👍 






[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762416314


   **[Test build #134204 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134204/testReport)** for PR 26804 at commit [`4e257c4`](https://github.com/apache/spark/commit/4e257c43895d36f0d5630cc735fb56642470b26d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764139405








[GitHub] [spark] h-vetinari commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

Posted by GitBox <gi...@apache.org>.
h-vetinari commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630740908


   I'm a bit surprised that upgrading Parquet has such a low priority, especially given important features like column indexes and the cleanup of the timestamp situation/compatibility (even though I understand that the Avro situation is complicated). I hope someone can find a way forward.
   
   > BTW, FYI, there is no Apache Hive release supporting Avro 1.9.x.
   
   There are open patches here: https://issues.apache.org/jira/browse/HIVE-21737 (also by @iemejia, open for a year already).




[GitHub] [spark] wangyum commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-769490870


   Thank you, @dongjoon-hyun. I will evaluate Parquet 1.12 soon.




[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762111277


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38772/
   




[GitHub] [spark] SparkQA removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-765888093


   **[Test build #134400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134400/testReport)** for PR 26804 at commit [`eb1c95e`](https://github.com/apache/spark/commit/eb1c95ee59464167cb50591b0110e7f3f19864a8).

