Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/01/18 07:14:55 UTC
[GitHub] [spark] wangyum opened a new pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
wangyum opened a new pull request #26804:
URL: https://github.com/apache/spark/pull/26804
### What changes were proposed in this pull request?
This PR upgrades Parquet to 1.11.1.
Parquet 1.11.1 new features:
- [PARQUET-1201](https://issues.apache.org/jira/browse/PARQUET-1201) - Column indexes
- [PARQUET-1253](https://issues.apache.org/jira/browse/PARQUET-1253) - Support for new logical type representation
- [PARQUET-1388](https://issues.apache.org/jira/browse/PARQUET-1388) - Nanosecond precision time and timestamp - parquet-mr
More details:
https://github.com/apache/parquet-mr/blob/master/CHANGES.md
### Why are the changes needed?
Support column indexes to improve query performance.
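As a rough illustration of why column indexes can help, the sketch below models per-page min/max statistics and skips pages that cannot match an equality filter. The `PageStats` type and the `pagesToRead` helper are hypothetical names used only for this example; they are not parquet-mr APIs.

```scala
// Hypothetical model of a column index: one min/max entry per data page.
final case class PageStats(pageId: Int, min: Long, max: Long)

object ColumnIndexSketch {
  // Keep only the pages whose [min, max] range can contain `value`.
  def pagesToRead(index: Seq[PageStats], value: Long): Seq[Int] =
    index.filter(p => p.min <= value && value <= p.max).map(_.pageId)

  def main(args: Array[String]): Unit = {
    val index = Seq(
      PageStats(0, 1L, 100L),
      PageStats(1, 101L, 200L),
      PageStats(2, 201L, 300L))
    // A filter such as `a = 150` only needs page 1; pages 0 and 2 are skipped.
    println(pagesToRead(index, 150L))
  }
}
```

Without such an index, a reader would typically have to decode every page in the column chunk to evaluate the same filter.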
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing tests.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764201684
[GitHub] [spark] iemejia commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
iemejia commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-769671092
@wangyum :clap: great work !
[GitHub] [spark] dbtsai commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
dbtsai commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630438863
Jenkins, retest it again.
[GitHub] [spark] wangyum commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r559844780
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
##########
@@ -759,7 +759,7 @@ class ParquetSchemaSuite extends ParquetSchemaTest {
nullable = true))),
"""message root {
| optional group f1 (MAP) {
- | repeated group map (MAP_KEY_VALUE) {
+ | repeated group key_value (MAP_KEY_VALUE) {
Review comment:
1. `key_value` was introduced by this fix: https://issues.apache.org/jira/browse/PARQUET-1879
2. I also used Parquet 1.11.1 to read this file: https://issues.apache.org/jira/browse/SPARK-32639
![image](https://user-images.githubusercontent.com/5399861/104973076-ab11d480-5a2e-11eb-9f1c-968ec63bce58.png)
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764201684
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764334636
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38887/
[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762420859
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134204/
[GitHub] [spark] wangyum commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r564112793
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
}
+ // PARQUET-1746: Disables page-level CRC checksums by default.
+ conf.setBooleanIfUnset(ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED, false)
Review comment:
1. Disable it to fix this regression: https://github.com/apache/spark/pull/26804#pullrequestreview-572328921.
2. Writing out checksums has minimal performance impact.
3. Do we really need this feature? I haven't seen Spark SQL users request it. This change only disables it by default; users can still enable it.
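To make the "users can still enable it" point concrete, here is a minimal sketch of the set-if-unset behaviour that the diff relies on. A mutable map stands in for the Hadoop `Configuration`, and the key string is assumed to be the value of `ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED`; both are assumptions made for illustration.

```scala
import scala.collection.mutable

object ChecksumDefaultSketch {
  // Assumed value of ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED.
  val PageWriteChecksumEnabled = "parquet.page.write-checksum.enabled"

  // Stand-in for Configuration.setBooleanIfUnset: only writes the value
  // when the key has not been set already.
  def setBooleanIfUnset(
      conf: mutable.Map[String, String],
      key: String,
      value: Boolean): Unit =
    if (!conf.contains(key)) conf(key) = value.toString

  def main(args: Array[String]): Unit = {
    // No user setting: the Spark-side default (disabled) applies.
    val defaults = mutable.Map.empty[String, String]
    setBooleanIfUnset(defaults, PageWriteChecksumEnabled, value = false)
    println(defaults(PageWriteChecksumEnabled))

    // The user enabled checksums explicitly: that choice is preserved.
    val userConf = mutable.Map(PageWriteChecksumEnabled -> "true")
    setBooleanIfUnset(userConf, PageWriteChecksumEnabled, value = false)
    println(userConf(PageWriteChecksumEnabled))
  }
}
```

The design choice here is that Spark only supplies a default; it does not override a value the user has already put into the configuration.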
[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-767415007
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134487/
[GitHub] [spark] iemejia commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
iemejia commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630917979
@dongjoon-hyun You are absolutely right about no Hive with Avro 1.9 and that's the REAL problem. I don't think creating a PR that passes all UT (including Hive 1.2/2.3 profile) for Spark with Avro 1.9 is possible because Hive is leaking older versions of Avro that are not API compatible.
I don't know how to deal with this. I tried to patch Hive [HIVE-21737](https://issues.apache.org/jira/browse/HIVE-21737) for this but was blocked on testing issues there. Even if that patch is merged, we would still need them to backport the fix to version 2.x (Hive master is already at version 4.x). Note that the Avro upgrade also addresses various security issues in its dependencies that are still leaking into and present in Spark (yes, Jackson among others).
I really want this to happen to get Avro 1.9.x downstream, but it feels like we are locked because of Hive. If you or anyone can suggest how to do this, I will be more than glad to help with what I can. Also, if someone knows someone at the Hive project who cares about this, that would be another big help.
CC: @kgyrtkirk for any comments/suggestions, because he tried to help me on the Hive side.
[GitHub] [spark] SparkQA removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762573083
**[Test build #134211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134211/testReport)** for PR 26804 at commit [`4efac50`](https://github.com/apache/spark/commit/4efac50dc441838fe5521d4b94a2a4870ad456c5).
[GitHub] [spark] sunchao commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561447322
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
##########
@@ -759,7 +759,7 @@ class ParquetSchemaSuite extends ParquetSchemaTest {
nullable = true))),
"""message root {
| optional group f1 (MAP) {
- | repeated group map (MAP_KEY_VALUE) {
+ | repeated group key_value (MAP_KEY_VALUE) {
Review comment:
Looking at the original PR, I think the change should be backward-compatible (`map` annotation can still be handled on the read path).
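A minimal sketch of that backward-compatibility argument, assuming the read path resolves the key and value fields by structure (one repeated group with two children) rather than by the repeated group's name. The `RepeatedGroup` and `MapGroup` types are a hypothetical, simplified schema model and not the parquet-mr type API.

```scala
// Hypothetical, simplified schema model; not the parquet-mr type API.
final case class RepeatedGroup(name: String, fields: Seq[String])
final case class MapGroup(name: String, repeated: RepeatedGroup)

object MapReadSketch {
  // Resolve key and value by position inside the repeated group,
  // ignoring whether the group is called `map` or `key_value`.
  def keyValueFields(m: MapGroup): Option[(String, String)] =
    m.repeated.fields match {
      case Seq(k, v) => Some((k, v))
      case _         => None
    }

  def main(args: Array[String]): Unit = {
    val legacy  = MapGroup("f1", RepeatedGroup("map", Seq("key", "value")))
    val current = MapGroup("f1", RepeatedGroup("key_value", Seq("key", "value")))
    // Both layouts resolve to the same key/value pair.
    println(keyValueFields(legacy) == keyValueFields(current))
  }
}
```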
[GitHub] [spark] wangyum commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561444625
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
}
+ // PARQUET-1580: Disables page-level CRC checksums by default.
Review comment:
It will change the data order, please see https://github.com/apache/spark/pull/26804#discussion_r561044576.
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
##########
@@ -225,7 +226,9 @@ class StreamSuite extends StreamTest {
val df = spark.readStream.format(classOf[FakeDefaultSource].getName).load()
Seq("", "parquet").foreach { useV1Source =>
- withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source) {
+ withSQLConf(
+ SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source,
+ ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED -> "false") {
Review comment:
Thank you @gszadovszky. The file size differs when CRC writing is enabled:
```
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-0ae44ddf-40bb-4ba5-84af-ec8cec037847-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-166cdfbf-b19d-4d55-b4ea-fbad6bcac9df-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 462 Jan 21 22:20 part-00001-23a0376d-1c51-480d-b7c6-a2d9a07de0e3-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-3c52b355-290a-4dd4-aad3-4bb2960ba3b8-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-4486173e-d650-4548-8da4-b95ae0305d8c-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-4c3786f4-2702-4f58-9604-c3deed68bc86-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-71bf8d51-95b0-43a8-969b-c28630f90066-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-78776231-370d-45c6-8520-67b94c33c697-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-a2a811aa-a495-4439-9daf-8c4b2cb258d5-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-bf745883-0b3a-4383-8669-7464833bfea8-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-d3a65d34-cd1a-434d-a86c-8ee0203b3bac-c000.snappy.parquet
yumwang@LM-SHC-16508156 1611238822602 % parquet-tools cat part-00001-23a0376d-1c51-480d-b7c6-a2d9a07de0e3-c000.snappy.parquet
a = 2
```
and we order the files by size:
https://github.com/apache/spark/blob/8ed23ed499ec7745a8e9bdc4c4fb3200fdb6c3c8/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L609
Not sure if it is caused by int overflow:
https://github.com/apache/parquet-mr/pull/647#discussion_r561914480
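The effect of a one-byte size difference on scan order can be sketched as follows, assuming the linked code sorts files by length in descending order. `PartFile` and `scanOrder` are hypothetical names used only for this illustration.

```scala
// Hypothetical stand-in for a file entry in the scan planning step.
final case class PartFile(name: String, length: Long)

object FileOrderSketch {
  // Largest files first; `sortBy` is stable, so ties keep their input order.
  def scanOrder(files: Seq[PartFile]): Seq[String] =
    files.sortBy(_.length)(Ordering[Long].reverse).map(_.name)

  def main(args: Array[String]): Unit = {
    // All files the same size: the input order is preserved.
    val equalSizes = Seq(
      PartFile("part-a", 463L), PartFile("part-b", 463L), PartFile("part-c", 463L))
    println(scanOrder(equalSizes))

    // One file is a byte smaller: it moves to the end of the scan order.
    val oneSmaller = Seq(
      PartFile("part-a", 463L), PartFile("part-b", 462L), PartFile("part-c", 463L))
    println(scanOrder(oneSmaller))
  }
}
```

A test that asserts on row order without an explicit sort would therefore see different results once file sizes stop being identical.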
[GitHub] [spark] SparkQA removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764206843
**[Test build #134301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134301/testReport)** for PR 26804 at commit [`a89c61d`](https://github.com/apache/spark/commit/a89c61d90cc145cea7e5c3df1200fb3ec1d7a3db).
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-765888093
**[Test build #134400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134400/testReport)** for PR 26804 at commit [`eb1c95e`](https://github.com/apache/spark/commit/eb1c95ee59464167cb50591b0110e7f3f19864a8).
[GitHub] [spark] wangyum commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-768793621
> Could you create a 3.2.0 blocker JIRA?
OK, https://issues.apache.org/jira/browse/SPARK-34276.
[GitHub] [spark] heuermh commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
heuermh commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-714729078
@sunchao Until `spark.driver.userClassPathFirst`/`spark.executor.userClassPathFirst` are no longer (Experimental), or dependencies in Spark are shaded properly, having Avro 1.8.x on the Spark runtime classpath will cause runtime compatibility exceptions for downstream apps that use `parquet-avro` 1.11.x or newer, which depend on Avro 1.9.x. Or are you suggesting downstream apps use `parquet-avro` version 1.10.1 at the same time Spark depends on `parquet-avro` 1.11.x or newer? I don't know if that is possible.
[GitHub] [spark] heuermh edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
heuermh edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-714729078
@sunchao Until `spark.driver.userClassPathFirst`/`spark.executor.userClassPathFirst` are no longer (Experimental), or dependencies in Spark are shaded properly, having Avro 1.8.x on the Spark runtime classpath will cause runtime compatibility exceptions for downstream apps that use `parquet-avro` 1.11.x or newer, which depend on Avro 1.9.x.
Or are you suggesting downstream apps use `parquet-avro` version 1.10.1 at the same time Spark depends on e.g. `parquet-[column,common,encoding,format-structures,hadoop,jackson]` 1.11.x or newer? I don't know if that is possible.
[GitHub] [spark] bbraams commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
bbraams commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r564097098
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
}
+ // PARQUET-1746: Disables page-level CRC checksums by default.
+ conf.setBooleanIfUnset(ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED, false)
Review comment:
@wangyum Any chance you could elaborate on this a bit more? Are we convinced that the issue you pointed out in https://github.com/apache/spark/pull/26804#discussion_r561044576 is actually a regression caused by parquet and not a problem with the test itself (e.g. caused by any non-trivial assumptions made w.r.t. the output files)? Considering the benefit of having checksums enabled by default (e.g. much improved visibility into hard to debug data corruption issues), I'd propose further investigation before disabling the feature entirely and having Spark diverge from the `parquet-mr` defaults.
Regarding the defaults, support for checksums was added back in [PARQUET-1580](https://github.com/apache/parquet-mr/pull/647). These changes were included and released with `parquet-mr` 1.11.0 (see [CHANGES](https://github.com/apache/parquet-mr/blob/apache-parquet-1.11.0/CHANGES.md#version-1110)), and writing out checksums has been enabled by default since the release, see `ParquetProperties.java` in:
* [master](https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L61)
* [apache-parquet-1.11.0](https://github.com/apache/parquet-mr/blob/apache-parquet-1.11.0/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L54)
* [apache-parquet-1.11.1](https://github.com/apache/parquet-mr/blob/apache-parquet-1.11.1/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L54)
I also noticed that [PARQUET-1746](https://issues.apache.org/jira/browse/PARQUET-1746) was raised and [a PR](https://github.com/apache/parquet-mr/pull/857) was opened for it to set the default to `false`, but that the issue has already been marked as resolved and the PR closed without merging the changes.
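To illustrate the kind of visibility a page-level checksum gives, the sketch below computes a CRC-32 over some bytes with the JDK's `java.util.zip.CRC32` and shows that a single flipped bit no longer matches the stored value. This is only a model of the idea: the `crc32` helper and the sample bytes are assumptions for the example and do not reproduce how parquet-mr lays out or verifies pages.

```scala
import java.util.zip.CRC32

object PageChecksumSketch {
  // CRC-32 over a byte array, using the JDK implementation.
  def crc32(bytes: Array[Byte]): Long = {
    val crc = new CRC32()
    crc.update(bytes)
    crc.getValue
  }

  def main(args: Array[String]): Unit = {
    // Sample bytes standing in for a page's contents.
    val page = "a = 2".getBytes("UTF-8")
    val stored = crc32(page)

    // Flip one bit to simulate silent corruption after the write.
    val corrupted = page.clone()
    corrupted(0) = (corrupted(0) ^ 0x01).toByte

    println(crc32(page) == stored)      // intact bytes match the stored checksum
    println(crc32(corrupted) == stored) // corrupted bytes do not
  }
}
```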
[GitHub] [spark] SparkQA removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762338175
**[Test build #134204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134204/testReport)** for PR 26804 at commit [`4e257c4`](https://github.com/apache/spark/commit/4e257c43895d36f0d5630cc735fb56642470b26d).
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764394945
**[Test build #134301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134301/testReport)** for PR 26804 at commit [`a89c61d`](https://github.com/apache/spark/commit/a89c61d90cc145cea7e5c3df1200fb3ec1d7a3db).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762601527
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38796/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762420859
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134204/
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762361789
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38789/
[GitHub] [spark] heuermh commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
heuermh commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630453306
-1 (non-binding) Please do not merge this into master, it breaks downstream applications due to Avro 1.8.x vs 1.9.x transitive dependencies.
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-769488412
Thank you, @wangyum and @gatorsmile !
[GitHub] [spark] wangyum commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763402025
Benchmark code and benchmark result:
```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution.benchmark

import java.io.File

import scala.util.Random

import org.apache.spark.SparkConf
import org.apache.spark.benchmark.Benchmark
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{monotonically_increasing_id, timestamp_seconds}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.internal.SQLConf.ParquetOutputTimestampType
import org.apache.spark.sql.types.{ByteType, Decimal, DecimalType}

object ParquetFilterPushdownBenchmark extends SqlBasedBenchmark {

  override def getSparkSession: SparkSession = {
    val conf = new SparkConf()
      .setAppName(this.getClass.getSimpleName)
      // Since `spark.master` always exists, overrides this value
      .set("spark.master", "local[1]")
      .setIfMissing("spark.driver.memory", "3g")
      .setIfMissing("spark.executor.memory", "3g")
      .setIfMissing("orc.compression", "snappy")
      .setIfMissing("spark.sql.parquet.compression.codec", "snappy")

    SparkSession.builder().config(conf).getOrCreate()
  }

  private val numRows = 1024 * 1024 * 15
  private val width = 5
  private val mid = numRows / 2

  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
    try f finally tableNames.foreach(spark.catalog.dropTempView)
  }

  private def prepareTable(
      dir: File, numRows: Int, width: Int, useStringForValue: Boolean): Unit = {
    import spark.implicits._
    val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i")
    val valueCol = if (useStringForValue) {
      monotonically_increasing_id().cast("string")
    } else {
      monotonically_increasing_id()
    }
    val df = spark.range(numRows).map(_ => Random.nextLong).selectExpr(selectExpr: _*)
      .withColumn("value", valueCol)
      .sort("value")

    saveAsTable(df, dir)
  }

  private def prepareStringDictTable(
      dir: File, numRows: Int, numDistinctValues: Int, width: Int): Unit = {
    val selectExpr = (0 to width).map {
      case 0 => s"CAST(id % $numDistinctValues AS STRING) AS value"
      case i => s"CAST(rand() AS STRING) c$i"
    }
    val df = spark.range(numRows).selectExpr(selectExpr: _*).sort("value")

    saveAsTable(df, dir)
  }

  private def saveAsTable(df: DataFrame, dir: File): Unit = {
    val parquetPath = dir.getCanonicalPath + "/parquet"
    df.write.mode("overwrite").parquet(parquetPath)
    spark.read.parquet(parquetPath).createOrReplaceTempView("parquetTable")
  }

  def filterPushDownBenchmark(
      values: Int,
      title: String,
      whereExpr: String,
      selectExpr: String = "*"): Unit = {
    val benchmark = new Benchmark(title, values, minNumIters = 5, output = output)

    Seq(false, true).foreach { pushDownEnabled =>
      val name = s"Parquet Vectorized ${if (pushDownEnabled) s"(Pushdown)" else ""}"
      benchmark.addCase(name) { _ =>
        withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> s"$pushDownEnabled") {
          spark.sql(s"SELECT $selectExpr FROM parquetTable WHERE $whereExpr").noop()
        }
      }
    }

    benchmark.run()
  }

  private def runIntBenchmark(numRows: Int, width: Int, mid: Int): Unit = {
    Seq("value IS NULL", s"$mid < value AND value < $mid").foreach { whereExpr =>
      val title = s"Select 0 int row ($whereExpr)".replace("value AND value", "value")
      filterPushDownBenchmark(numRows, title, whereExpr)
    }

    Seq(
      s"value = $mid",
      s"value <=> $mid",
      s"$mid <= value AND value <= $mid",
      s"${mid - 1} < value AND value < ${mid + 1}"
    ).foreach { whereExpr =>
      val title = s"Select 1 int row ($whereExpr)".replace("value AND value", "value")
      filterPushDownBenchmark(numRows, title, whereExpr)
    }

    val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")

    Seq(10, 50, 90).foreach { percent =>
      filterPushDownBenchmark(
        numRows,
        s"Select $percent% int rows (value < ${numRows * percent / 100})",
        s"value < ${numRows * percent / 100}",
        selectExpr
      )
    }

    Seq("value IS NOT NULL", "value > -1", "value != -1").foreach { whereExpr =>
      filterPushDownBenchmark(
        numRows,
        s"Select all int rows ($whereExpr)",
        whereExpr,
        selectExpr)
    }
  }

  private def runStringBenchmark(
      numRows: Int, width: Int, searchValue: Int, colType: String): Unit = {
    Seq("value IS NULL", s"'$searchValue' < value AND value < '$searchValue'")
      .foreach { whereExpr =>
        val title = s"Select 0 $colType row ($whereExpr)".replace("value AND value", "value")
        filterPushDownBenchmark(numRows, title, whereExpr)
      }

    Seq(
      s"value = '$searchValue'",
      s"value <=> '$searchValue'",
      s"'$searchValue' <= value AND value <= '$searchValue'"
    ).foreach { whereExpr =>
      val title = s"Select 1 $colType row ($whereExpr)".replace("value AND value", "value")
      filterPushDownBenchmark(numRows, title, whereExpr)
    }

    val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")

    Seq("value IS NOT NULL").foreach { whereExpr =>
      filterPushDownBenchmark(
        numRows,
        s"Select all $colType rows ($whereExpr)",
        whereExpr,
        selectExpr)
    }
  }

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    runBenchmark("Pushdown for many distinct value case") {
      withTempPath { dir =>
        withTempTable("parquetTable") {
          Seq(true, false).foreach { useStringForValue =>
            prepareTable(dir, numRows, width, useStringForValue)
            if (useStringForValue) {
              runStringBenchmark(numRows, width, mid, "string")
            } else {
              runIntBenchmark(numRows, width, mid)
            }
          }
        }
      }
    }

    runBenchmark("Pushdown for few distinct value case (use dictionary encoding)") {
      withTempPath { dir =>
        val numDistinctValues = 200

        withTempTable("parquetTable") {
          prepareStringDictTable(dir, numRows, numDistinctValues, width)
          runStringBenchmark(numRows, width, numDistinctValues / 2, "distinct string")
        }
      }
    }

    runBenchmark("Pushdown benchmark for StringStartsWith") {
      withTempPath { dir =>
        withTempTable("parquetTable") {
          prepareTable(dir, numRows, width, true)
          Seq(
            "value like '10%'",
            "value like '1000%'",
            s"value like '${mid.toString.substring(0, mid.toString.length - 1)}%'"
          ).foreach { whereExpr =>
            val title = s"StringStartsWith filter: ($whereExpr)"
            filterPushDownBenchmark(numRows, title, whereExpr)
          }
        }
      }
    }

    runBenchmark(s"Pushdown benchmark for ${DecimalType.simpleString}") {
      withTempPath { dir =>
        Seq(
          s"decimal(${Decimal.MAX_INT_DIGITS}, 2)",
          s"decimal(${Decimal.MAX_LONG_DIGITS}, 2)",
          s"decimal(${DecimalType.MAX_PRECISION}, 2)"
        ).foreach { dt =>
          val columns = (1 to width).map(i => s"CAST(id AS string) c$i")
          val valueCol = if (dt.equalsIgnoreCase(s"decimal(${Decimal.MAX_INT_DIGITS}, 2)")) {
            monotonically_increasing_id() % 9999999
          } else {
            monotonically_increasing_id()
          }
          val df = spark.range(numRows)
            .selectExpr(columns: _*).withColumn("value", valueCol.cast(dt))
          withTempTable("parquetTable") {
            saveAsTable(df, dir)

            Seq(s"value = $mid").foreach { whereExpr =>
              val title = s"Select 1 $dt row ($whereExpr)".replace("value AND value", "value")
              filterPushDownBenchmark(numRows, title, whereExpr)
            }

            val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
            Seq(10, 50, 90).foreach { percent =>
              filterPushDownBenchmark(
                numRows,
                s"Select $percent% $dt rows (value < ${numRows * percent / 100})",
                s"value < ${numRows * percent / 100}",
                selectExpr
              )
            }
          }
        }
      }
    }

    runBenchmark("Pushdown benchmark for InSet -> InFilters") {
      withTempPath { dir =>
        withTempTable("parquetTable") {
          prepareTable(dir, numRows, width, false)
          Seq(5, 10, 50, 100).foreach { count =>
            Seq(10, 50, 90).foreach { distribution =>
              val filter =
                Range(0, count).map(r => scala.util.Random.nextInt(numRows * distribution / 100))
              val whereExpr = s"value in(${filter.mkString(",")})"
              val title = s"InSet -> InFilters (values count: $count, distribution: $distribution)"
              filterPushDownBenchmark(numRows, title, whereExpr)
            }
          }
        }
      }
    }

    runBenchmark(s"Pushdown benchmark for ${ByteType.simpleString}") {
      withTempPath { dir =>
        val columns = (1 to width).map(i => s"CAST(id AS string) c$i")
        val df = spark.range(numRows).selectExpr(columns: _*)
          .withColumn("value", (monotonically_increasing_id() % Byte.MaxValue).cast(ByteType))
          .orderBy("value")
        withTempTable("parquetTable") {
          saveAsTable(df, dir)

          Seq(s"value = CAST(${Byte.MaxValue / 2} AS ${ByteType.simpleString})")
            .foreach { whereExpr =>
              val title = s"Select 1 ${ByteType.simpleString} row ($whereExpr)"
                .replace("value AND value", "value")
              filterPushDownBenchmark(numRows, title, whereExpr)
            }

          val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
          Seq(10, 50, 90).foreach { percent =>
            filterPushDownBenchmark(
              numRows,
              s"Select $percent% ${ByteType.simpleString} rows " +
                s"(value < CAST(${Byte.MaxValue * percent / 100} AS ${ByteType.simpleString}))",
              s"value < CAST(${Byte.MaxValue * percent / 100} AS ${ByteType.simpleString})",
              selectExpr
            )
          }
        }
      }
    }

    runBenchmark(s"Pushdown benchmark for Timestamp") {
      withTempPath { dir =>
        withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_TIMESTAMP_ENABLED.key -> true.toString) {
          ParquetOutputTimestampType.values.toSeq.map(_.toString).foreach { fileType =>
            withSQLConf(SQLConf.PARQUET_OUTPUT_TIMESTAMP_TYPE.key -> fileType) {
              val columns = (1 to width).map(i => s"CAST(id AS string) c$i")
              val df = spark.range(numRows).selectExpr(columns: _*)
                .withColumn("value", timestamp_seconds(monotonically_increasing_id()))
              withTempTable("parquetTable") {
                saveAsTable(df, dir)

                Seq(s"value = timestamp_seconds($mid)").foreach { whereExpr =>
                  val title = s"Select 1 timestamp stored as $fileType row ($whereExpr)"
                    .replace("value AND value", "value")
                  filterPushDownBenchmark(numRows, title, whereExpr)
                }

                val selectExpr = (1 to width)
                  .map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
                Seq(10, 50, 90).foreach { percent =>
                  filterPushDownBenchmark(
                    numRows,
                    s"Select $percent% timestamp stored as $fileType rows " +
                      s"(value < timestamp_seconds(${numRows * percent / 100}))",
                    s"value < timestamp_seconds(${numRows * percent / 100})",
                    selectExpr
                  )
                }
              }
            }
          }
        }
      }
    }

    runBenchmark(s"Pushdown benchmark with many filters") {
      val numRows = 1
      val width = 500

      withTempPath { dir =>
        val columns = (1 to width).map(i => s"id c$i")
        val df = spark.range(1).selectExpr(columns: _*)
        withTempTable("parquetTable") {
          saveAsTable(df, dir)
          Seq(1, 250, 500).foreach { numFilter =>
            val whereExpr = (1 to numFilter).map(i => s"c$i = 0").mkString(" and ")
            // Note: InferFiltersFromConstraints will add more filters to this given filters
            filterPushDownBenchmark(numRows, s"Select 1 row with $numFilter filters", whereExpr)
          }
        }
      }
    }
  }
}
```
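For reading the result tables in this comment, here is a rough sketch of how the filter thresholds and the derived columns seem to relate to `numRows` and the best time. It is not taken from Spark's `Benchmark` class: the object `BenchmarkColumnsSketch` and its helper names are hypothetical, and the formulas for `Per Row(ns)`, `Rate(M/s)` and `Relative` are assumptions that only approximately match the rounded figures printed in the tables (for example a best time of 8669 ms over 15728640 rows).

```scala
// Sketch only: this object and its helpers are hypothetical, not Spark APIs.
object BenchmarkColumnsSketch {
  // Same constants as in the benchmark code above.
  val numRows: Int = 1024 * 1024 * 15 // 15728640
  val mid: Int = numRows / 2          // 7864320, the literal used in the "Select 1 row" filters

  // Threshold used by the "Select N% rows" cases (integer arithmetic, as in the benchmark).
  def threshold(percent: Int): Int = numRows * percent / 100

  // Assumed derivations of the table columns from the best time in milliseconds.
  def perRowNs(bestMs: Double): Double = bestMs * 1e6 / numRows
  def rateMPerSec(bestMs: Double): Double = numRows / (bestMs * 1e3)
  def relative(baselineBestMs: Double, bestMs: Double): Double = baselineBestMs / bestMs

  def main(args: Array[String]): Unit = {
    println(s"mid = $mid, 10% = ${threshold(10)}, 50% = ${threshold(50)}, 90% = ${threshold(90)}")
    println(f"per row = ${perRowNs(8669)}%.1f ns, rate = ${rateMPerSec(8669)}%.1f M/s")
    // The printed best times are rounded, so this ratio only approximates the Relative column.
    println(f"relative = ${relative(8669, 87)}%.1fX")
  }
}
```

Because the tables print rounded best times, values recomputed this way can differ from the printed `Per Row(ns)` and `Relative` figures in the last digit or so.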
Parquet 1.10.1:
```
[info] 18:42:20.840 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[info] Running benchmark: Select 0 string row (value IS NULL)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 43822 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 20 iterations, 2066 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 0 string row (value IS NULL): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8669 8765 94 1.8 551.1 1.0X
[info] Parquet Vectorized (Pushdown) 87 103 10 180.0 5.6 99.2X
[info] Running benchmark: Select 0 string row ('7864320' < value < '7864320')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 44140 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4492 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 0 string row ('7864320' < value < '7864320'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8729 8828 88 1.8 555.0 1.0X
[info] Parquet Vectorized (Pushdown) 888 898 12 17.7 56.5 9.8X
[info] Running benchmark: Select 1 string row (value = '7864320')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 43788 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4415 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 string row (value = '7864320'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8679 8758 69 1.8 551.8 1.0X
[info] Parquet Vectorized (Pushdown) 868 883 13 18.1 55.2 10.0X
[info] Running benchmark: Select 1 string row (value <=> '7864320')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 43544 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4352 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 string row (value <=> '7864320'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8648 8709 54 1.8 549.8 1.0X
[info] Parquet Vectorized (Pushdown) 861 870 8 18.3 54.7 10.0X
[info] Running benchmark: Select 1 string row ('7864320' <= value <= '7864320')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 43898 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4415 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 string row ('7864320' <= value <= '7864320'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8711 8780 94 1.8 553.8 1.0X
[info] Parquet Vectorized (Pushdown) 870 883 8 18.1 55.3 10.0X
[info] Running benchmark: Select all string rows (value IS NOT NULL)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 85779 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 85130 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select all string rows (value IS NOT NULL): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] --------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 17006 17156 139 0.9 1081.2 1.0X
[info] Parquet Vectorized (Pushdown) 16922 17026 112 0.9 1075.9 1.0X
[info] Running benchmark: Select 0 int row (value IS NULL)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41677 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 26 iterations, 2042 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 0 int row (value IS NULL): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8277 8336 58 1.9 526.2 1.0X
[info] Parquet Vectorized (Pushdown) 74 79 5 213.9 4.7 112.5X
[info] Running benchmark: Select 0 int row (7864320 < value < 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41824 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4201 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 0 int row (7864320 < value < 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ----------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8274 8365 82 1.9 526.1 1.0X
[info] Parquet Vectorized (Pushdown) 813 840 18 19.3 51.7 10.2X
[info] Running benchmark: Select 1 int row (value = 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41763 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4392 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 int row (value = 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8218 8353 90 1.9 522.5 1.0X
[info] Parquet Vectorized (Pushdown) 857 879 18 18.4 54.5 9.6X
[info] Running benchmark: Select 1 int row (value <=> 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41937 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4133 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 int row (value <=> 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8275 8387 112 1.9 526.1 1.0X
[info] Parquet Vectorized (Pushdown) 816 827 11 19.3 51.9 10.1X
[info] Running benchmark: Select 1 int row (7864320 <= value <= 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41648 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4247 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 int row (7864320 <= value <= 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8299 8330 26 1.9 527.6 1.0X
[info] Parquet Vectorized (Pushdown) 818 849 22 19.2 52.0 10.1X
[info] Running benchmark: Select 1 int row (7864319 < value < 7864321)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41604 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4159 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 int row (7864319 < value < 7864321): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ----------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8228 8321 74 1.9 523.1 1.0X
[info] Parquet Vectorized (Pushdown) 814 832 11 19.3 51.7 10.1X
[info] Running benchmark: Select 10% int rows (value < 1572864)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 45888 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 12000 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 10% int rows (value < 1572864): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 9131 9178 41 1.7 580.6 1.0X
[info] Parquet Vectorized (Pushdown) 2377 2400 17 6.6 151.1 3.8X
[info] Running benchmark: Select 50% int rows (value < 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 61875 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 42681 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 50% int rows (value < 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 12166 12375 226 1.3 773.5 1.0X
[info] Parquet Vectorized (Pushdown) 8408 8536 106 1.9 534.6 1.4X
[info] Running benchmark: Select 90% int rows (value < 14155776)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 76034 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 72997 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 90% int rows (value < 14155776): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 15098 15207 74 1.0 959.9 1.0X
[info] Parquet Vectorized (Pushdown) 14390 14599 127 1.1 914.9 1.0X
[info] Running benchmark: Select all int rows (value IS NOT NULL)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 80290 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 81014 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select all int rows (value IS NOT NULL): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 15749 16058 199 1.0 1001.3 1.0X
[info] Parquet Vectorized (Pushdown) 16147 16203 69 1.0 1026.6 1.0X
[info] Running benchmark: Select all int rows (value > -1)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 81133 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 81411 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select all int rows (value > -1): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 16103 16227 111 1.0 1023.8 1.0X
[info] Parquet Vectorized (Pushdown) 16125 16282 142 1.0 1025.2 1.0X
[info] Running benchmark: Select all int rows (value != -1)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 81013 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 80343 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select all int rows (value != -1): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 16073 16203 117 1.0 1021.9 1.0X
[info] Parquet Vectorized (Pushdown) 15942 16069 84 1.0 1013.6 1.0X
[info] Running benchmark: Select 0 distinct string row (value IS NULL)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 40258 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 31 iterations, 2054 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 0 distinct string row (value IS NULL): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ----------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 7953 8052 84 2.0 505.7 1.0X
[info] Parquet Vectorized (Pushdown) 62 66 6 253.5 3.9 128.2X
[info] Running benchmark: Select 0 distinct string row ('100' < value < '100')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 40734 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4731 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 0 distinct string row ('100' < value < '100'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8026 8147 73 2.0 510.3 1.0X
[info] Parquet Vectorized (Pushdown) 939 946 6 16.8 59.7 8.5X
[info] Running benchmark: Select 1 distinct string row (value = '100')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 40674 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4874 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 distinct string row (value = '100'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                      8034           8135          87          2.0         510.8       1.0X
[info] Parquet Vectorized (Pushdown)                            957            975          27         16.4          60.9       8.4X
[info] Running benchmark: Select 1 distinct string row (value <=> '100')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 40781 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4698 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 distinct string row (value <=> '100'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                        8112           8156          39          1.9         515.7       1.0X
[info] Parquet Vectorized (Pushdown)                              926            940           9         17.0          58.9       8.8X
[info] Running benchmark: Select 1 distinct string row ('100' <= value <= '100')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41005 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 5174 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 distinct string row ('100' <= value <= '100'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                8151           8201          42          1.9         518.2       1.0X
[info] Parquet Vectorized (Pushdown)                                     1014           1035          32         15.5          64.5       8.0X
[info] Running benchmark: Select all distinct string rows (value IS NOT NULL)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 89835 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 90269 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select all distinct string rows (value IS NOT NULL):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                            17886          17967          56          0.9        1137.2       1.0X
[info] Parquet Vectorized (Pushdown)                                 17979          18054         100          0.9        1143.0       1.0X
[info] Running benchmark: StringStartsWith filter: (value like '10%')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 46786 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 5455 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] StringStartsWith filter: (value like '10%'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ----------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                     9191           9357         168          1.7         584.4       1.0X
[info] Parquet Vectorized (Pushdown)                          1075           1091          11         14.6          68.4       8.5X
[info] Running benchmark: StringStartsWith filter: (value like '1000%')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 45468 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4483 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] StringStartsWith filter: (value like '1000%'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                       9017           9094         116          1.7         573.3       1.0X
[info] Parquet Vectorized (Pushdown)                             888            897           7         17.7          56.5      10.1X
[info] Running benchmark: StringStartsWith filter: (value like '786432%')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 45429 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4428 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] StringStartsWith filter: (value like '786432%'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] --------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                         9037           9086          55          1.7         574.6       1.0X
[info] Parquet Vectorized (Pushdown)                               864            886          17         18.2          55.0      10.5X
[info] Running benchmark: Select 1 decimal(9, 2) row (value = 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 17614 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 5788 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 decimal(9, 2) row (value = 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                      3488           3523          27          4.5         221.8       1.0X
[info] Parquet Vectorized (Pushdown)                           1148           1158          11         13.7          73.0       3.0X
[info] Running benchmark: Select 10% decimal(9, 2) rows (value < 1572864)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 25815 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 25522 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 10% decimal(9, 2) rows (value < 1572864):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] --------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                         5117           5163          63          3.1         325.3       1.0X
[info] Parquet Vectorized (Pushdown)                              5044           5104          55          3.1         320.7       1.0X
[info] Running benchmark: Select 50% decimal(9, 2) rows (value < 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 52939 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 52691 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 50% decimal(9, 2) rows (value < 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] --------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                        10443          10588         116          1.5         663.9       1.0X
[info] Parquet Vectorized (Pushdown)                             10388          10538         173          1.5         660.5       1.0X
[info] Running benchmark: Select 90% decimal(9, 2) rows (value < 14155776)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 58989 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 59164 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 90% decimal(9, 2) rows (value < 14155776):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                         11676          11798          96          1.3         742.3       1.0X
[info] Parquet Vectorized (Pushdown)                              11718          11833         112          1.3         745.0       1.0X
[info] Running benchmark: Select 1 decimal(18, 2) row (value = 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 18284 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 5992 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 decimal(18, 2) row (value = 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                       3583           3657          49          4.4         227.8       1.0X
[info] Parquet Vectorized (Pushdown)                            1187           1198           7         13.2          75.5       3.0X
[info] Running benchmark: Select 10% decimal(18, 2) rows (value < 1572864)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 23432 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 10519 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 10% decimal(18, 2) rows (value < 1572864):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                          4603           4686          77          3.4         292.7       1.0X
[info] Parquet Vectorized (Pushdown)                               2058           2104          92          7.6         130.8       2.2X
[info] Running benchmark: Select 50% decimal(18, 2) rows (value < 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 39380 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 32688 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 50% decimal(18, 2) rows (value < 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                          7805           7876          86          2.0         496.2       1.0X
[info] Parquet Vectorized (Pushdown)                               6475           6538          68          2.4         411.6       1.2X
[info] Running benchmark: Select 90% decimal(18, 2) rows (value < 14155776)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 55690 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 54683 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 90% decimal(18, 2) rows (value < 14155776):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ----------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                          11000          11138         112          1.4         699.3       1.0X
[info] Parquet Vectorized (Pushdown)                               10764          10937         125          1.5         684.4       1.0X
[info] Running benchmark: Select 1 decimal(38, 2) row (value = 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 29479 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 9146 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 decimal(38, 2) row (value = 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                       5655           5896         242          2.8         359.6       1.0X
[info] Parquet Vectorized (Pushdown)                            1808           1829          18          8.7         115.0       3.1X
[info] Running benchmark: Select 10% decimal(38, 2) rows (value < 1572864)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 34809 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 14529 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 10% decimal(38, 2) rows (value < 1572864):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                          6878           6962          62          2.3         437.3       1.0X
[info] Parquet Vectorized (Pushdown)                               2861           2906          69          5.5         181.9       2.4X
[info] Running benchmark: Select 50% decimal(38, 2) rows (value < 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 55777 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 44400 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 50% decimal(38, 2) rows (value < 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                         10967          11155         151          1.4         697.3       1.0X
[info] Parquet Vectorized (Pushdown)                               8769           8880         111          1.8         557.5       1.3X
[info] Running benchmark: Select 90% decimal(38, 2) rows (value < 14155776)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 75507 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 73697 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 90% decimal(38, 2) rows (value < 14155776):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ----------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                          14916          15101         115          1.1         948.3       1.0X
[info] Parquet Vectorized (Pushdown)                               14623          14740         103          1.1         929.7       1.0X
[info] Running benchmark: InSet -> InFilters (values count: 5, distribution: 10)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 42201 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4194 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 5, distribution: 10):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                8280           8440          93          1.9         526.4       1.0X
[info] Parquet Vectorized (Pushdown)                                      813            839          19         19.4          51.7      10.2X
[info] Running benchmark: InSet -> InFilters (values count: 5, distribution: 50)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41743 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 15602 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 5, distribution: 50):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                8263           8349          87          1.9         525.3       1.0X
[info] Parquet Vectorized (Pushdown)                                     3108           3120          10          5.1         197.6       2.7X
[info] Running benchmark: InSet -> InFilters (values count: 5, distribution: 90)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 42229 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 15575 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 5, distribution: 90):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                8356           8446          89          1.9         531.2       1.0X
[info] Parquet Vectorized (Pushdown)                                     3062           3115          64          5.1         194.7       2.7X
[info] Running benchmark: InSet -> InFilters (values count: 10, distribution: 10)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 42041 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 8012 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 10, distribution: 10):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ----------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                 8317           8408          77          1.9         528.7       1.0X
[info] Parquet Vectorized (Pushdown)                                      1577           1603          21         10.0         100.2       5.3X
[info] Running benchmark: InSet -> InFilters (values count: 10, distribution: 50)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41870 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 15558 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 10, distribution: 50):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ----------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                 8321           8374          43          1.9         529.0       1.0X
[info] Parquet Vectorized (Pushdown)                                      3069           3112          40          5.1         195.1       2.7X
[info] Running benchmark: InSet -> InFilters (values count: 10, distribution: 90)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 42102 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 19401 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 10, distribution: 90):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ----------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                 8382           8420          46          1.9         532.9       1.0X
[info] Parquet Vectorized (Pushdown)                                      3865           3880          17          4.1         245.7       2.2X
[info] Running benchmark: InSet -> InFilters (values count: 50, distribution: 10)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 43390 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 44089 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 50, distribution: 10):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ----------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                 8594           8678          85          1.8         546.4       1.0X
[info] Parquet Vectorized (Pushdown)                                      8710           8818         141          1.8         553.7       1.0X
[info] Running benchmark: InSet -> InFilters (values count: 50, distribution: 50)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 43434 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 43449 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 50, distribution: 50):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ----------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                 8643           8687          32          1.8         549.5       1.0X
[info] Parquet Vectorized (Pushdown)                                      8537           8690         142          1.8         542.8       1.0X
[info] Running benchmark: InSet -> InFilters (values count: 50, distribution: 90)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 43472 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 43329 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 50, distribution: 90):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ----------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                 8633           8695          65          1.8         548.9       1.0X
[info] Parquet Vectorized (Pushdown)                                      8635           8666          29          1.8         549.0       1.0X
[info] Running benchmark: InSet -> InFilters (values count: 100, distribution: 10)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 42939 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 43868 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 100, distribution: 10):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                  8486           8588          81          1.9         539.6       1.0X
[info] Parquet Vectorized (Pushdown)                                       8663           8774         175          1.8         550.8       1.0X
[info] Running benchmark: InSet -> InFilters (values count: 100, distribution: 50)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 43116 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 43589 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 100, distribution: 50):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                  8566           8623          46          1.8         544.6       1.0X
[info] Parquet Vectorized (Pushdown)                                       8646           8718          84          1.8         549.7       1.0X
[info] Running benchmark: InSet -> InFilters (values count: 100, distribution: 90)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 43544 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 43485 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 100, distribution: 90):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                  8639           8709          56          1.8         549.3       1.0X
[info] Parquet Vectorized (Pushdown)                                       8638           8697          53          1.8         549.2       1.0X
[info] Running benchmark: Select 1 tinyint row (value = CAST(63 AS tinyint))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 19550 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 6223 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 tinyint row (value = CAST(63 AS tinyint)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                            3749           3910         147          4.2         238.3       1.0X
[info] Parquet Vectorized (Pushdown)                                 1184           1245          44         13.3          75.3       3.2X
[info] Running benchmark: Select 10% tinyint rows (value < CAST(12 AS tinyint))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 23026 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 9723 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 10% tinyint rows (value < CAST(12 AS tinyint)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] --------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                               4552           4605          46          3.5         289.4       1.0X
[info] Parquet Vectorized (Pushdown)                                    1906           1945          52          8.3         121.2       2.4X
[info] Running benchmark: Select 50% tinyint rows (value < CAST(63 AS tinyint))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 38202 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 30731 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 50% tinyint rows (value < CAST(63 AS tinyint)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] --------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                               7511           7641         103          2.1         477.6       1.0X
[info] Parquet Vectorized (Pushdown)                                    6108           6146          49          2.6         388.3       1.2X
[info] Running benchmark: Select 90% tinyint rows (value < CAST(114 AS tinyint))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 54038 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 53985 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 90% tinyint rows (value < CAST(114 AS tinyint)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                               10695          10808          92          1.5         680.0       1.0X
[info] Parquet Vectorized (Pushdown)                                    10648          10797         137          1.5         677.0       1.0X
[info] Running benchmark: Select 1 timestamp stored as INT96 row (value = timestamp_seconds(7864320))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 21389 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 20900 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 timestamp stored as INT96 row (value = timestamp_seconds(7864320)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                                     4173           4278          82          3.8         265.3       1.0X
[info] Parquet Vectorized (Pushdown)                                                          4130           4180          37          3.8         262.6       1.0X
[info] Running benchmark: Select 10% timestamp stored as INT96 rows (value < timestamp_seconds(1572864))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 25237 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 25245 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 10% timestamp stored as INT96 rows (value < timestamp_seconds(1572864)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                                        4985           5047          67          3.2         316.9       1.0X
[info] Parquet Vectorized (Pushdown)                                                             4968           5049          73          3.2         315.9       1.0X
[info] Running benchmark: Select 50% timestamp stored as INT96 rows (value < timestamp_seconds(7864320))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 40629 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 40929 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 50% timestamp stored as INT96 rows (value < timestamp_seconds(7864320)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                                        8005           8126         109          2.0         509.0       1.0X
[info] Parquet Vectorized (Pushdown)                                                             8087           8186          67          1.9         514.2       1.0X
[info] Running benchmark: Select 90% timestamp stored as INT96 rows (value < timestamp_seconds(14155776))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 55942 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 56599 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 90% timestamp stored as INT96 rows (value < timestamp_seconds(14155776)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ----------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                                        10905          11189         178          1.4         693.3       1.0X
[info] Parquet Vectorized (Pushdown)                                                             11054          11320         203          1.4         702.8       1.0X
[info] Running benchmark: Select 1 timestamp stored as TIMESTAMP_MICROS row (value = timestamp_seconds(7864320))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 17659 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 5428 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 timestamp stored as TIMESTAMP_MICROS row (value = timestamp_seconds(7864320)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                                                3475           3532          43          4.5         220.9       1.0X
[info] Parquet Vectorized (Pushdown)                                                                     1072           1086           9         14.7          68.2       3.2X
[info] Running benchmark: Select 10% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(1572864))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 21779 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 9752 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 10% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(1572864)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                                                                   4344           4356           9          3.6         276.2       1.0X
[info] Parquet Vectorized (Pushdown)                                                                        1874           1950          87          8.4         119.2       2.3X
[info] Running benchmark: Select 50% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(7864320))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 37830 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 30583 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 50% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(7864320)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 7478 7566 120 2.1 475.5 1.0X
[info] Parquet Vectorized (Pushdown) 6034 6117 97 2.6 383.6 1.2X
[info] Running benchmark: Select 90% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(14155776))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 52857 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 53101 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 90% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(14155776)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 10443 10571 129 1.5 664.0 1.0X
[info] Parquet Vectorized (Pushdown) 10491 10620 215 1.5 667.0 1.0X
[info] Running benchmark: Select 1 timestamp stored as TIMESTAMP_MILLIS row (value = timestamp_seconds(7864320))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 18656 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 5916 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 timestamp stored as TIMESTAMP_MILLIS row (value = timestamp_seconds(7864320)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 3718 3731 24 4.2 236.4 1.0X
[info] Parquet Vectorized (Pushdown) 1157 1183 17 13.6 73.6 3.2X
[info] Running benchmark: Select 10% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(1572864))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 22909 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 10248 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 10% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(1572864)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 4568 4582 16 3.4 290.4 1.0X
[info] Parquet Vectorized (Pushdown) 2005 2050 51 7.8 127.5 2.3X
[info] Running benchmark: Select 50% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(7864320))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 38751 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 31321 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 50% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(7864320)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 7651 7750 89 2.1 486.4 1.0X
[info] Parquet Vectorized (Pushdown) 6198 6264 94 2.5 394.1 1.2X
[info] Running benchmark: Select 90% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(14155776))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 53723 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 53353 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 90% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(14155776)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 10563 10745 117 1.5 671.6 1.0X
[info] Parquet Vectorized (Pushdown) 10542 10671 147 1.5 670.2 1.0X
[info] 20:25:52.074 WARN org.apache.spark.sql.catalyst.util.package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
[info] Running benchmark: Select 1 row with 1 filters
[info] Running case: Parquet Vectorized
[info] Stopped after 12 iterations, 2085 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 13 iterations, 2161 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 row with 1 filters: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 163 174 13 0.0 162534801.0 1.0X
[info] Parquet Vectorized (Pushdown) 161 166 5 0.0 161189323.0 1.0X
[info] Running benchmark: Select 1 row with 250 filters
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 4092 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4668 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 row with 250 filters: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 806 819 19 0.0 806155381.0 1.0X
[info] Parquet Vectorized (Pushdown) 910 934 17 0.0 909761809.0 0.9X
[info] Running benchmark: Select 1 row with 500 filters
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 15143 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 17252 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 row with 500 filters: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 2994 3029 23 0.0 2993624958.0 1.0X
[info] Parquet Vectorized (Pushdown) 3438 3451 12 0.0 3437503212.0 0.9X
[success] Total time: 6320 s (01:45:20), completed Jan 19, 2021 8:26:57 PM
```
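For reading the tables, the derived columns can be recomputed from `Best Time(ms)`. The sketch below is only an illustration: the row count of 15728640 is not printed anywhere in the log and is an assumption inferred from the numbers (for example 3475 ms / 220.9 ns per row), and `derived_columns` is a hypothetical helper, not part of Spark.

```python
def derived_columns(best_ms, baseline_best_ms, num_rows):
    """Recompute Rate(M/s), Per Row(ns) and Relative from best times.

    num_rows is an assumed value inferred from the log, not read from it.
    """
    rate_m_per_s = num_rows / (best_ms / 1000.0) / 1e6  # million rows per second
    per_row_ns = best_ms * 1e6 / num_rows               # nanoseconds per row
    relative = baseline_best_ms / best_ms               # speedup over the first case
    return round(rate_m_per_s, 1), round(per_row_ns, 1), round(relative, 1)


ASSUMED_ROWS = 15728640  # assumption: 15 * 1024 * 1024 rows

# "Select 1 timestamp stored as TIMESTAMP_MICROS row": 3475 ms without pushdown,
# 1072 ms with pushdown.
print(derived_columns(1072, 3475, ASSUMED_ROWS))  # (14.7, 68.2, 3.2)
```

Under that assumption the result matches the `14.7 68.2 3.2X` printed for the pushdown case of that benchmark, so `Relative` can be read as the ratio of the baseline best time to the case's best time.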
Parquet 1.11.1:
```
[info] 22:44:02.552 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[info] Running benchmark: Select 0 string row (value IS NULL)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 44098 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 20 iterations, 2028 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 0 string row (value IS NULL): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8753 8820 84 1.8 556.5 1.0X
[info] Parquet Vectorized (Pushdown) 89 101 10 177.7 5.6 98.9X
[info] Running benchmark: Select 0 string row ('7864320' < value < '7864320')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 44149 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4627 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 0 string row ('7864320' < value < '7864320'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8774 8830 47 1.8 557.8 1.0X
[info] Parquet Vectorized (Pushdown) 906 926 15 17.4 57.6 9.7X
[info] Running benchmark: Select 1 string row (value = '7864320')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 44520 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4633 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 string row (value = '7864320'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8780 8904 82 1.8 558.2 1.0X
[info] Parquet Vectorized (Pushdown) 901 927 22 17.5 57.3 9.7X
[info] Running benchmark: Select 1 string row (value <=> '7864320')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 44581 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4554 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 string row (value <=> '7864320'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8899 8916 10 1.8 565.8 1.0X
[info] Parquet Vectorized (Pushdown) 897 911 15 17.5 57.1 9.9X
[info] Running benchmark: Select 1 string row ('7864320' <= value <= '7864320')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 44143 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4487 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 string row ('7864320' <= value <= '7864320'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8693 8829 96 1.8 552.7 1.0X
[info] Parquet Vectorized (Pushdown) 885 898 12 17.8 56.3 9.8X
[info] Running benchmark: Select all string rows (value IS NOT NULL)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 85771 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 85841 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select all string rows (value IS NOT NULL): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] --------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 17097 17154 60 0.9 1087.0 1.0X
[info] Parquet Vectorized (Pushdown) 17017 17168 138 0.9 1081.9 1.0X
[info] Running benchmark: Select 0 int row (value IS NULL)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41273 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 27 iterations, 2061 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 0 int row (value IS NULL): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8239 8255 12 1.9 523.8 1.0X
[info] Parquet Vectorized (Pushdown) 70 76 5 224.5 4.5 117.6X
[info] Running benchmark: Select 0 int row (7864320 < value < 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41954 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4106 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 0 int row (7864320 < value < 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ----------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8222 8391 122 1.9 522.7 1.0X
[info] Parquet Vectorized (Pushdown) 808 821 11 19.5 51.4 10.2X
[info] Running benchmark: Select 1 int row (value = 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41815 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4120 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 int row (value = 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8317 8363 61 1.9 528.8 1.0X
[info] Parquet Vectorized (Pushdown) 807 824 15 19.5 51.3 10.3X
[info] Running benchmark: Select 1 int row (value <=> 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 42163 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4088 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 int row (value <=> 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8351 8433 67 1.9 530.9 1.0X
[info] Parquet Vectorized (Pushdown) 804 818 23 19.6 51.1 10.4X
[info] Running benchmark: Select 1 int row (7864320 <= value <= 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 42349 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4223 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 int row (7864320 <= value <= 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8389 8470 63 1.9 533.4 1.0X
[info] Parquet Vectorized (Pushdown) 835 845 10 18.8 53.1 10.0X
[info] Running benchmark: Select 1 int row (7864319 < value < 7864321)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41947 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4084 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 int row (7864319 < value < 7864321): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ----------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8347 8390 54 1.9 530.7 1.0X
[info] Parquet Vectorized (Pushdown) 795 817 19 19.8 50.5 10.5X
[info] Running benchmark: Select 10% int rows (value < 1572864)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 46948 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 12149 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 10% int rows (value < 1572864): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 9248 9390 101 1.7 588.0 1.0X
[info] Parquet Vectorized (Pushdown) 2415 2430 15 6.5 153.5 3.8X
[info] Running benchmark: Select 50% int rows (value < 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 60395 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 41469 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 50% int rows (value < 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 11943 12079 187 1.3 759.3 1.0X
[info] Parquet Vectorized (Pushdown) 8192 8294 63 1.9 520.8 1.5X
[info] Running benchmark: Select 90% int rows (value < 14155776)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 75730 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 72593 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 90% int rows (value < 14155776): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 15026 15146 119 1.0 955.3 1.0X
[info] Parquet Vectorized (Pushdown) 14315 14519 212 1.1 910.1 1.0X
[info] Running benchmark: Select all int rows (value IS NOT NULL)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 79340 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 79510 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select all int rows (value IS NOT NULL): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 15715 15868 128 1.0 999.2 1.0X
[info] Parquet Vectorized (Pushdown) 15791 15902 85 1.0 1004.0 1.0X
[info] Running benchmark: Select all int rows (value > -1)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 79442 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 78576 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select all int rows (value > -1): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 15760 15889 163 1.0 1002.0 1.0X
[info] Parquet Vectorized (Pushdown) 15679 15715 32 1.0 996.8 1.0X
[info] Running benchmark: Select all int rows (value != -1)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 79189 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 80052 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select all int rows (value != -1): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 15669 15838 118 1.0 996.2 1.0X
[info] Parquet Vectorized (Pushdown) 15710 16010 248 1.0 998.8 1.0X
[info] Running benchmark: Select 0 distinct string row (value IS NULL)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 39957 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 30 iterations, 2038 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 0 distinct string row (value IS NULL): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ----------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 7899 7991 63 2.0 502.2 1.0X
[info] Parquet Vectorized (Pushdown) 62 68 6 255.1 3.9 128.1X
[info] Running benchmark: Select 0 distinct string row ('100' < value < '100')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 40549 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4740 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 0 distinct string row ('100' < value < '100'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8009 8110 82 2.0 509.2 1.0X
[info] Parquet Vectorized (Pushdown) 939 948 7 16.8 59.7 8.5X
[info] Running benchmark: Select 1 distinct string row (value = '100')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 40421 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4797 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 distinct string row (value = '100'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ----------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8038 8084 40 2.0 511.1 1.0X
[info] Parquet Vectorized (Pushdown) 949 959 9 16.6 60.3 8.5X
[info] Running benchmark: Select 1 distinct string row (value <=> '100')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41089 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4819 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 distinct string row (value <=> '100'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8025 8218 150 2.0 510.2 1.0X
[info] Parquet Vectorized (Pushdown) 944 964 16 16.7 60.0 8.5X
[info] Running benchmark: Select 1 distinct string row ('100' <= value <= '100')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 40887 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4829 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 distinct string row ('100' <= value <= '100'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] --------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8124 8177 73 1.9 516.5 1.0X
[info] Parquet Vectorized (Pushdown) 952 966 12 16.5 60.5 8.5X
[info] Running benchmark: Select all distinct string rows (value IS NOT NULL)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 87519 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 87496 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select all distinct string rows (value IS NOT NULL): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 17196 17504 206 0.9 1093.3 1.0X
[info] Parquet Vectorized (Pushdown) 17342 17499 148 0.9 1102.6 1.0X
[info] Running benchmark: StringStartsWith filter: (value like '10%')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 45539 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 5401 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] StringStartsWith filter: (value like '10%'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 9037 9108 62 1.7 574.5 1.0X
[info] Parquet Vectorized (Pushdown) 1063 1080 14 14.8 67.6 8.5X
[info] Running benchmark: StringStartsWith filter: (value like '1000%')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 44501 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4443 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] StringStartsWith filter: (value like '1000%'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8807 8900 78 1.8 560.0 1.0X
[info] Parquet Vectorized (Pushdown) 865 889 20 18.2 55.0 10.2X
[info] Running benchmark: StringStartsWith filter: (value like '786432%')
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 44776 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4388 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] StringStartsWith filter: (value like '786432%'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8797 8955 109 1.8 559.3 1.0X
[info] Parquet Vectorized (Pushdown) 854 878 20 18.4 54.3 10.3X
[info] Running benchmark: Select 1 decimal(9, 2) row (value = 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 17622 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 5921 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 decimal(9, 2) row (value = 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ----------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 3475 3525 57 4.5 220.9 1.0X
[info] Parquet Vectorized (Pushdown) 1166 1184 19 13.5 74.1 3.0X
[info] Running benchmark: Select 10% decimal(9, 2) rows (value < 1572864)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 26543 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 25522 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 10% decimal(9, 2) rows (value < 1572864): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 5075 5309 154 3.1 322.7 1.0X
[info] Parquet Vectorized (Pushdown) 4943 5105 121 3.2 314.2 1.0X
[info] Running benchmark: Select 50% decimal(9, 2) rows (value < 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 51448 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 52535 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 50% decimal(9, 2) rows (value < 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 10168 10290 94 1.5 646.5 1.0X
[info] Parquet Vectorized (Pushdown) 10386 10507 96 1.5 660.3 1.0X
[info] Running benchmark: Select 90% decimal(9, 2) rows (value < 14155776)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 59845 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 59254 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 90% decimal(9, 2) rows (value < 14155776): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] --------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 11815 11969 240 1.3 751.2 1.0X
[info] Parquet Vectorized (Pushdown) 11655 11851 209 1.3 741.0 1.0X
[info] Running benchmark: Select 1 decimal(18, 2) row (value = 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 18282 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 6164 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 decimal(18, 2) row (value = 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 3597 3657 59 4.4 228.7 1.0X
[info] Parquet Vectorized (Pushdown) 1219 1233 14 12.9 77.5 3.0X
[info] Running benchmark: Select 10% decimal(18, 2) rows (value < 1572864)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 22746 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 10375 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 10% decimal(18, 2) rows (value < 1572864): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] --------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 4484 4549 67 3.5 285.1 1.0X
[info] Parquet Vectorized (Pushdown) 2023 2075 57 7.8 128.6 2.2X
[info] Running benchmark: Select 50% decimal(18, 2) rows (value < 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 39274 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 33687 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 50% decimal(18, 2) rows (value < 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] --------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 7792 7855 48 2.0 495.4 1.0X
[info] Parquet Vectorized (Pushdown) 6498 6738 150 2.4 413.2 1.2X
[info] Running benchmark: Select 90% decimal(18, 2) rows (value < 14155776)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 56243 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 55540 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 90% decimal(18, 2) rows (value < 14155776): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 11127 11249 167 1.4 707.5 1.0X
[info] Parquet Vectorized (Pushdown) 10841 11108 225 1.5 689.3 1.0X
[info] Running benchmark: Select 1 decimal(38, 2) row (value = 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 29521 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 9333 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 decimal(38, 2) row (value = 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 5766 5904 94 2.7 366.6 1.0X
[info] Parquet Vectorized (Pushdown) 1836 1867 53 8.6 116.8 3.1X
[info] Running benchmark: Select 10% decimal(38, 2) rows (value < 1572864)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 34386 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 14350 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 10% decimal(38, 2) rows (value < 1572864): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] --------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 6746 6877 122 2.3 428.9 1.0X
[info] Parquet Vectorized (Pushdown) 2807 2870 75 5.6 178.5 2.4X
[info] Running benchmark: Select 50% decimal(38, 2) rows (value < 7864320)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 54192 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 43783 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 50% decimal(38, 2) rows (value < 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] --------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 10681 10839 142 1.5 679.1 1.0X
[info] Parquet Vectorized (Pushdown) 8550 8757 162 1.8 543.6 1.2X
[info] Running benchmark: Select 90% decimal(38, 2) rows (value < 14155776)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 74674 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 72033 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 90% decimal(38, 2) rows (value < 14155776): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 14675 14935 295 1.1 933.0 1.0X
[info] Parquet Vectorized (Pushdown) 14171 14407 158 1.1 901.0 1.0X
[info] Running benchmark: InSet -> InFilters (values count: 5, distribution: 10)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41729 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4213 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 5, distribution: 10): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] --------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8288 8346 44 1.9 526.9 1.0X
[info] Parquet Vectorized (Pushdown) 838 843 6 18.8 53.3 9.9X
[info] Running benchmark: InSet -> InFilters (values count: 5, distribution: 50)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41750 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 15555 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 5, distribution: 50): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] --------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8273 8350 55 1.9 526.0 1.0X
[info] Parquet Vectorized (Pushdown) 3101 3111 14 5.1 197.1 2.7X
[info] Running benchmark: InSet -> InFilters (values count: 5, distribution: 90)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41873 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 11725 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 5, distribution: 90): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] --------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8303 8375 94 1.9 527.9 1.0X
[info] Parquet Vectorized (Pushdown) 2307 2345 24 6.8 146.7 3.6X
[info] Running benchmark: InSet -> InFilters (values count: 10, distribution: 10)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41760 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 8029 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 10, distribution: 10): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8307 8352 47 1.9 528.1 1.0X
[info] Parquet Vectorized (Pushdown) 1588 1606 15 9.9 100.9 5.2X
[info] Running benchmark: InSet -> InFilters (values count: 10, distribution: 50)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41862 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 19294 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 10, distribution: 50): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8258 8373 77 1.9 525.0 1.0X
[info] Parquet Vectorized (Pushdown) 3814 3859 32 4.1 242.5 2.2X
[info] Running benchmark: InSet -> InFilters (values count: 10, distribution: 90)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 41883 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 27256 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 10, distribution: 90): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8270 8377 74 1.9 525.8 1.0X
[info] Parquet Vectorized (Pushdown) 5332 5451 165 3.0 339.0 1.6X
[info] Running benchmark: InSet -> InFilters (values count: 50, distribution: 10)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 43408 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 43478 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 50, distribution: 10): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8632 8682 35 1.8 548.8 1.0X
[info] Parquet Vectorized (Pushdown) 8647 8696 48 1.8 549.8 1.0X
[info] Running benchmark: InSet -> InFilters (values count: 50, distribution: 50)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 43469 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 43325 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 50, distribution: 50): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8653 8694 28 1.8 550.2 1.0X
[info] Parquet Vectorized (Pushdown) 8627 8665 39 1.8 548.5 1.0X
[info] Running benchmark: InSet -> InFilters (values count: 50, distribution: 90)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 43451 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 44043 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 50, distribution: 90): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8622 8690 81 1.8 548.2 1.0X
[info] Parquet Vectorized (Pushdown) 8597 8809 208 1.8 546.6 1.0X
[info] Running benchmark: InSet -> InFilters (values count: 100, distribution: 10)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 43363 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 43095 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 100, distribution: 10): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ----------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8579 8673 98 1.8 545.5 1.0X
[info] Parquet Vectorized (Pushdown) 8566 8619 39 1.8 544.6 1.0X
[info] Running benchmark: InSet -> InFilters (values count: 100, distribution: 50)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 43184 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 43077 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 100, distribution: 50): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ----------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8582 8637 61 1.8 545.6 1.0X
[info] Parquet Vectorized (Pushdown) 8530 8615 67 1.8 542.3 1.0X
[info] Running benchmark: InSet -> InFilters (values count: 100, distribution: 90)
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 42947 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 43033 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] InSet -> InFilters (values count: 100, distribution: 90): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ----------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 8487 8590 98 1.9 539.6 1.0X
[info] Parquet Vectorized (Pushdown) 8463 8607 220 1.9 538.1 1.0X
[info] Running benchmark: Select 1 tinyint row (value = CAST(63 AS tinyint))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 19742 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 5910 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 tinyint row (value = CAST(63 AS tinyint)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ----------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 3891 3949 70 4.0 247.4 1.0X
[info] Parquet Vectorized (Pushdown) 1174 1182 16 13.4 74.6 3.3X
[info] Running benchmark: Select 10% tinyint rows (value < CAST(12 AS tinyint))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 23622 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 9787 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 10% tinyint rows (value < CAST(12 AS tinyint)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 4615 4724 100 3.4 293.4 1.0X
[info] Parquet Vectorized (Pushdown) 1924 1958 64 8.2 122.3 2.4X
[info] Running benchmark: Select 50% tinyint rows (value < CAST(63 AS tinyint))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 38379 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 30411 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 50% tinyint rows (value < CAST(63 AS tinyint)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 7557 7676 108 2.1 480.5 1.0X
[info] Parquet Vectorized (Pushdown) 6011 6082 60 2.6 382.2 1.3X
[info] Running benchmark: Select 90% tinyint rows (value < CAST(114 AS tinyint))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 54810 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 54362 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 90% tinyint rows (value < CAST(114 AS tinyint)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] --------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 10670 10962 361 1.5 678.4 1.0X
[info] Parquet Vectorized (Pushdown) 10693 10872 224 1.5 679.8 1.0X
[info] Running benchmark: Select 1 timestamp stored as INT96 row (value = timestamp_seconds(7864320))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 21078 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 21416 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 timestamp stored as INT96 row (value = timestamp_seconds(7864320)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 4156 4216 67 3.8 264.2 1.0X
[info] Parquet Vectorized (Pushdown) 4151 4283 89 3.8 263.9 1.0X
[info] Running benchmark: Select 10% timestamp stored as INT96 rows (value < timestamp_seconds(1572864))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 25197 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 25234 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 10% timestamp stored as INT96 rows (value < timestamp_seconds(1572864)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] --------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 4931 5039 71 3.2 313.5 1.0X
[info] Parquet Vectorized (Pushdown) 4923 5047 73 3.2 313.0 1.0X
[info] Running benchmark: Select 50% timestamp stored as INT96 rows (value < timestamp_seconds(7864320))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 40851 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 40816 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 50% timestamp stored as INT96 rows (value < timestamp_seconds(7864320)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] --------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 7972 8170 127 2.0 506.8 1.0X
[info] Parquet Vectorized (Pushdown) 8056 8163 92 2.0 512.2 1.0X
[info] Running benchmark: Select 90% timestamp stored as INT96 rows (value < timestamp_seconds(14155776))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 56489 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 55908 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 90% timestamp stored as INT96 rows (value < timestamp_seconds(14155776)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 11111 11298 238 1.4 706.4 1.0X
[info] Parquet Vectorized (Pushdown) 11086 11182 66 1.4 704.8 1.0X
[info] Running benchmark: Select 1 timestamp stored as TIMESTAMP_MICROS row (value = timestamp_seconds(7864320))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 17925 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 5612 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 timestamp stored as TIMESTAMP_MICROS row (value = timestamp_seconds(7864320)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 3504 3585 69 4.5 222.8 1.0X
[info] Parquet Vectorized (Pushdown) 1119 1123 4 14.1 71.1 3.1X
[info] Running benchmark: Select 10% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(1572864))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 22303 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 9942 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 10% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(1572864)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 4365 4461 84 3.6 277.5 1.0X
[info] Parquet Vectorized (Pushdown) 1905 1988 101 8.3 121.1 2.3X
[info] Running benchmark: Select 50% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(7864320))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 38138 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 30971 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 50% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(7864320)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 7534 7628 101 2.1 479.0 1.0X
[info] Parquet Vectorized (Pushdown) 6010 6194 189 2.6 382.1 1.3X
[info] Running benchmark: Select 90% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(14155776))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 54005 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 52469 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 90% timestamp stored as TIMESTAMP_MICROS rows (value < timestamp_seconds(14155776)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 10649 10801 149 1.5 677.0 1.0X
[info] Parquet Vectorized (Pushdown) 10307 10494 310 1.5 655.3 1.0X
[info] Running benchmark: Select 1 timestamp stored as TIMESTAMP_MILLIS row (value = timestamp_seconds(7864320))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 18819 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 6081 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 timestamp stored as TIMESTAMP_MILLIS row (value = timestamp_seconds(7864320)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 3738 3764 28 4.2 237.6 1.0X
[info] Parquet Vectorized (Pushdown) 1190 1216 26 13.2 75.7 3.1X
[info] Running benchmark: Select 10% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(1572864))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 23198 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 10525 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 10% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(1572864)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 4586 4640 52 3.4 291.6 1.0X
[info] Parquet Vectorized (Pushdown) 2009 2105 70 7.8 127.7 2.3X
[info] Running benchmark: Select 50% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(7864320))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 39337 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 33023 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 50% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(7864320)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 7766 7868 82 2.0 493.8 1.0X
[info] Parquet Vectorized (Pushdown) 6404 6605 187 2.5 407.1 1.2X
[info] Running benchmark: Select 90% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(14155776))
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 54512 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 53224 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 90% timestamp stored as TIMESTAMP_MILLIS rows (value < timestamp_seconds(14155776)): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 10833 10902 55 1.5 688.8 1.0X
[info] Parquet Vectorized (Pushdown) 10499 10645 102 1.5 667.5 1.0X
[info] 00:27:18.540 WARN org.apache.spark.sql.catalyst.util.package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
[info] Running benchmark: Select 1 row with 1 filters
[info] Running case: Parquet Vectorized
[info] Stopped after 12 iterations, 2032 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 12 iterations, 2021 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 row with 1 filters: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 163 169 6 0.0 162707158.0 1.0X
[info] Parquet Vectorized (Pushdown) 162 168 5 0.0 162184547.0 1.0X
[info] Running benchmark: Select 1 row with 250 filters
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 3930 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 4599 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 row with 250 filters: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 777 786 15 0.0 776511925.0 1.0X
[info] Parquet Vectorized (Pushdown) 903 920 23 0.0 902964783.0 0.9X
[info] Running benchmark: Select 1 row with 500 filters
[info] Running case: Parquet Vectorized
[info] Stopped after 5 iterations, 14782 ms
[info] Running case: Parquet Vectorized (Pushdown)
[info] Stopped after 5 iterations, 16974 ms
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_221-b11 on Linux 3.10.0-957.10.1.el7.x86_64
[info] Intel Core Processor (Broadwell, IBRS)
[info] Select 1 row with 500 filters: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized 2921 2956 28 0.0 2921416288.0 1.0X
[info] Parquet Vectorized (Pushdown) 3383 3395 10 0.0 3382576710.0 0.9X
[success] Total time: 6276 s (01:44:36), completed Jan 20, 2021 12:28:23 AM
```
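For anyone reading the tables above: the derived columns appear to follow from the best time alone. A minimal sketch in Python, assuming a row count of 15 * 1024 * 1024 for the filter-pushdown cases (inferred from the Per Row figures, not stated in the log) and assuming that Relative is the baseline's best time divided by the case's best time. The function name `derived_columns` is made up for illustration and is not part of Spark.

```python
def derived_columns(best_ms, baseline_best_ms, num_rows):
    """Recompute Rate(M/s), Per Row(ns) and Relative from a best time in ms."""
    rate_m_per_s = num_rows / (best_ms / 1000.0) / 1e6  # million rows per second
    per_row_ns = best_ms * 1e6 / num_rows               # nanoseconds per row
    relative = baseline_best_ms / best_ms               # speedup vs. the first case
    return round(rate_m_per_s, 1), round(per_row_ns, 1), round(relative, 1)


# Assumption: row count inferred from the log (3597 ms / 228.7 ns is about 15.7M rows).
NUM_ROWS = 15 * 1024 * 1024

# "Select 1 decimal(18, 2) row": best times 3597 ms (baseline) and 1219 ms (pushdown).
print(derived_columns(3597, 3597, NUM_ROWS))
print(derived_columns(1219, 3597, NUM_ROWS))
```

With these assumptions the two calls reproduce the 4.4 / 228.7 / 1.0X and 12.9 / 77.5 / 3.0X figures reported for that case. The "Select 1 row with N filters" tables at the end evidently use a different row count, so the same constant does not apply there.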
[GitHub] [spark] dongjoon-hyun commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630997965
@h-vetinari, this is wrong, isn't it? Did anyone (other than you) say it's low priority here? We want this, but it currently looks technically infeasible. Do you think that everything infeasible is low priority?
> I'm surprised (without criticism!) that this has a seemingly low priority
[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763799915
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38868/
[GitHub] [spark] gatorsmile commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
gatorsmile commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-768790620
LGTM
The current PR looks good to me. However, based on previous experience, Parquet upgrades always cause various issues. We might revert the upgrade at the last minute.
@wangyum Could you create a 3.2.0 blocker JIRA? Before the release, we need to double-check the unreleased/unresolved JIRAs/PRs of Parquet 1.11 and then decide whether we should keep or revert the Parquet upgrade. At the same time, we should encourage the whole community to run compatibility and performance tests on their production workloads, covering both the read and write code paths.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561449141
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
}
+ // PARQUET-1580: Disables page-level CRC checksums by default.
Review comment:
Wow. Then it's a real bug. Thanks for the confirmation.
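The diff comment quoted above describes turning page-level CRC checksums off by default. A minimal sketch of that idea follows, assuming hadoop-common is on the classpath and using the literal key `parquet.page.write-checksum.enabled` that appears elsewhere in this thread. The object and method names are hypothetical, and the actual line added to `ParquetFileFormat.scala` is not visible in the quoted hunk, so this should not be read as the PR's implementation.

```scala
import org.apache.hadoop.conf.Configuration

object PageChecksumDefaultSketch {
  // Literal configuration key; assumed to match
  // ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED in parquet-mr 1.11.x.
  val PageWriteChecksumKey = "parquet.page.write-checksum.enabled"

  /** Turns page checksums off unless the caller has already set the key. */
  def applyDefault(conf: Configuration): Configuration = {
    conf.setBooleanIfUnset(PageWriteChecksumKey, false)
    conf
  }

  def main(args: Array[String]): Unit = {
    // Case 1: the key is unset, so the default (false) is applied.
    val unset = applyDefault(new Configuration(false))
    println(unset.getBoolean(PageWriteChecksumKey, true))

    // Case 2: the key was set explicitly, so the explicit value is kept.
    val explicit = new Configuration(false)
    explicit.setBoolean(PageWriteChecksumKey, true)
    println(applyDefault(explicit).getBoolean(PageWriteChecksumKey, false))
  }
}
```

Using `setBooleanIfUnset` rather than `setBoolean` in this sketch means an explicit user setting still wins over the default.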
[GitHub] [spark] SparkQA removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-767335521
**[Test build #134487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134487/testReport)** for PR 26804 at commit [`72c52b6`](https://github.com/apache/spark/commit/72c52b64958340835e5a54b24aa68f201f4c15be).
[GitHub] [spark] heuermh edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
heuermh edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630453306
-1 (non-binding). Please do not merge this into master; it breaks downstream applications due to mixed Avro 1.8.x vs 1.9.x transitive dependencies.
I believe this should be blocked on a dependency upgrade to Avro 1.9.x (which in turn is blocked on other things).
[GitHub] [spark] dongjoon-hyun commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-769488412
Thank you, @gatorsmile and @wangyum !
[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764181491
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38882/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-767383036
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39073/
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-765892788
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38986/
[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-767383036
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39073/
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764058969
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762167906
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134187/
[GitHub] [spark] wangyum commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561044576
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
##########
@@ -225,7 +226,9 @@ class StreamSuite extends StreamTest {
val df = spark.readStream.format(classOf[FakeDefaultSource].getName).load()
Seq("", "parquet").foreach { useV1Source =>
- withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source) {
+ withSQLConf(
+ SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source,
+ ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED -> "false") {
Review comment:
Disable `parquet.page.write-checksum.enabled`, otherwise:
```
[info] - DataFrame reuse *** FAILED *** (1 second, 802 milliseconds)
[info] Decoded objects do not match expected objects:
[info] expected: WrappedArray(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
[info] actual: WrappedArray(0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 2)
[info] assertnotnull(upcast(getcolumnbyordinal(0, LongType), LongType, - root class: "scala.Long"))
[info] +- upcast(getcolumnbyordinal(0, LongType), LongType, - root class: "scala.Long")
[info] +- getcolumnbyordinal(0, LongType) (QueryTest.scala:68)
```
This issue was introduced by [PARQUET-1580](https://issues.apache.org/jira/browse/PARQUET-1580). cc @gszadovszky
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-767335521
**[Test build #134487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134487/testReport)** for PR 26804 at commit [`72c52b6`](https://github.com/apache/spark/commit/72c52b64958340835e5a54b24aa68f201f4c15be).
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763776712
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38868/
[GitHub] [spark] sunchao commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-715504346
@heuermh you're right, I hadn't considered this case before. Even if we shade Avro in Spark, we may still have the Avro jars from the Hive side, which are of an even lower version. I _think_ `parquet-avro` 1.10.1 can work with the other Parquet 1.11.x modules, but maybe this is something we don't want to do anyway, so as not to confuse Spark users.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763799915
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38868/
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764186208
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38884/
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763817245
**[Test build #134282 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134282/testReport)** for PR 26804 at commit [`c9b4792`](https://github.com/apache/spark/commit/c9b479284a220074424a40c07af7ecd27085c5cd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764172517
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38884/
[GitHub] [spark] SparkQA removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762064332
**[Test build #134187 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134187/testReport)** for PR 26804 at commit [`b5101d2`](https://github.com/apache/spark/commit/b5101d20850b7c3ddc03cece8088a7b34b683084).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762130255
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38772/
[GitHub] [spark] heuermh edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
heuermh edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-714729078
@sunchao Until `spark.driver.userClassPathFirst`/`spark.executor.userClassPathFirst` are no longer marked Experimental, or dependencies in Spark are shaded properly, having Avro 1.8.x on the Spark runtime classpath will cause runtime compatibility exceptions for downstream apps that use `parquet-avro` 1.11.x or newer, which depend on Avro 1.9.x. Or are you suggesting that downstream apps use `parquet-avro` version 1.10.1 while Spark depends on e.g. `parquet-[column,common,encoding,format-structures,hadoop,jackson]` 1.11.x or newer? I don't know if that is possible.
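When diagnosing the kind of mixed-version classpath described above, it can help to check which jar a class was actually loaded from. The following is a minimal sketch using only JVM reflection. The object and method names are hypothetical, and the two class names in `main` are examples that resolve only if Avro and `parquet-avro` are on the classpath.

```scala
import scala.util.Try

object WhichJarSketch {
  /** Describes where a class was loaded from, or None if it cannot be found. */
  def describe(className: String): Option[String] =
    Try(Class.forName(className)).toOption.map { cls =>
      // The implementation version comes from the jar manifest and may be absent.
      val version = Option(cls.getPackage)
        .flatMap(p => Option(p.getImplementationVersion))
        .getOrElse("unknown")
      // The code source is null for classes loaded by the bootstrap class loader.
      val location = Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
        .getOrElse("unknown")
      s"$className version=$version location=$location"
    }

  def main(args: Array[String]): Unit = {
    Seq("org.apache.avro.Schema", "org.apache.parquet.avro.AvroParquetWriter")
      .foreach { name =>
        println(describe(name).getOrElse(s"$name is not on the classpath"))
      }
  }
}
```

Running this on both the driver and an executor would show whether the Avro classes come from Spark's own jars or from the application's, which is the distinction the `userClassPathFirst` settings are meant to control.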
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762364158
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38789/
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-765903407
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38986/
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-765902815
**[Test build #134400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134400/testReport)** for PR 26804 at commit [`eb1c95e`](https://github.com/apache/spark/commit/eb1c95ee59464167cb50591b0110e7f3f19864a8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class BitwiseGet(left: Expression, right: Expression)`
* ` new RuntimeException(s\"class `$`
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-767404318
**[Test build #134487 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134487/testReport)** for PR 26804 at commit [`72c52b6`](https://github.com/apache/spark/commit/72c52b64958340835e5a54b24aa68f201f4c15be).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] iemejia commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
iemejia commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-722984499
@sunchao I was revisiting this patch with the source-compatibility idea from the recent patch we worked on for Hive, and it seems that Parquet is fully source compatible with Avro 1.8.2-1.11.1, so this upgrade on the Spark side should be less of a problem. The only issue is the dependency leaking you mention above.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763829863
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134282/
[GitHub] [spark] dbtsai commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
dbtsai commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630458539
@heuermh thanks for the info. @dongjoon-hyun @wangyum any thoughts on how to move forward with this?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-765907105
----------------------------------------------------------------
[GitHub] [spark] sunchao commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-769495501
Nice work @wangyum and all! Is there anything else to be done in order to get the full page-skipping feature with column indexes? Looking at [PARQUET-1739](https://issues.apache.org/jira/browse/PARQUET-1739), I was under the impression that the vectorized path needs some more work.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762367843
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38789/
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764058969
**[Test build #134296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134296/testReport)** for PR 26804 at commit [`802eb36`](https://github.com/apache/spark/commit/802eb369d3cada5a5dbc284febf91c4fc5b8dbcb).
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-767367295
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39073/
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762064332
**[Test build #134187 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134187/testReport)** for PR 26804 at commit [`b5101d2`](https://github.com/apache/spark/commit/b5101d20850b7c3ddc03cece8088a7b34b683084).
----------------------------------------------------------------
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561433379
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
##########
@@ -759,7 +759,7 @@ class ParquetSchemaSuite extends ParquetSchemaTest {
nullable = true))),
"""message root {
| optional group f1 (MAP) {
- | repeated group map (MAP_KEY_VALUE) {
+ | repeated group key_value (MAP_KEY_VALUE) {
Review comment:
So, are you saying that there is no breaking change, @wangyum?
@srowen's question is asking why we need this change, isn't it?
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
}
+ // PARQUET-1580: Disables page-level CRC checksums by default.
Review comment:
Could you add a comment explaining why you disable it? It looks like a workaround to avoid a Parquet-side performance regression.
----------------------------------------------------------------
[GitHub] [spark] heuermh commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
heuermh commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764063666
>> Sorry if I have missed some conversation, how does this pull request compare to #30517 (which bumps Parquet and Avro) and #31232 (which bumps only Avro)?
>
> #30517 is used for testing compatibility.
Thank you, @wangyum! As #31232 has been merged for Spark 3.2.0, I assume the target for this pull request is also version 3.2.0?
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763707592
**[Test build #134282 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134282/testReport)** for PR 26804 at commit [`c9b4792`](https://github.com/apache/spark/commit/c9b479284a220074424a40c07af7ecd27085c5cd).
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762615763
**[Test build #134211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134211/testReport)** for PR 26804 at commit [`4efac50`](https://github.com/apache/spark/commit/4efac50dc441838fe5521d4b94a2a4870ad456c5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
[GitHub] [spark] gatorsmile commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
gatorsmile commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r562959779
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
}
+ // PARQUET-1746: Disables page-level CRC checksums by default.
+ conf.setBooleanIfUnset(ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED, false)
Review comment:
This looks dangerous. Also cc @bbraams
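For readers following the thread, a minimal sketch of what a "set if unset" default means in practice: the default is written only when the key has no value yet, so an explicit user setting is left alone. `ConfSketch` and the key string are hypothetical stand-ins for illustration, not Hadoop's `Configuration` or the real value behind `PAGE_WRITE_CHECKSUM_ENABLED`.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a Hadoop-style configuration; only the
// "set if unset" behaviour is modelled here.
final class ConfSketch {
  private val entries = mutable.Map.empty[String, String]

  def set(key: String, value: String): Unit = entries.update(key, value)

  // Writes the default only when the key has no value yet.
  def setBooleanIfUnset(key: String, value: Boolean): Unit =
    if (!entries.contains(key)) entries.update(key, value.toString)

  def getBoolean(key: String, default: Boolean): Boolean =
    entries.get(key).map(_.toBoolean).getOrElse(default)
}

object ConfSketchDemo {
  // Placeholder key, not the real string behind PAGE_WRITE_CHECKSUM_ENABLED.
  val ChecksumKey = "example.page.write-checksum.enabled"

  def main(args: Array[String]): Unit = {
    // No user setting: the "disabled" default is applied.
    val untouched = new ConfSketch
    untouched.setBooleanIfUnset(ChecksumKey, false)
    println(untouched.getBoolean(ChecksumKey, default = true))

    // Explicit user setting: the later default does not override it.
    val userSet = new ConfSketch
    userSet.set(ChecksumKey, "true")
    userSet.setBooleanIfUnset(ChecksumKey, false)
    println(userSet.getBoolean(ChecksumKey, default = true))
  }
}
```

In this sketch the concern raised above still applies to the first case: a job that never sets the key silently gets checksums turned off.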
----------------------------------------------------------------
[GitHub] [spark] srowen commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r559640730
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
##########
@@ -759,7 +759,7 @@ class ParquetSchemaSuite extends ParquetSchemaTest {
nullable = true))),
"""message root {
| optional group f1 (MAP) {
- | repeated group map (MAP_KEY_VALUE) {
+ | repeated group key_value (MAP_KEY_VALUE) {
Review comment:
Would this possibly be a breaking change for files written as Parquet? This may be a dumb question.
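As background to this question: the Parquet format's backward-compatibility rules for MAP say the repeated group should be named `key_value`, but that names found in existing data should not be enforced as errors when reading. A minimal sketch of a reader that resolves the key and value fields by position rather than by the repeated group's name follows. The `Node`, `Leaf`, and `Group` types and the field names are hypothetical and are not Parquet's or Spark's actual classes.

```scala
// Hypothetical, minimal schema model; not Parquet's or Spark's real classes.
sealed trait Node { def name: String }
final case class Leaf(name: String) extends Node
final case class Group(name: String, children: List[Node]) extends Node

object MapLayoutSketch {
  // Resolve the (key, value) field names of a MAP-annotated group by position:
  // take the single repeated child group, then its first and second fields.
  // The name of the repeated group itself is deliberately ignored.
  def keyValueFields(mapGroup: Group): Option[(String, String)] =
    mapGroup.children match {
      case List(Group(_, List(k, v))) => Some((k.name, v.name))
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    val oldLayout =
      Group("f1", List(Group("map", List(Leaf("key"), Leaf("value")))))
    val newLayout =
      Group("f1", List(Group("key_value", List(Leaf("key"), Leaf("value")))))
    // Both layouts resolve to the same pair of fields.
    println(keyValueFields(oldLayout) == keyValueFields(newLayout))
  }
}
```

Whether a given reader actually behaves this way is a separate question that the sketch does not answer.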
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764058969
**[Test build #134296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134296/testReport)** for PR 26804 at commit [`802eb36`](https://github.com/apache/spark/commit/802eb369d3cada5a5dbc284febf91c4fc5b8dbcb).
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764206843
**[Test build #134301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134301/testReport)** for PR 26804 at commit [`a89c61d`](https://github.com/apache/spark/commit/a89c61d90cc145cea7e5c3df1200fb3ec1d7a3db).
----------------------------------------------------------------
[GitHub] [spark] heuermh commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
heuermh commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763697839
Sorry if I have missed some conversation, how does this pull request compare to #30517 (which bumps Parquet and Avro) and #31232 (which bumps only Avro)?
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764181491
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38882/
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764187076
**[Test build #134296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134296/testReport)** for PR 26804 at commit [`802eb36`](https://github.com/apache/spark/commit/802eb369d3cada5a5dbc284febf91c4fc5b8dbcb).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
[GitHub] [spark] wangyum closed pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
wangyum closed pull request #26804:
URL: https://github.com/apache/spark/pull/26804
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764139405
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38883/
----------------------------------------------------------------
[GitHub] [spark] wangyum commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561444625
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
}
+ // PARQUET-1580: Disables page-level CRC checksums by default.
Review comment:
It will change the data order, please see https://github.com/apache/spark/pull/26804#discussion_r561044576.
----------------------------------------------------------------
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630904663
@h-vetinari. Parquet is the de facto standard in Apache Spark and is related to all the other modules. That's why a Parquet upgrade should not break anything in the other Spark modules. The same goes for other libraries. Apache Spark has used Apache Hadoop 2.7.3/2.7.4 for a long time, and it is still the default Hadoop. Apache Spark has used an unofficial Hive 1.2.1 fork for a long time and still hasn't been able to remove it.
Please feel free to open a working PR; the community will welcome it.
BTW, we are in the Apache Spark community. For issues in other communities, please ping them.
----------------------------------------------------------------
[GitHub] [spark] gszadovszky commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
gszadovszky commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561694102
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
##########
@@ -225,7 +226,9 @@ class StreamSuite extends StreamTest {
val df = spark.readStream.format(classOf[FakeDefaultSource].getName).load()
Seq("", "parquet").foreach { useV1Source =>
- withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source) {
+ withSQLConf(
+ SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source,
+ ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED -> "false") {
Review comment:
@wangyum, I've checked the code change of PARQUET-1580 (again) and still don't understand why it would cause such an issue. Disabling the CRC write only means that an optional field in the page headers is not written. It should not impact any kind of ordering. If it really does, it means that this ordering relies on some parameters that it shouldn't. It also means that any other potential change in the file metadata might impact this ordering.
Maybe I'm overlooking something in our code base, so any comment is welcome, but if not I would suggest revisiting these unit tests.
Meanwhile, I am not experienced in Spark code, so if you are fine with this workaround in a unit test, I am not against it.
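To illustrate the point about the checksum being an optional header field, here is a minimal sketch, assuming a simplified page model in which the payload bytes are produced first and the CRC is computed over them afterwards. `PageSketch` and `writePage` are hypothetical names and are not parquet-mr's writer; only `java.util.zip.CRC32` is a real JDK class.

```scala
import java.nio.charset.StandardCharsets
import java.util.zip.CRC32

// Hypothetical, simplified stand-in for a page: the header carries an
// optional checksum, the payload carries the already-encoded values.
final case class PageSketch(payload: Array[Byte], crc: Option[Long])

object PageChecksumSketch {
  // Standard CRC-32 over the payload bytes, using the JDK implementation.
  def crc32Of(bytes: Array[Byte]): Long = {
    val crc = new CRC32()
    crc.update(bytes, 0, bytes.length)
    crc.getValue
  }

  // The flag only decides whether the optional header field is populated;
  // the payload is built the same way in both cases.
  def writePage(values: Seq[String], checksumEnabled: Boolean): PageSketch = {
    val payload = values.mkString("\n").getBytes(StandardCharsets.UTF_8)
    PageSketch(payload, if (checksumEnabled) Some(crc32Of(payload)) else None)
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq("a", "b", "c")
    val withCrc = writePage(rows, checksumEnabled = true)
    val withoutCrc = writePage(rows, checksumEnabled = false)
    // Same payload bytes either way; only the header field differs.
    println(java.util.Arrays.equals(withCrc.payload, withoutCrc.payload))
    println(withCrc.crc.isDefined)
    println(withoutCrc.crc.isDefined)
  }
}
```

In this model the row order inside the payload cannot depend on the flag; what does change is the serialized size of the header, which is one kind of metadata difference the comment above warns that tests should not rely on.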
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764403535
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134301/
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-767355969
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39073/
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764312009
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38887/
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762088985
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38772/
----------------------------------------------------------------
[GitHub] [spark] heuermh edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
heuermh edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630453306
-1 (non-binding). Please do not merge this into master; it breaks downstream applications due to mixed Avro 1.8.x vs 1.9.x transitive dependencies.
I believe this should be blocked on a dependency upgrade to Avro 1.9.x (which in turn is blocked on other things, see pull request https://github.com/apache/spark/pull/27609 which was closed without merging).
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764115800
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38883/
----------------------------------------------------------------
[GitHub] [spark] dongjoon-hyun commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630904663
@h-vetinari. Parquet is the de facto standard in Apache Spark and is related to all the other modules. That's why a Parquet upgrade should not break anything in the other Spark modules. The same goes for other libraries. Apache Spark has used Apache Hadoop 2.7.3/2.7.4 for a long time, and it is still the default Hadoop. Apache Spark has used an unofficial Hive 1.2.1 fork for a long time and still hasn't been able to remove it.
Please feel free to open a working PR; the community will welcome it.
----------------------------------------------------------------
[GitHub] [spark] wangyum commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r559357563
##########
File path: pom.xml
##########
@@ -2318,6 +2318,10 @@
<groupId>commons-pool</groupId>
<artifactId>commons-pool</artifactId>
</exclusion>
+ <exclusion>
+ <groupId>javax.annotation</groupId>
+ <artifactId>javax.annotation-api</artifactId>
+ </exclusion>
Review comment:
We do not need this, please see PARQUET-1497 for more details.
----------------------------------------------------------------
[GitHub] [spark] h-vetinari commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
h-vetinari commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630366425
What's the status of this, if I may ask?
----------------------------------------------------------------
[GitHub] [spark] wangyum commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561444625
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
}
+ // PARQUET-1580: Disables page-level CRC checksums by default.
Review comment:
It will change the data order; please see https://github.com/apache/spark/pull/26804#discussion_r561044576.
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762338175
**[Test build #134204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134204/testReport)** for PR 26804 at commit [`4e257c4`](https://github.com/apache/spark/commit/4e257c43895d36f0d5630cc735fb56642470b26d).
----------------------------------------------------------------
[GitHub] [spark] dongjoon-hyun commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762043802
Thank you for reopening this, @wangyum .
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762167906
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134187/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762367843
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38789/
----------------------------------------------------------------
[GitHub] [spark] gszadovszky commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
gszadovszky commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561694102
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
##########
@@ -225,7 +226,9 @@ class StreamSuite extends StreamTest {
val df = spark.readStream.format(classOf[FakeDefaultSource].getName).load()
Seq("", "parquet").foreach { useV1Source =>
- withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source) {
+ withSQLConf(
+ SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source,
+ ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED -> "false") {
Review comment:
@wangyum, I've checked the code change of PARQUET-1580 (again) and still don't understand why it would cause such an issue. Disabling the CRC write only means that an optional field in the page headers is not written. It should not affect any kind of ordering. If it really does, the ordering relies on parameters that it shouldn't, and any other change in the file metadata might affect it as well.
Maybe I'm overlooking something in our code base, so any comment is welcome; if not, I would suggest revisiting these unit tests.
Meanwhile, I am not experienced in Spark code, so if you are fine with this workaround in a unit test, I am not against it.
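To make the point about the optional page-header field concrete, the sketch below estimates how many bytes a 32-bit CRC value would occupy. It assumes the header is serialized with a Thrift-compact-style encoding in which an `i32` is zigzag-mapped and then written as a variable-length integer; that encoding is an assumption here, not something stated in this thread. `CrcFieldSizeSketch`, `zigZag`, `varintLength`, `crc32AsInt` and `encodedCrcBytes` are hypothetical names, not Parquet or Spark APIs.

```scala
import java.util.zip.CRC32

// Hypothetical helper object for illustration; not part of Parquet or Spark.
object CrcFieldSizeSketch {
  // Zigzag mapping: small magnitudes (positive or negative) become small unsigned values.
  def zigZag(n: Int): Int = (n << 1) ^ (n >> 31)

  // Number of bytes a varint needs for a 32-bit value: 7 payload bits per byte.
  def varintLength(unsigned: Int): Int = {
    var v = unsigned
    var len = 1
    while ((v & ~0x7F) != 0) {
      v = v >>> 7
      len += 1
    }
    len
  }

  // CRC32 of a payload, narrowed to a signed 32-bit Int (values >= 2^31 turn negative).
  def crc32AsInt(payload: Array[Byte]): Int = {
    val crc = new CRC32()
    crc.update(payload)
    crc.getValue.toInt
  }

  // Bytes the CRC value alone would take under the assumed encoding (between 1 and 5).
  def encodedCrcBytes(payload: Array[Byte]): Int =
    varintLength(zigZag(crc32AsInt(payload)))
}
```

Under that assumption, pages with different contents can need a different number of header bytes for the checksum, which is one way otherwise similar writes could end up a byte apart in size.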
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764334636
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38887/
----------------------------------------------------------------
[GitHub] [spark] dbtsai commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
dbtsai commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630439172
The benchmark from @wangyum shows no regression from upgrading the Parquet version. Since Spark 3.0 is almost released, we should consider merging this into master so that people can do more testing and it can ship as part of Spark 3.1.
I'll merge it into master once a new build is finished.
Thanks,
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763829863
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134282/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-765907105
----------------------------------------------------------------
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630471244
@iemejia's Avro PR (#27609) didn't pass the Apache Spark UTs. According to his report, this Parquet PR seems to be blocked by the Avro dependency upgrade. If we get a clean Avro PR that passes all UTs (including the Hive 1.2/2.3 profiles), we can resume reviewing it.
BTW, FYI, there is no Apache Hive release supporting Avro 1.9.x.
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764228690
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38887/
----------------------------------------------------------------
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630997965
@h-vetinari, this is wrong, isn't it? Did someone (except you) say it's low priority here? We want the new Parquet, but it currently looks technically infeasible. Do you think that all infeasible things are low priority?
> I'm surprised (without criticism!) that this has a seemingly low priority
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764139405
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38883/
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764126022
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38883/
----------------------------------------------------------------
[GitHub] [spark] wangyum commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561926496
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
##########
@@ -225,7 +226,9 @@ class StreamSuite extends StreamTest {
val df = spark.readStream.format(classOf[FakeDefaultSource].getName).load()
Seq("", "parquet").foreach { useV1Source =>
- withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source) {
+ withSQLConf(
+ SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source,
+ ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED -> "false") {
Review comment:
Thank you, @gszadovszky. The file size differs when the CRC write is enabled:
```
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-0ae44ddf-40bb-4ba5-84af-ec8cec037847-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-166cdfbf-b19d-4d55-b4ea-fbad6bcac9df-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 462 Jan 21 22:20 part-00001-23a0376d-1c51-480d-b7c6-a2d9a07de0e3-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-3c52b355-290a-4dd4-aad3-4bb2960ba3b8-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-4486173e-d650-4548-8da4-b95ae0305d8c-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-4c3786f4-2702-4f58-9604-c3deed68bc86-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-71bf8d51-95b0-43a8-969b-c28630f90066-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-78776231-370d-45c6-8520-67b94c33c697-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-a2a811aa-a495-4439-9daf-8c4b2cb258d5-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-bf745883-0b3a-4383-8669-7464833bfea8-c000.snappy.parquet
-rw-r--r-- 1 yumwang wheel 463 Jan 21 22:20 part-00001-d3a65d34-cd1a-434d-a86c-8ee0203b3bac-c000.snappy.parquet
yumwang@LM-SHC-16508156 1611238822602 % parquet-tools cat part-00001-23a0376d-1c51-480d-b7c6-a2d9a07de0e3-c000.snappy.parquet
a = 2
```
and we order the files by size:
https://github.com/apache/spark/blob/8ed23ed499ec7745a8e9bdc4c4fb3200fdb6c3c8/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L609
I'm not sure whether it is caused by an int overflow:
https://github.com/apache/parquet-mr/pull/647#discussion_r561914480
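A minimal sketch of why a one-byte size difference can change the read order, assuming the scan sorts the listed files by length in descending order. `ListedFile`, `ScanOrderSketch` and `orderForScan` are hypothetical names used only for illustration; they are not Spark classes, and the file names are placeholders.

```scala
// Hypothetical stand-in for a file listed by a scan; not a Spark class.
final case class ListedFile(name: String, length: Long)

object ScanOrderSketch {
  // Largest files first. sortBy is stable, so files of equal length keep their listing order.
  def orderForScan(files: Seq[ListedFile]): Seq[ListedFile] =
    files.sortBy(_.length)(Ordering[Long].reverse)

  // All files have the same size: the listing order is preserved.
  val sameSize: Seq[ListedFile] = Seq(
    ListedFile("part-a", 463L),
    ListedFile("part-b", 463L),
    ListedFile("part-c", 463L))

  // One file is a single byte smaller: it moves to the end of the scan order.
  val oneByteSmaller: Seq[ListedFile] = Seq(
    ListedFile("part-a", 463L),
    ListedFile("part-b", 462L),
    ListedFile("part-c", 463L))

  val stableOrder: Seq[String] = orderForScan(sameSize).map(_.name)
  val shiftedOrder: Seq[String] = orderForScan(oneByteSmaller).map(_.name)
}
```

In this model `stableOrder` keeps the listing order, while `shiftedOrder` places `part-b` last. A test that compares rows in arrival order would therefore see a different sequence even though the data itself is unchanged.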
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764058969
----------------------------------------------------------------
[GitHub] [spark] sunchao commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-713879155
@heuermh, could you please clarify how a version change in parquet-avro would affect downstream apps? It's just a test dependency and shouldn't leak Avro, right?
----------------------------------------------------------------
[GitHub] [spark] wangyum commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-769481762
----------------------------------------------------------------
[GitHub] [spark] h-vetinari commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
h-vetinari commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630945817
> @dongjoon-hyun: Please feel free to open a working PR. Then, the community will welcome.
Sorry if my message came across as demanding. I'm not deeply involved in the community here (yet?), and neither in the respective code bases, but if someone as involved as @iemejia is stuck, I have little hope to make an impact in the current situation. The problem he outlines sounds like a very thorny issue that will need collaboration with other projects (HIVE, AVRO, PARQUET etc), and even knowing how OSS works, this seems like a problem on a scale that will require active maintainer involvement.
So coming back to what I wrote: I'm surprised (without criticism!) that this has a seemingly low priority, and I hope someone can find a way forward.
----------------------------------------------------------------
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561433379
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
##########
@@ -759,7 +759,7 @@ class ParquetSchemaSuite extends ParquetSchemaTest {
nullable = true))),
"""message root {
| optional group f1 (MAP) {
- | repeated group map (MAP_KEY_VALUE) {
+ | repeated group key_value (MAP_KEY_VALUE) {
Review comment:
So, are you saying that there is no breaking change, @wangyum ?
@srowen's question is asking why we need this change, isn't it?
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
}
+ // PARQUET-1580: Disables page-level CRC checksums by default.
Review comment:
Could you add a comment explaining why you disable it? It looks like a workaround to avoid a Parquet-side performance regression.
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
}
+ // PARQUET-1580: Disables page-level CRC checksums by default.
Review comment:
Wow. Then it's a real bug. Thanks for the confirmation.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764139405
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762619705
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134211/
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764063136
**[Test build #134297 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134297/testReport)** for PR 26804 at commit [`8a50d56`](https://github.com/apache/spark/commit/8a50d565dd9e2ce38f4b91bdbb2a5e82dc80b80b).
----------------------------------------------------------------
[GitHub] [spark] wangyum commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763424219
@cloud-fan @gatorsmile @srowen @dongjoon-hyun @HyukjinKwon @rdblue
It does not have the performance regression. Do you have any more comments?
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-767415007
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134487/
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764150469
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762601527
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38796/
----------------------------------------------------------------
[GitHub] [spark] wangyum closed pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
wangyum closed pull request #26804:
URL: https://github.com/apache/spark/pull/26804
----------------------------------------------------------------
[GitHub] [spark] dongjoon-hyun commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630471244
@iemejia's Avro PR (#27609) didn't pass the Apache Spark UTs. Also, according to his report, this Parquet PR seems to be blocked by the Avro dependency upgrade. If we have a clean Avro PR that passes all UTs (including the Hive 1.2/2.3 profiles), we can resume reviewing it.
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763707592
**[Test build #134282 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134282/testReport)** for PR 26804 at commit [`c9b4792`](https://github.com/apache/spark/commit/c9b479284a220074424a40c07af7ecd27085c5cd).
----------------------------------------------------------------
[GitHub] [spark] dongjoon-hyun commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-769489320
BTW, Apache Parquet 1.12 is also one of the candidates we can choose in the Apache Spark 3.2.0 timeframe.
The Apache Parquet 1.12.0 RC1 vote has already started.
----------------------------------------------------------------
[GitHub] [spark] gengliangwang commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561677113
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
##########
@@ -759,7 +759,7 @@ class ParquetSchemaSuite extends ParquetSchemaTest {
nullable = true))),
"""message root {
| optional group f1 (MAP) {
- | repeated group map (MAP_KEY_VALUE) {
+ | repeated group key_value (MAP_KEY_VALUE) {
Review comment:
Seems better to test both.
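A check covering both spellings could start from something like the sketch below. It only uses parquet-mr's `MessageTypeParser` to confirm that the legacy (`map`) and new (`key_value`) names for the repeated group parse to the same structure; it does not exercise Spark's own schema converter. The `mapSchema` helper and the `key`/`value` field definitions are assumptions made for illustration, since the diff above does not show them.

```scala
import org.apache.parquet.schema.MessageTypeParser

// Hypothetical helper: builds the test schema with the given name for the
// repeated group nested inside the MAP-annotated group.
def mapSchema(repeatedGroupName: String): String =
  s"""message root {
     |  optional group f1 (MAP) {
     |    repeated group $repeatedGroupName (MAP_KEY_VALUE) {
     |      required int32 key;
     |      optional binary value (UTF8);
     |    }
     |  }
     |}""".stripMargin

// Parse the schema once with the legacy name and once with the new name.
Seq("map", "key_value").foreach { name =>
  val root = MessageTypeParser.parseMessageType(mapSchema(name))
  val f1 = root.getType(0).asGroupType()        // the MAP-annotated group
  val repeated = f1.getType(0).asGroupType()    // the repeated key/value group
  assert(f1.getName == "f1")
  assert(repeated.getName == name)
  assert(repeated.getFieldCount == 2)
  assert(repeated.getType(0).getName == "key")
  assert(repeated.getType(1).getName == "value")
}
```

A real test in `ParquetSchemaSuite` would presumably go through the suite's existing helpers instead, so that the Spark read path is covered as well.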
----------------------------------------------------------------
[GitHub] [spark] heuermh edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
heuermh edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-714729078
@sunchao Until `spark.driver.userClassPathFirst`/`spark.executor.userClassPathFirst` are no longer marked experimental, or dependencies in Spark are shaded properly, having Avro 1.8.x on the Spark runtime classpath will cause runtime compatibility exceptions for downstream apps that use `parquet-avro` 1.11.x or newer, which depends on Avro 1.9.x.
Or are you suggesting downstream apps use `parquet-avro` version 1.10.1 at the same time Spark depends on e.g. `parquet-[column,common,encoding,format-structures,hadoop,jackson]` 1.11.x or newer? I don't know if that is possible.
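For context, the two settings named above are ordinary Spark configuration keys. A downstream app that wanted to try them despite their experimental status might set them roughly as in this sketch; it assumes `spark-core` is on the classpath and makes no claim that doing so actually resolves the Avro conflict discussed here.

```scala
import org.apache.spark.SparkConf

// Sketch only: opt in to the experimental "user classpath first" behaviour so
// that the application's own jars are preferred over the ones Spark ships.
// Passing `false` skips loading spark.* values from system properties.
val conf = new SparkConf(false)
  .set("spark.driver.userClassPathFirst", "true")
  .set("spark.executor.userClassPathFirst", "true")

assert(conf.getBoolean("spark.driver.userClassPathFirst", false))
assert(conf.getBoolean("spark.executor.userClassPathFirst", false))
```

The same keys can also be passed with `--conf` on `spark-submit`.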
----------------------------------------------------------------
[GitHub] [spark] sunchao commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561447322
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
##########
@@ -759,7 +759,7 @@ class ParquetSchemaSuite extends ParquetSchemaTest {
nullable = true))),
"""message root {
| optional group f1 (MAP) {
- | repeated group map (MAP_KEY_VALUE) {
+ | repeated group key_value (MAP_KEY_VALUE) {
Review comment:
Looking at the original PR, I think the change should be backward-compatible (`map` annotation can still be handled on the read path).
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762573083
**[Test build #134211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134211/testReport)** for PR 26804 at commit [`4efac50`](https://github.com/apache/spark/commit/4efac50dc441838fe5521d4b94a2a4870ad456c5).
----------------------------------------------------------------
[GitHub] [spark] wangyum commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r564112793
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
}
+ // PARQUET-1746: Disables page-level CRC checksums by default.
+ conf.setBooleanIfUnset(ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED, false)
Review comment:
1. Disable it to fix this regression: https://github.com/apache/spark/pull/26804#pullrequestreview-572328921.
2. Writing out checksums has minimal performance impact.
3. Do we really need this feature? I haven't seen Spark SQL users request it before. This change only disables it by default; users can still enable it.
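The "disabled by default, but users can still enable it" behaviour follows from `setBooleanIfUnset`, which leaves an existing value alone. A minimal sketch, assuming Hadoop and parquet-mr 1.11.x are on the classpath; `effectiveChecksumSetting` is a hypothetical helper written for this illustration, not code from the PR:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.parquet.hadoop.ParquetOutputFormat

// Hypothetical helper: applies the same default as the patch above and
// returns the value that would then be seen by the Parquet writer.
def effectiveChecksumSetting(conf: Configuration): Boolean = {
  // Only takes effect when the key has not been set already.
  conf.setBooleanIfUnset(ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED, false)
  conf.getBoolean(ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED, false)
}

// Case 1: nothing set by the user, so page checksums end up disabled.
val untouched = new Configuration(false)
assert(!effectiveChecksumSetting(untouched))

// Case 2: the user opted in beforehand, so the default does not override it.
val optedIn = new Configuration(false)
optedIn.setBoolean(ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED, true)
assert(effectiveChecksumSetting(optedIn))
```

In a Spark job the opt-in would normally arrive through the Hadoop configuration, for example via a `spark.hadoop.`-prefixed property; the exact key string behind `PAGE_WRITE_CHECKSUM_ENABLED` should be checked against the parquet-mr version in use.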
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764403535
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134301/
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762151755
**[Test build #134187 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134187/testReport)** for PR 26804 at commit [`b5101d2`](https://github.com/apache/spark/commit/b5101d20850b7c3ddc03cece8088a7b34b683084).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762619705
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134211/
----------------------------------------------------------------
[GitHub] [spark] heuermh edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
heuermh edited a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630453306
-1 (non-binding). Please do not merge this into master; it breaks downstream applications due to Avro 1.8.x vs 1.9.x transitive dependencies.
I believe this should be blocked on a dependency upgrade to Avro 1.9.x (which in turn is blocked on other things).
----------------------------------------------------------------
[GitHub] [spark] heuermh commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
heuermh commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764063666
>> Sorry if I have missed some conversation, how does this pull request compare to #30517 (which bumps Parquet and Avro) and #31232 (which bumps only Avro)?
>
> #30517 is used for testing compatibility.
Thank you, @wangyum! As #31232 has been merged for Spark 3.2.0, I assume the target for this pull request is also version 3.2.0?
----------------------------------------------------------------
[GitHub] [spark] wangyum commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763707557
> Sorry if I have missed some conversation, how does this pull request compare to #30517 (which bumps Parquet and Avro) and #31232 (which bumps only Avro)?
#30517 is used for testing compatibility.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762130255
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38772/
----------------------------------------------------------------
[GitHub] [spark] sunchao commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-723233104
@iemejia Yes, it is the issue @heuermh mentioned above that we need to be careful with, and it is what makes upgrading Hive necessary.
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764106286
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38882/
----------------------------------------------------------------
[GitHub] [spark] wangyum commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-769662814
@sunchao https://github.com/apache/spark/pull/31393
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-763799875
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38868/
----------------------------------------------------------------
[GitHub] [spark] bbraams commented on a change in pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
bbraams commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r564511619
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
##########
@@ -127,6 +127,9 @@ class ParquetFileFormat
conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
}
+ // PARQUET-1746: Disables page-level CRC checksums by default.
+ conf.setBooleanIfUnset(ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED, false)
Review comment:
I see it's been addressed in https://github.com/apache/spark/pull/26804/commits/72c52b64958340835e5a54b24aa68f201f4c15be, thanks for the quick fix @wangyum! 👍
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762416314
**[Test build #134204 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134204/testReport)** for PR 26804 at commit [`4e257c4`](https://github.com/apache/spark/commit/4e257c43895d36f0d5630cc735fb56642470b26d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-764139405
----------------------------------------------------------------
[GitHub] [spark] h-vetinari commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Posted by GitBox <gi...@apache.org>.
h-vetinari commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-630740908
I'm a bit surprised that upgrading Parquet has such a low priority, especially given important features like column indexes and the cleanup of the timestamp situation/compatibility (even though I get that the Avro situation is complicated). I hope someone can find a way forward.
> BTW, FYI, there is no Apache Hive release supporting Avro 1.9.x.
There are open patches here: https://issues.apache.org/jira/browse/HIVE-21737 (also by @iemejia, open for a year already).
----------------------------------------------------------------
[GitHub] [spark] wangyum commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-769490870
Thank you, @dongjoon-hyun. I will evaluate Parquet 1.12 soon.
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-762111277
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38772/
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #26804:
URL: https://github.com/apache/spark/pull/26804#issuecomment-765888093
**[Test build #134400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134400/testReport)** for PR 26804 at commit [`eb1c95e`](https://github.com/apache/spark/commit/eb1c95ee59464167cb50591b0110e7f3f19864a8).
----------------------------------------------------------------