Posted to reviews@spark.apache.org by gengliangwang <gi...@git.apache.org> on 2018/11/07 15:05:31 UTC
[GitHub] spark pull request #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark...
GitHub user gengliangwang opened a pull request:
https://github.com/apache/spark/pull/22965
[SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSourceReadBenchmark case names and execution instructions
## What changes were proposed in this pull request?
1. OrcReadBenchmark is in the hive module, so the command to run it should be:
```
build/sbt "hive/test:runMain <this class>"
```
2. In the benchmark "String with Nulls Scan", the case names should read "String with Nulls Scan(5%/50%/95%)", not "(0.05%/0.5%/0.95%)".
3. Add the null-value percentages to the test case names of DataSourceReadBenchmark for the benchmark "String with Nulls Scan".
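The renaming in items 2 and 3 amounts to rendering each null fraction as a whole-number percentage. A minimal sketch in Python, not the Scala code touched by this PR; the function name `case_name` and the exact spacing of the label are assumptions for illustration only:

```python
def case_name(fraction_of_nulls: float) -> str:
    """Build a benchmark case label from a null fraction such as 0.05."""
    # round() guards against floating-point artifacts in the multiplication.
    percent = round(fraction_of_nulls * 100)
    return f"String with Nulls Scan ({percent}%)"


# The three null fractions named in the PR description.
labels = [case_name(f) for f in (0.05, 0.5, 0.95)]
```

Printing the raw fraction followed by a percent sign would instead yield the misleading "0.05%" style that item 2 corrects.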
## How was this patch tested?
Re-ran the benchmarks.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gengliangwang/spark fixHiveOrcReadBenchmark
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22965.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22965
----
commit f282331d12975687391a7648aacde19a58774936
Author: Gengliang Wang <ge...@...>
Date: 2018-11-07T13:03:05Z
fix
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22965
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22965
Hi, @gengliangwang.
Could you review and merge https://github.com/gengliangwang/spark/pull/1 ?
---
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22965
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:
https://github.com/apache/spark/pull/22965
@dongjoon-hyun sure, done.
---
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22965
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98553/
---
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22965
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22965#discussion_r231795384
--- Diff: sql/core/benchmarks/DataSourceReadBenchmark-results.txt ---
@@ -2,268 +2,268 @@
SQL Single Numeric Column Scan
================================================================================================
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 21508 / 22112 0.7 1367.5 1.0X
-SQL Json 8705 / 8825 1.8 553.4 2.5X
-SQL Parquet Vectorized 157 / 186 100.0 10.0 136.7X
-SQL Parquet MR 1789 / 1794 8.8 113.8 12.0X
-SQL ORC Vectorized 156 / 166 100.9 9.9 138.0X
-SQL ORC Vectorized with copy 218 / 225 72.1 13.9 98.6X
-SQL ORC MR 1448 / 1492 10.9 92.0 14.9X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 14108 / 14263 1.1 896.9 1.0X
+SQL Json 5477 / 5509 2.9 348.2 2.6X
+SQL Parquet Vectorized 115 / 125 137.1 7.3 122.9X
+SQL Parquet MR 1318 / 1332 11.9 83.8 10.7X
+SQL ORC Vectorized 150 / 159 104.9 9.5 94.1X
+SQL ORC Vectorized with copy 206 / 208 76.4 13.1 68.5X
+SQL ORC MR 1072 / 1075 14.7 68.1 13.2X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Parquet Reader Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 202 / 211 77.7 12.9 1.0X
-ParquetReader Vectorized -> Row 118 / 120 133.5 7.5 1.7X
+ParquetReader Vectorized 138 / 152 114.0 8.8 1.0X
+ParquetReader Vectorized -> Row 80 / 87 197.2 5.1 1.7X
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 23282 / 23312 0.7 1480.2 1.0X
-SQL Json 9187 / 9189 1.7 584.1 2.5X
-SQL Parquet Vectorized 204 / 218 77.0 13.0 114.0X
-SQL Parquet MR 1941 / 1953 8.1 123.4 12.0X
-SQL ORC Vectorized 217 / 225 72.6 13.8 107.5X
-SQL ORC Vectorized with copy 279 / 289 56.3 17.8 83.4X
-SQL ORC MR 1541 / 1549 10.2 98.0 15.1X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 14495 / 14507 1.1 921.6 1.0X
+SQL Json 5615 / 5668 2.8 357.0 2.6X
+SQL Parquet Vectorized 147 / 154 107.4 9.3 98.9X
+SQL Parquet MR 1431 / 1454 11.0 91.0 10.1X
+SQL ORC Vectorized 170 / 175 92.4 10.8 85.1X
+SQL ORC Vectorized with copy 223 / 228 70.6 14.2 65.1X
+SQL ORC MR 1187 / 1197 13.2 75.5 12.2X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Parquet Reader Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 288 / 297 54.6 18.3 1.0X
-ParquetReader Vectorized -> Row 255 / 257 61.7 16.2 1.1X
+ParquetReader Vectorized 190 / 219 82.8 12.1 1.0X
+ParquetReader Vectorized -> Row 165 / 169 95.2 10.5 1.1X
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 24990 / 25012 0.6 1588.8 1.0X
-SQL Json 9837 / 9865 1.6 625.4 2.5X
-SQL Parquet Vectorized 170 / 180 92.3 10.8 146.6X
-SQL Parquet MR 2319 / 2328 6.8 147.4 10.8X
-SQL ORC Vectorized 293 / 301 53.7 18.6 85.3X
-SQL ORC Vectorized with copy 297 / 309 52.9 18.9 84.0X
-SQL ORC MR 1667 / 1674 9.4 106.0 15.0X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 16105 / 16214 1.0 1023.9 1.0X
+SQL Json 6289 / 6291 2.5 399.8 2.6X
+SQL Parquet Vectorized 142 / 148 111.0 9.0 113.6X
+SQL Parquet MR 1797 / 1801 8.8 114.2 9.0X
+SQL ORC Vectorized 232 / 238 67.9 14.7 69.5X
+SQL ORC Vectorized with copy 237 / 242 66.5 15.0 68.1X
+SQL ORC MR 1309 / 1409 12.0 83.2 12.3X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Parquet Reader Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 257 / 274 61.3 16.3 1.0X
-ParquetReader Vectorized -> Row 259 / 264 60.8 16.4 1.0X
+ParquetReader Vectorized 181 / 225 87.0 11.5 1.0X
+ParquetReader Vectorized -> Row 180 / 186 87.4 11.4 1.0X
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single BIGINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 32537 / 32554 0.5 2068.7 1.0X
-SQL Json 12610 / 12668 1.2 801.7 2.6X
-SQL Parquet Vectorized 258 / 276 61.0 16.4 126.2X
-SQL Parquet MR 2422 / 2435 6.5 154.0 13.4X
-SQL ORC Vectorized 378 / 385 41.6 24.0 86.2X
-SQL ORC Vectorized with copy 381 / 389 41.3 24.2 85.4X
-SQL ORC MR 1797 / 1819 8.8 114.3 18.1X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 20128 / 20682 0.8 1279.7 1.0X
+SQL Json 8277 / 8279 1.9 526.3 2.4X
+SQL Parquet Vectorized 198 / 211 79.3 12.6 101.5X
+SQL Parquet MR 1788 / 1816 8.8 113.7 11.3X
+SQL ORC Vectorized 273 / 290 57.6 17.4 73.7X
--- End diff --
Yes 👍
---
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22965
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98601/
---
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22965
**[Test build #98553 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98553/testReport)** for PR 22965 at commit [`f282331`](https://github.com/apache/spark/commit/f282331d12975687391a7648aacde19a58774936).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22965
**[Test build #98576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98576/testReport)** for PR 22965 at commit [`3067a6d`](https://github.com/apache/spark/commit/3067a6d1f63c93b4295425d90e5894d27c840995).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22965
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22965
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22965
**[Test build #98601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98601/testReport)** for PR 22965 at commit [`b204638`](https://github.com/apache/spark/commit/b204638bdab7aca0676d81fc348a86b32222603e).
---
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22965
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98576/
---
[GitHub] spark pull request #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22965
---
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22965
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22965
**[Test build #98553 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98553/testReport)** for PR 22965 at commit [`f282331`](https://github.com/apache/spark/commit/f282331d12975687391a7648aacde19a58774936).
---
[GitHub] spark pull request #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22965#discussion_r231793092
--- Diff: sql/core/benchmarks/DataSourceReadBenchmark-results.txt ---
@@ -2,268 +2,268 @@
SQL Single Numeric Column Scan
================================================================================================
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 21508 / 22112 0.7 1367.5 1.0X
-SQL Json 8705 / 8825 1.8 553.4 2.5X
-SQL Parquet Vectorized 157 / 186 100.0 10.0 136.7X
-SQL Parquet MR 1789 / 1794 8.8 113.8 12.0X
-SQL ORC Vectorized 156 / 166 100.9 9.9 138.0X
-SQL ORC Vectorized with copy 218 / 225 72.1 13.9 98.6X
-SQL ORC MR 1448 / 1492 10.9 92.0 14.9X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 14108 / 14263 1.1 896.9 1.0X
+SQL Json 5477 / 5509 2.9 348.2 2.6X
+SQL Parquet Vectorized 115 / 125 137.1 7.3 122.9X
+SQL Parquet MR 1318 / 1332 11.9 83.8 10.7X
+SQL ORC Vectorized 150 / 159 104.9 9.5 94.1X
+SQL ORC Vectorized with copy 206 / 208 76.4 13.1 68.5X
+SQL ORC MR 1072 / 1075 14.7 68.1 13.2X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Parquet Reader Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 202 / 211 77.7 12.9 1.0X
-ParquetReader Vectorized -> Row 118 / 120 133.5 7.5 1.7X
+ParquetReader Vectorized 138 / 152 114.0 8.8 1.0X
+ParquetReader Vectorized -> Row 80 / 87 197.2 5.1 1.7X
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 23282 / 23312 0.7 1480.2 1.0X
-SQL Json 9187 / 9189 1.7 584.1 2.5X
-SQL Parquet Vectorized 204 / 218 77.0 13.0 114.0X
-SQL Parquet MR 1941 / 1953 8.1 123.4 12.0X
-SQL ORC Vectorized 217 / 225 72.6 13.8 107.5X
-SQL ORC Vectorized with copy 279 / 289 56.3 17.8 83.4X
-SQL ORC MR 1541 / 1549 10.2 98.0 15.1X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 14495 / 14507 1.1 921.6 1.0X
+SQL Json 5615 / 5668 2.8 357.0 2.6X
+SQL Parquet Vectorized 147 / 154 107.4 9.3 98.9X
+SQL Parquet MR 1431 / 1454 11.0 91.0 10.1X
+SQL ORC Vectorized 170 / 175 92.4 10.8 85.1X
+SQL ORC Vectorized with copy 223 / 228 70.6 14.2 65.1X
+SQL ORC MR 1187 / 1197 13.2 75.5 12.2X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Parquet Reader Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 288 / 297 54.6 18.3 1.0X
-ParquetReader Vectorized -> Row 255 / 257 61.7 16.2 1.1X
+ParquetReader Vectorized 190 / 219 82.8 12.1 1.0X
+ParquetReader Vectorized -> Row 165 / 169 95.2 10.5 1.1X
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 24990 / 25012 0.6 1588.8 1.0X
-SQL Json 9837 / 9865 1.6 625.4 2.5X
-SQL Parquet Vectorized 170 / 180 92.3 10.8 146.6X
-SQL Parquet MR 2319 / 2328 6.8 147.4 10.8X
-SQL ORC Vectorized 293 / 301 53.7 18.6 85.3X
-SQL ORC Vectorized with copy 297 / 309 52.9 18.9 84.0X
-SQL ORC MR 1667 / 1674 9.4 106.0 15.0X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 16105 / 16214 1.0 1023.9 1.0X
+SQL Json 6289 / 6291 2.5 399.8 2.6X
+SQL Parquet Vectorized 142 / 148 111.0 9.0 113.6X
+SQL Parquet MR 1797 / 1801 8.8 114.2 9.0X
+SQL ORC Vectorized 232 / 238 67.9 14.7 69.5X
+SQL ORC Vectorized with copy 237 / 242 66.5 15.0 68.1X
+SQL ORC MR 1309 / 1409 12.0 83.2 12.3X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Parquet Reader Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 257 / 274 61.3 16.3 1.0X
-ParquetReader Vectorized -> Row 259 / 264 60.8 16.4 1.0X
+ParquetReader Vectorized 181 / 225 87.0 11.5 1.0X
+ParquetReader Vectorized -> Row 180 / 186 87.4 11.4 1.0X
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single BIGINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 32537 / 32554 0.5 2068.7 1.0X
-SQL Json 12610 / 12668 1.2 801.7 2.6X
-SQL Parquet Vectorized 258 / 276 61.0 16.4 126.2X
-SQL Parquet MR 2422 / 2435 6.5 154.0 13.4X
-SQL ORC Vectorized 378 / 385 41.6 24.0 86.2X
-SQL ORC Vectorized with copy 381 / 389 41.3 24.2 85.4X
-SQL ORC MR 1797 / 1819 8.8 114.3 18.1X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 20128 / 20682 0.8 1279.7 1.0X
+SQL Json 8277 / 8279 1.9 526.3 2.4X
+SQL Parquet Vectorized 198 / 211 79.3 12.6 101.5X
+SQL Parquet MR 1788 / 1816 8.8 113.7 11.3X
+SQL ORC Vectorized 273 / 290 57.6 17.4 73.7X
--- End diff --
Now the result looks normal.
---
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22965
**[Test build #98576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98576/testReport)** for PR 22965 at commit [`3067a6d`](https://github.com/apache/spark/commit/3067a6d1f63c93b4295425d90e5894d27c840995).
---
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:
https://github.com/apache/spark/pull/22965
retest this please.
---
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22965
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22965
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98586/
---
[GitHub] spark pull request #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22965#discussion_r231608634
--- Diff: sql/core/benchmarks/DataSourceReadBenchmark-results.txt ---
@@ -2,268 +2,268 @@
SQL Single Numeric Column Scan
================================================================================================
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 21508 / 22112 0.7 1367.5 1.0X
-SQL Json 8705 / 8825 1.8 553.4 2.5X
-SQL Parquet Vectorized 157 / 186 100.0 10.0 136.7X
-SQL Parquet MR 1789 / 1794 8.8 113.8 12.0X
-SQL ORC Vectorized 156 / 166 100.9 9.9 138.0X
-SQL ORC Vectorized with copy 218 / 225 72.1 13.9 98.6X
-SQL ORC MR 1448 / 1492 10.9 92.0 14.9X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 15974 / 16222 1.0 1015.6 1.0X
+SQL Json 5917 / 6174 2.7 376.2 2.7X
+SQL Parquet Vectorized 115 / 128 136.8 7.3 138.9X
+SQL Parquet MR 1459 / 1571 10.8 92.8 10.9X
+SQL ORC Vectorized 164 / 194 95.8 10.4 97.3X
+SQL ORC Vectorized with copy 204 / 303 77.2 12.9 78.4X
+SQL ORC MR 1095 / 1143 14.4 69.6 14.6X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Parquet Reader Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 202 / 211 77.7 12.9 1.0X
-ParquetReader Vectorized -> Row 118 / 120 133.5 7.5 1.7X
+ParquetReader Vectorized 139 / 156 113.1 8.8 1.0X
+ParquetReader Vectorized -> Row 83 / 89 188.7 5.3 1.7X
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 23282 / 23312 0.7 1480.2 1.0X
-SQL Json 9187 / 9189 1.7 584.1 2.5X
-SQL Parquet Vectorized 204 / 218 77.0 13.0 114.0X
-SQL Parquet MR 1941 / 1953 8.1 123.4 12.0X
-SQL ORC Vectorized 217 / 225 72.6 13.8 107.5X
-SQL ORC Vectorized with copy 279 / 289 56.3 17.8 83.4X
-SQL ORC MR 1541 / 1549 10.2 98.0 15.1X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 16394 / 16643 1.0 1042.3 1.0X
+SQL Json 6014 / 6020 2.6 382.4 2.7X
+SQL Parquet Vectorized 147 / 155 106.9 9.4 111.4X
+SQL Parquet MR 1575 / 1581 10.0 100.1 10.4X
+SQL ORC Vectorized 168 / 173 93.9 10.7 97.9X
+SQL ORC Vectorized with copy 219 / 227 71.8 13.9 74.8X
+SQL ORC MR 1185 / 1187 13.3 75.3 13.8X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Parquet Reader Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 288 / 297 54.6 18.3 1.0X
-ParquetReader Vectorized -> Row 255 / 257 61.7 16.2 1.1X
+ParquetReader Vectorized 193 / 216 81.4 12.3 1.0X
+ParquetReader Vectorized -> Row 160 / 175 98.3 10.2 1.2X
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 24990 / 25012 0.6 1588.8 1.0X
-SQL Json 9837 / 9865 1.6 625.4 2.5X
-SQL Parquet Vectorized 170 / 180 92.3 10.8 146.6X
-SQL Parquet MR 2319 / 2328 6.8 147.4 10.8X
-SQL ORC Vectorized 293 / 301 53.7 18.6 85.3X
-SQL ORC Vectorized with copy 297 / 309 52.9 18.9 84.0X
-SQL ORC MR 1667 / 1674 9.4 106.0 15.0X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 17168 / 17306 0.9 1091.5 1.0X
+SQL Json 6167 / 6180 2.6 392.1 2.8X
+SQL Parquet Vectorized 134 / 142 117.5 8.5 128.2X
+SQL Parquet MR 1659 / 1740 9.5 105.5 10.3X
+SQL ORC Vectorized 225 / 229 69.9 14.3 76.3X
+SQL ORC Vectorized with copy 231 / 235 68.2 14.7 74.4X
+SQL ORC MR 1287 / 1388 12.2 81.8 13.3X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Parquet Reader Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 257 / 274 61.3 16.3 1.0X
-ParquetReader Vectorized -> Row 259 / 264 60.8 16.4 1.0X
+ParquetReader Vectorized 178 / 187 88.2 11.3 1.0X
+ParquetReader Vectorized -> Row 174 / 184 90.3 11.1 1.0X
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single BIGINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 32537 / 32554 0.5 2068.7 1.0X
-SQL Json 12610 / 12668 1.2 801.7 2.6X
-SQL Parquet Vectorized 258 / 276 61.0 16.4 126.2X
-SQL Parquet MR 2422 / 2435 6.5 154.0 13.4X
-SQL ORC Vectorized 378 / 385 41.6 24.0 86.2X
-SQL ORC Vectorized with copy 381 / 389 41.3 24.2 85.4X
-SQL ORC MR 1797 / 1819 8.8 114.3 18.1X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 21253 / 21542 0.7 1351.2 1.0X
+SQL Json 8208 / 8209 1.9 521.9 2.6X
+SQL Parquet Vectorized 180 / 241 87.3 11.5 117.9X
+SQL Parquet MR 1769 / 1801 8.9 112.5 12.0X
+SQL ORC Vectorized 3271 / 3277 4.8 207.9 6.5X
--- End diff --
@gengliangwang, this is surprising: `86.2X -> 6.5X` for `SQL ORC Vectorized`?
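For context, the `Relative` column in these tables appears to be the first row's best time divided by each row's best time. A minimal Scala sketch under that assumption, using the best times from the new BIGINT table quoted above (`RelativeSpeed` and `relative` are illustrative names, not taken from Spark's benchmark code):

```scala
import java.util.Locale

// Hypothetical helper, not Spark's implementation: recomputes the
// "Relative" column from the best times (ms) printed in the table.
object RelativeSpeed {
  // Ratio of the baseline's best time to this case's best time.
  def relative(baselineBestMs: Double, caseBestMs: Double): String =
    "%.1fX".formatLocal(Locale.ROOT, baselineBestMs / caseBestMs)

  def main(args: Array[String]): Unit = {
    val csvBestMs = 21253.0 // "SQL CSV" row, the 1.0X baseline
    val cases = Seq(
      "SQL Json" -> 8208.0,
      "SQL Parquet MR" -> 1769.0,
      "SQL ORC Vectorized" -> 3271.0)
    cases.foreach { case (name, bestMs) =>
      println(s"$name: ${relative(csvBestMs, bestMs)}")
    }
  }
}
```

Because the printed times are rounded to whole milliseconds, a recomputed ratio can differ slightly from the printed one for the fastest cases.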
---
---------------------------------------------------------------------
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22965
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4830/
Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22965
**[Test build #98586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98586/testReport)** for PR 22965 at commit [`3067a6d`](https://github.com/apache/spark/commit/3067a6d1f63c93b4295425d90e5894d27c840995).
---
---------------------------------------------------------------------
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22965
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4816/
Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:
https://github.com/apache/spark/pull/22965
@dongjoon-hyun @yucai
---
---------------------------------------------------------------------
[GitHub] spark pull request #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark...
Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on a diff in the pull request:
https://github.com/apache/spark/pull/22965#discussion_r231908870
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala ---
@@ -32,9 +32,11 @@ import org.apache.spark.sql.types._
* Benchmark to measure ORC read performance.
* {{{
* To run this benchmark:
- * 1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
- * 2. build/sbt "sql/test:runMain <this class>"
- * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ * 1. without sbt: bin/spark-submit --class <this class>
+ * --jars <catalyst test jar>,<core test jar>,<sql jar>,<hive-exec jar>,<spark-hive jar>
--- End diff --
Thanks, @gengliangwang! I also found that the jar listed for some other benchmarks is wrong, for example:
```
UDTSerializationBenchmark:
* 1. without sbt: bin/spark-submit --class <this class> <spark mllib test jar>
```
I will make a PR to update them.
---
---------------------------------------------------------------------
[GitHub] spark pull request #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22965#discussion_r231610295
--- Diff: sql/core/benchmarks/DataSourceReadBenchmark-results.txt ---
@@ -2,268 +2,268 @@
SQL Single Numeric Column Scan
================================================================================================
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 21508 / 22112 0.7 1367.5 1.0X
-SQL Json 8705 / 8825 1.8 553.4 2.5X
-SQL Parquet Vectorized 157 / 186 100.0 10.0 136.7X
-SQL Parquet MR 1789 / 1794 8.8 113.8 12.0X
-SQL ORC Vectorized 156 / 166 100.9 9.9 138.0X
-SQL ORC Vectorized with copy 218 / 225 72.1 13.9 98.6X
-SQL ORC MR 1448 / 1492 10.9 92.0 14.9X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 15974 / 16222 1.0 1015.6 1.0X
+SQL Json 5917 / 6174 2.7 376.2 2.7X
+SQL Parquet Vectorized 115 / 128 136.8 7.3 138.9X
+SQL Parquet MR 1459 / 1571 10.8 92.8 10.9X
+SQL ORC Vectorized 164 / 194 95.8 10.4 97.3X
+SQL ORC Vectorized with copy 204 / 303 77.2 12.9 78.4X
+SQL ORC MR 1095 / 1143 14.4 69.6 14.6X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Parquet Reader Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 202 / 211 77.7 12.9 1.0X
-ParquetReader Vectorized -> Row 118 / 120 133.5 7.5 1.7X
+ParquetReader Vectorized 139 / 156 113.1 8.8 1.0X
+ParquetReader Vectorized -> Row 83 / 89 188.7 5.3 1.7X
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 23282 / 23312 0.7 1480.2 1.0X
-SQL Json 9187 / 9189 1.7 584.1 2.5X
-SQL Parquet Vectorized 204 / 218 77.0 13.0 114.0X
-SQL Parquet MR 1941 / 1953 8.1 123.4 12.0X
-SQL ORC Vectorized 217 / 225 72.6 13.8 107.5X
-SQL ORC Vectorized with copy 279 / 289 56.3 17.8 83.4X
-SQL ORC MR 1541 / 1549 10.2 98.0 15.1X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 16394 / 16643 1.0 1042.3 1.0X
+SQL Json 6014 / 6020 2.6 382.4 2.7X
+SQL Parquet Vectorized 147 / 155 106.9 9.4 111.4X
+SQL Parquet MR 1575 / 1581 10.0 100.1 10.4X
+SQL ORC Vectorized 168 / 173 93.9 10.7 97.9X
+SQL ORC Vectorized with copy 219 / 227 71.8 13.9 74.8X
+SQL ORC MR 1185 / 1187 13.3 75.3 13.8X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Parquet Reader Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 288 / 297 54.6 18.3 1.0X
-ParquetReader Vectorized -> Row 255 / 257 61.7 16.2 1.1X
+ParquetReader Vectorized 193 / 216 81.4 12.3 1.0X
+ParquetReader Vectorized -> Row 160 / 175 98.3 10.2 1.2X
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 24990 / 25012 0.6 1588.8 1.0X
-SQL Json 9837 / 9865 1.6 625.4 2.5X
-SQL Parquet Vectorized 170 / 180 92.3 10.8 146.6X
-SQL Parquet MR 2319 / 2328 6.8 147.4 10.8X
-SQL ORC Vectorized 293 / 301 53.7 18.6 85.3X
-SQL ORC Vectorized with copy 297 / 309 52.9 18.9 84.0X
-SQL ORC MR 1667 / 1674 9.4 106.0 15.0X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 17168 / 17306 0.9 1091.5 1.0X
+SQL Json 6167 / 6180 2.6 392.1 2.8X
+SQL Parquet Vectorized 134 / 142 117.5 8.5 128.2X
+SQL Parquet MR 1659 / 1740 9.5 105.5 10.3X
+SQL ORC Vectorized 225 / 229 69.9 14.3 76.3X
+SQL ORC Vectorized with copy 231 / 235 68.2 14.7 74.4X
+SQL ORC MR 1287 / 1388 12.2 81.8 13.3X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Parquet Reader Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 257 / 274 61.3 16.3 1.0X
-ParquetReader Vectorized -> Row 259 / 264 60.8 16.4 1.0X
+ParquetReader Vectorized 178 / 187 88.2 11.3 1.0X
+ParquetReader Vectorized -> Row 174 / 184 90.3 11.1 1.0X
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single BIGINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 32537 / 32554 0.5 2068.7 1.0X
-SQL Json 12610 / 12668 1.2 801.7 2.6X
-SQL Parquet Vectorized 258 / 276 61.0 16.4 126.2X
-SQL Parquet MR 2422 / 2435 6.5 154.0 13.4X
-SQL ORC Vectorized 378 / 385 41.6 24.0 86.2X
-SQL ORC Vectorized with copy 381 / 389 41.3 24.2 85.4X
-SQL ORC MR 1797 / 1819 8.8 114.3 18.1X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 21253 / 21542 0.7 1351.2 1.0X
+SQL Json 8208 / 8209 1.9 521.9 2.6X
+SQL Parquet Vectorized 180 / 241 87.3 11.5 117.9X
+SQL Parquet MR 1769 / 1801 8.9 112.5 12.0X
+SQL ORC Vectorized 3271 / 3277 4.8 207.9 6.5X
--- End diff --
Could you regenerate the results?
I think this might be a transient issue on your MacBook.
Please see line 71: it is faster than this. (Usually, line 71 is slower than this due to copying.)
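A check of this kind can be automated before committing a regenerated results file. The sketch below is only an illustration (the `RelativeDropCheck` object, the `Row` type, and the `maxDrop` threshold are all hypothetical): it compares the old and new `Relative` values of the `SQL ORC Vectorized` rows quoted in the diff above and reports the cases that dropped by more than a chosen factor.

```scala
// Hypothetical sanity check, not part of Spark: flags benchmark cases whose
// relative speed dropped by more than `maxDrop` between two result files.
object RelativeDropCheck {
  final case class Row(table: String, oldRelative: Double, newRelative: Double)

  // Keeps the rows where old/new exceeds the allowed drop factor.
  def suspicious(rows: Seq[Row], maxDrop: Double): Seq[Row] =
    rows.filter(r => r.oldRelative / r.newRelative > maxDrop)

  // "SQL ORC Vectorized" rows (old -> new) from the diff quoted above.
  val orcVectorized: Seq[Row] = Seq(
    Row("TINYINT", 138.0, 97.3),
    Row("SMALLINT", 107.5, 97.9),
    Row("INT", 85.3, 76.3),
    Row("BIGINT", 86.2, 6.5))

  def main(args: Array[String]): Unit =
    suspicious(orcVectorized, maxDrop = 3.0).foreach { r =>
      println(s"${r.table}: ${r.oldRelative}X -> ${r.newRelative}X")
    }
}
```

The old and new runs here come from different machines, so the threshold has to be coarse; it is meant to catch an order-of-magnitude outlier, not a small regression.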
---
---------------------------------------------------------------------
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22965
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark pull request #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22965#discussion_r231766852
--- Diff: sql/core/benchmarks/DataSourceReadBenchmark-results.txt ---
@@ -2,268 +2,268 @@
SQL Single Numeric Column Scan
================================================================================================
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 21508 / 22112 0.7 1367.5 1.0X
-SQL Json 8705 / 8825 1.8 553.4 2.5X
-SQL Parquet Vectorized 157 / 186 100.0 10.0 136.7X
-SQL Parquet MR 1789 / 1794 8.8 113.8 12.0X
-SQL ORC Vectorized 156 / 166 100.9 9.9 138.0X
-SQL ORC Vectorized with copy 218 / 225 72.1 13.9 98.6X
-SQL ORC MR 1448 / 1492 10.9 92.0 14.9X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 15974 / 16222 1.0 1015.6 1.0X
+SQL Json 5917 / 6174 2.7 376.2 2.7X
+SQL Parquet Vectorized 115 / 128 136.8 7.3 138.9X
+SQL Parquet MR 1459 / 1571 10.8 92.8 10.9X
+SQL ORC Vectorized 164 / 194 95.8 10.4 97.3X
+SQL ORC Vectorized with copy 204 / 303 77.2 12.9 78.4X
+SQL ORC MR 1095 / 1143 14.4 69.6 14.6X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Parquet Reader Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 202 / 211 77.7 12.9 1.0X
-ParquetReader Vectorized -> Row 118 / 120 133.5 7.5 1.7X
+ParquetReader Vectorized 139 / 156 113.1 8.8 1.0X
+ParquetReader Vectorized -> Row 83 / 89 188.7 5.3 1.7X
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 23282 / 23312 0.7 1480.2 1.0X
-SQL Json 9187 / 9189 1.7 584.1 2.5X
-SQL Parquet Vectorized 204 / 218 77.0 13.0 114.0X
-SQL Parquet MR 1941 / 1953 8.1 123.4 12.0X
-SQL ORC Vectorized 217 / 225 72.6 13.8 107.5X
-SQL ORC Vectorized with copy 279 / 289 56.3 17.8 83.4X
-SQL ORC MR 1541 / 1549 10.2 98.0 15.1X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 16394 / 16643 1.0 1042.3 1.0X
+SQL Json 6014 / 6020 2.6 382.4 2.7X
+SQL Parquet Vectorized 147 / 155 106.9 9.4 111.4X
+SQL Parquet MR 1575 / 1581 10.0 100.1 10.4X
+SQL ORC Vectorized 168 / 173 93.9 10.7 97.9X
+SQL ORC Vectorized with copy 219 / 227 71.8 13.9 74.8X
+SQL ORC MR 1185 / 1187 13.3 75.3 13.8X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Parquet Reader Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 288 / 297 54.6 18.3 1.0X
-ParquetReader Vectorized -> Row 255 / 257 61.7 16.2 1.1X
+ParquetReader Vectorized 193 / 216 81.4 12.3 1.0X
+ParquetReader Vectorized -> Row 160 / 175 98.3 10.2 1.2X
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 24990 / 25012 0.6 1588.8 1.0X
-SQL Json 9837 / 9865 1.6 625.4 2.5X
-SQL Parquet Vectorized 170 / 180 92.3 10.8 146.6X
-SQL Parquet MR 2319 / 2328 6.8 147.4 10.8X
-SQL ORC Vectorized 293 / 301 53.7 18.6 85.3X
-SQL ORC Vectorized with copy 297 / 309 52.9 18.9 84.0X
-SQL ORC MR 1667 / 1674 9.4 106.0 15.0X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 17168 / 17306 0.9 1091.5 1.0X
+SQL Json 6167 / 6180 2.6 392.1 2.8X
+SQL Parquet Vectorized 134 / 142 117.5 8.5 128.2X
+SQL Parquet MR 1659 / 1740 9.5 105.5 10.3X
+SQL ORC Vectorized 225 / 229 69.9 14.3 76.3X
+SQL ORC Vectorized with copy 231 / 235 68.2 14.7 74.4X
+SQL ORC MR 1287 / 1388 12.2 81.8 13.3X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Parquet Reader Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 257 / 274 61.3 16.3 1.0X
-ParquetReader Vectorized -> Row 259 / 264 60.8 16.4 1.0X
+ParquetReader Vectorized 178 / 187 88.2 11.3 1.0X
+ParquetReader Vectorized -> Row 174 / 184 90.3 11.1 1.0X
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single BIGINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 32537 / 32554 0.5 2068.7 1.0X
-SQL Json 12610 / 12668 1.2 801.7 2.6X
-SQL Parquet Vectorized 258 / 276 61.0 16.4 126.2X
-SQL Parquet MR 2422 / 2435 6.5 154.0 13.4X
-SQL ORC Vectorized 378 / 385 41.6 24.0 86.2X
-SQL ORC Vectorized with copy 381 / 389 41.3 24.2 85.4X
-SQL ORC MR 1797 / 1819 8.8 114.3 18.1X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 21253 / 21542 0.7 1351.2 1.0X
+SQL Json 8208 / 8209 1.9 521.9 2.6X
+SQL Parquet Vectorized 180 / 241 87.3 11.5 117.9X
+SQL Parquet MR 1769 / 1801 8.9 112.5 12.0X
+SQL ORC Vectorized 3271 / 3277 4.8 207.9 6.5X
--- End diff --
Done in https://github.com/apache/spark/pull/22965/commits/3067a6d1f63c93b4295425d90e5894d27c840995 .
---
---------------------------------------------------------------------
[GitHub] spark pull request #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22965#discussion_r232001105
--- Diff: sql/core/benchmarks/DataSourceReadBenchmark-results.txt ---
@@ -2,268 +2,268 @@
SQL Single Numeric Column Scan
================================================================================================
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
SQL Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 21508 / 22112 0.7 1367.5 1.0X
-SQL Json 8705 / 8825 1.8 553.4 2.5X
-SQL Parquet Vectorized 157 / 186 100.0 10.0 136.7X
-SQL Parquet MR 1789 / 1794 8.8 113.8 12.0X
-SQL ORC Vectorized 156 / 166 100.9 9.9 138.0X
-SQL ORC Vectorized with copy 218 / 225 72.1 13.9 98.6X
-SQL ORC MR 1448 / 1492 10.9 92.0 14.9X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
+SQL CSV 26366 / 26562 0.6 1676.3 1.0X
--- End diff --
Hi, @HyukjinKwon , @MaxGekk , @cloud-fan , @peter-toth
This is not related to this PR. CSV shows a consistent performance regression (about 10%) throughout all benchmark cases. The other data sources show reasonable numbers for all types.
The baseline was generated on Oct 11th. The following are the suspects.
1. ee03f760b3 [SPARK-25955][TEST] Porting JSON tests for CSV functions
1. 94de5609be [SPARK-25848][SQL][TEST] Refactor CSVBenchmarks to use main method
1. 3b4556745e [SPARK-25795][R][EXAMPLE] Fix CSV SparkR SQL Example
1. 1e6c1d8bfb [SPARK-25493][SQL] Use auto-detection for CRLF in CSV datasource multiline mode
1. c7eadb5e66 [SPARK-25660][SQL] Fix for the backward slash as CSV fields delimiter
1. 39872af882 [SPARK-25684][SQL] Organize header related codes in CSV datasource
1. 46fe40838a [SPARK-25669][SQL] Check CSV header only when it exists
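A slowdown figure for a single case can be computed from the best times of the same row in two runs. The following is a minimal sketch with hypothetical names (`Slowdown`, `percent`); the only CSV pair visible in this hunk is the TINYINT row, so the example uses that one, and other rows may give a different figure.

```scala
// Hypothetical helper, not part of Spark: percentage increase in best time
// between an old and a new benchmark run of the same case.
object Slowdown {
  def percent(oldBestMs: Double, newBestMs: Double): Double =
    (newBestMs - oldBestMs) / oldBestMs * 100.0

  def main(args: Array[String]): Unit = {
    // "SQL CSV" best times (ms) for the TINYINT scan: old run, then new run.
    val oldBestMs = 21508.0
    val newBestMs = 26366.0
    println(f"SQL CSV TINYINT slowdown: ${percent(oldBestMs, newBestMs)}%.1f%%")
  }
}
```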
---
---------------------------------------------------------------------
[GitHub] spark pull request #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22965#discussion_r231541613
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala ---
@@ -266,8 +268,9 @@ object OrcReadBenchmark extends BenchmarkBase with SQLHelper {
s"SELECT IF(RAND(1) < $fractionOfNulls, NULL, CAST(id as STRING)) AS c1, " +
s"IF(RAND(2) < $fractionOfNulls, NULL, CAST(id as STRING)) AS c2 FROM t1"))
--- End diff --
@maropu It is trivial, but why is it `RAND(2)` here?
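One plausible reason, not confirmed in this thread, is that distinct seeds keep the NULL positions of `c1` and `c2` from coinciding row by row. The sketch below illustrates the idea with `scala.util.Random` rather than Spark's `RAND`, so it is an analogy under that assumption; `SeedDemo` and `nullMask` are hypothetical names.

```scala
import scala.util.Random

// Illustration only: uses scala.util.Random, not Spark's RAND expression.
object SeedDemo {
  // true at position i means "row i would be NULL" for the given seed.
  def nullMask(seed: Long, fractionOfNulls: Double, rows: Int): Vector[Boolean] = {
    val rng = new Random(seed)
    Vector.fill(rows)(rng.nextDouble() < fractionOfNulls)
  }

  def main(args: Array[String]): Unit = {
    val c1 = nullMask(seed = 1L, fractionOfNulls = 0.5, rows = 1000)
    val sameSeed = nullMask(seed = 1L, fractionOfNulls = 0.5, rows = 1000)
    val c2 = nullMask(seed = 2L, fractionOfNulls = 0.5, rows = 1000)
    println(s"same seed, identical masks: ${c1 == sameSeed}")
    println(s"different seeds, identical masks: ${c1 == c2}")
  }
}
```

With a shared seed both columns would be NULL on exactly the same rows, which would make the two-column scan less representative than independent null patterns.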
---
---------------------------------------------------------------------
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22965
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4839/
Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22965
Thank you, @gengliangwang . Merged to master.
---
---------------------------------------------------------------------
[GitHub] spark pull request #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22965#discussion_r231549251
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala ---
@@ -32,9 +32,11 @@ import org.apache.spark.sql.types._
* Benchmark to measure ORC read performance.
* {{{
* To run this benchmark:
- * 1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
- * 2. build/sbt "sql/test:runMain <this class>"
- * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ * 1. without sbt: bin/spark-submit --class <this class>
+ * --jars <catalyst test jar>,<core test jar>,<sql jar>,<hive-exec jar>,<spark-hive jar>
--- End diff --
The jars here are built by sbt.
I am surprised that 5 jars are required.
---
---------------------------------------------------------------------
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22965
**[Test build #98586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98586/testReport)** for PR 22965 at commit [`3067a6d`](https://github.com/apache/spark/commit/3067a6d1f63c93b4295425d90e5894d27c840995).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
[GitHub] spark pull request #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22965#discussion_r231765680
--- Diff: sql/core/benchmarks/DataSourceReadBenchmark-results.txt ---
@@ -2,268 +2,268 @@
SQL Single Numeric Column Scan
================================================================================================
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 21508 / 22112 0.7 1367.5 1.0X
-SQL Json 8705 / 8825 1.8 553.4 2.5X
-SQL Parquet Vectorized 157 / 186 100.0 10.0 136.7X
-SQL Parquet MR 1789 / 1794 8.8 113.8 12.0X
-SQL ORC Vectorized 156 / 166 100.9 9.9 138.0X
-SQL ORC Vectorized with copy 218 / 225 72.1 13.9 98.6X
-SQL ORC MR 1448 / 1492 10.9 92.0 14.9X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 15974 / 16222 1.0 1015.6 1.0X
+SQL Json 5917 / 6174 2.7 376.2 2.7X
+SQL Parquet Vectorized 115 / 128 136.8 7.3 138.9X
+SQL Parquet MR 1459 / 1571 10.8 92.8 10.9X
+SQL ORC Vectorized 164 / 194 95.8 10.4 97.3X
+SQL ORC Vectorized with copy 204 / 303 77.2 12.9 78.4X
+SQL ORC MR 1095 / 1143 14.4 69.6 14.6X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Parquet Reader Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 202 / 211 77.7 12.9 1.0X
-ParquetReader Vectorized -> Row 118 / 120 133.5 7.5 1.7X
+ParquetReader Vectorized 139 / 156 113.1 8.8 1.0X
+ParquetReader Vectorized -> Row 83 / 89 188.7 5.3 1.7X
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 23282 / 23312 0.7 1480.2 1.0X
-SQL Json 9187 / 9189 1.7 584.1 2.5X
-SQL Parquet Vectorized 204 / 218 77.0 13.0 114.0X
-SQL Parquet MR 1941 / 1953 8.1 123.4 12.0X
-SQL ORC Vectorized 217 / 225 72.6 13.8 107.5X
-SQL ORC Vectorized with copy 279 / 289 56.3 17.8 83.4X
-SQL ORC MR 1541 / 1549 10.2 98.0 15.1X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 16394 / 16643 1.0 1042.3 1.0X
+SQL Json 6014 / 6020 2.6 382.4 2.7X
+SQL Parquet Vectorized 147 / 155 106.9 9.4 111.4X
+SQL Parquet MR 1575 / 1581 10.0 100.1 10.4X
+SQL ORC Vectorized 168 / 173 93.9 10.7 97.9X
+SQL ORC Vectorized with copy 219 / 227 71.8 13.9 74.8X
+SQL ORC MR 1185 / 1187 13.3 75.3 13.8X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Parquet Reader Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 288 / 297 54.6 18.3 1.0X
-ParquetReader Vectorized -> Row 255 / 257 61.7 16.2 1.1X
+ParquetReader Vectorized 193 / 216 81.4 12.3 1.0X
+ParquetReader Vectorized -> Row 160 / 175 98.3 10.2 1.2X
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 24990 / 25012 0.6 1588.8 1.0X
-SQL Json 9837 / 9865 1.6 625.4 2.5X
-SQL Parquet Vectorized 170 / 180 92.3 10.8 146.6X
-SQL Parquet MR 2319 / 2328 6.8 147.4 10.8X
-SQL ORC Vectorized 293 / 301 53.7 18.6 85.3X
-SQL ORC Vectorized with copy 297 / 309 52.9 18.9 84.0X
-SQL ORC MR 1667 / 1674 9.4 106.0 15.0X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 17168 / 17306 0.9 1091.5 1.0X
+SQL Json 6167 / 6180 2.6 392.1 2.8X
+SQL Parquet Vectorized 134 / 142 117.5 8.5 128.2X
+SQL Parquet MR 1659 / 1740 9.5 105.5 10.3X
+SQL ORC Vectorized 225 / 229 69.9 14.3 76.3X
+SQL ORC Vectorized with copy 231 / 235 68.2 14.7 74.4X
+SQL ORC MR 1287 / 1388 12.2 81.8 13.3X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Parquet Reader Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 257 / 274 61.3 16.3 1.0X
-ParquetReader Vectorized -> Row 259 / 264 60.8 16.4 1.0X
+ParquetReader Vectorized 178 / 187 88.2 11.3 1.0X
+ParquetReader Vectorized -> Row 174 / 184 90.3 11.1 1.0X
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
SQL Single BIGINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-SQL CSV 32537 / 32554 0.5 2068.7 1.0X
-SQL Json 12610 / 12668 1.2 801.7 2.6X
-SQL Parquet Vectorized 258 / 276 61.0 16.4 126.2X
-SQL Parquet MR 2422 / 2435 6.5 154.0 13.4X
-SQL ORC Vectorized 378 / 385 41.6 24.0 86.2X
-SQL ORC Vectorized with copy 381 / 389 41.3 24.2 85.4X
-SQL ORC MR 1797 / 1819 8.8 114.3 18.1X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL CSV 21253 / 21542 0.7 1351.2 1.0X
+SQL Json 8208 / 8209 1.9 521.9 2.6X
+SQL Parquet Vectorized 180 / 241 87.3 11.5 117.9X
+SQL Parquet MR 1769 / 1801 8.9 112.5 12.0X
+SQL ORC Vectorized 3271 / 3277 4.8 207.9 6.5X
--- End diff --
OK, let me re-run it.
Thanks for the check!
---
---------------------------------------------------------------------
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22965
**[Test build #98601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98601/testReport)** for PR 22965 at commit [`b204638`](https://github.com/apache/spark/commit/b204638bdab7aca0676d81fc348a86b32222603e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
[GitHub] spark issue #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark/DataSo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22965
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4849/
Test PASSed.
---
---------------------------------------------------------------------