Posted to reviews@spark.apache.org by wangyum <gi...@git.apache.org> on 2018/09/20 16:06:54 UTC
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
GitHub user wangyum opened a pull request:
https://github.com/apache/spark/pull/22501
[SPARK-25492][TEST] Refactor WideSchemaBenchmark to use main method
## What changes were proposed in this pull request?
Refactor `WideSchemaBenchmark` to use main method.
Generate benchmark result:
```sh
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.WideSchemaBenchmark"
```
## How was this patch tested?
manual tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wangyum/spark SPARK-25492
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22501.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22501
----
commit f56b73223fbf765e408d9aef6565a2318f4836e3
Author: Yuming Wang <yu...@...>
Date: 2018-09-20T16:04:30Z
Refactor WideSchemaBenchmark
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97534/
Test PASSed.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22501
@cloud-fan After updating on EC2, almost all ratios and values look more stable and reasonable for now. The following two are noticeable changes, but they look like a Parquet writer improvement (rather than a regression).
**1. Read/Write ratio is reversed (`0.8` -> `1.7`)**
I'm not sure but Parquet writer for `deep
```scala
- 128 x 8 deep x 1000 rows (read parquet) 69 / 74 1.4 693.9 0.2X
- 128 x 8 deep x 1000 rows (write parquet) 78 / 83 1.3 777.7 0.2X
+ 128 x 8 deep x 1000 rows (read parquet) 351 / 379 0.3 3510.3 0.1X
+ 128 x 8 deep x 1000 rows (write parquet) 199 / 203 0.5 1988.3 0.2X
```
**2. Read/Write ratio is changed noticeably (`4.6` -> `8.3`)**
```scala
- 1024 x 11 deep x 100 rows (read parquet) 426 / 433 0.2 4263.7 0.0X
- 1024 x 11 deep x 100 rows (write parquet) 91 / 98 1.1 913.5 0.1X
+ 1024 x 11 deep x 100 rows (read parquet) 2063 / 2078 0.0 20629.2 0.0X
+ 1024 x 11 deep x 100 rows (write parquet) 248 / 266 0.4 2475.1 0.1X
```
Since this is the first attempt to track this and the previous result is too old, the comparison has some obvious limitations. From Spark 2.4.0 on, we can get a consistent comparison instead of results from different personal Macs.
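The two ratios in the headings above can be reproduced from the best-time columns (read time divided by write time). The following is only a small sketch: the object and method names are hypothetical, and only the millisecond values come from the tables. The values quoted in the headings appear to be truncated to one decimal place.

```scala
// Hypothetical helper; only the best times (ms) are taken from the tables above.
object ReadWriteRatio {
  /** Read time divided by write time, rounded to two decimals. */
  def ratio(readMs: Double, writeMs: Double): Double =
    math.round(readMs / writeMs * 100) / 100.0

  def main(args: Array[String]): Unit = {
    // 128 x 8 deep x 1000 rows: old (69 read / 78 write) vs. new (351 / 199)
    println(f"128 x 8 deep:   ${ratio(69, 78)}%.2f -> ${ratio(351, 199)}%.2f")
    // 1024 x 11 deep x 100 rows: old (426 / 91) vs. new (2063 / 248)
    println(f"1024 x 11 deep: ${ratio(426, 91)}%.2f -> ${ratio(2063, 248)}%.2f")
  }
}
```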
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22501
Thank you guys for refreshing the benchmarks and results! It's very helpful.
If possible, can we post the perf regressions we found in the umbrella JIRA? Then people can see whether a perf regression is reasonable (if we have already addressed it) or investigate how it was introduced.
Thanks!
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/22501#discussion_r223195740
--- Diff: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ---
@@ -48,15 +48,11 @@ abstract class BenchmarkBase {
if (!file.exists()) {
file.createNewFile()
}
- output = Some(new FileOutputStream(file))
+ output = Option(new FileOutputStream(file))
--- End diff --
Change here because: https://github.com/apache/spark/pull/22443#discussion_r221181428
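The linked discussion is not reproduced in this thread, so as general background only: in Scala, `Option(x)` returns `None` when `x` is null, whereas `Some(x)` wraps whatever it is given. A constructor call such as `new FileOutputStream(file)` never returns null, so for this particular line the two forms should behave the same. A minimal sketch with made-up values:

```scala
object OptionVsSome {
  def main(args: Array[String]): Unit = {
    val missing: String = null

    // Some(...) wraps whatever it is given, including null.
    val wrapped: Option[String] = Some(missing)
    // Option(...) maps null to None and any other value to Some(value).
    val safe: Option[String] = Option(missing)

    println(wrapped.isDefined) // true
    println(safe.isDefined)    // false
    println(Option("out.txt") == Some("out.txt")) // true
  }
}
```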
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97642/
Test FAILed.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22501
**[Test build #97056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97056/testReport)** for PR 22501 at commit [`e6f39f3`](https://github.com/apache/spark/commit/e6f39f36b5d806f1afcea980ba43d544dadbe35f).
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97665/
Test PASSed.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Merged build finished. Test FAILed.
---
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/22501#discussion_r219725654
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/WideSchemaBenchmark.scala ---
@@ -17,22 +17,19 @@
package org.apache.spark.sql
-import java.io.{File, FileOutputStream, OutputStream}
+import java.io.File
-import org.scalatest.BeforeAndAfterEach
-
-import org.apache.spark.SparkFunSuite
-import org.apache.spark.sql.functions._
-import org.apache.spark.util.{Benchmark, Utils}
+import org.apache.spark.util.{Benchmark, BenchmarkBase => FileBenchmarkBase, Utils}
/**
* Benchmark for performance with very wide and nested DataFrames.
- * To run this:
- * build/sbt "sql/test-only *WideSchemaBenchmark"
- *
- * Results will be written to "sql/core/benchmarks/WideSchemaBenchmark-results.txt".
+ * To run this benchmark:
+ * 1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
+ * 2. build/sbt "sql/test:runMain <this class>"
+ * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ * Results will be written to "benchmarks/WideSchemaBenchmark-results.txt".
--- End diff --
Thanks @dongjoon-hyun. Actually I'm waiting for https://github.com/apache/spark/pull/22484. I want to move `withTempDir()` to `RunBenchmarkWithCodegen.scala`.
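For readers new to the "use main method" pattern in the doc comment above, here is a simplified, self-contained sketch of the idea. It is not Spark's actual `BenchmarkBase`: the class, object, and method names are hypothetical, and only the `SPARK_GENERATE_BENCHMARK_FILES` variable and the `benchmarks/<name>-results.txt` convention are taken from this thread.

```scala
import java.io.{File, FileOutputStream, OutputStream, PrintStream}

// Hypothetical stand-in for a benchmark base class; not Spark's real API.
abstract class SketchBenchmarkBase {
  protected var output: Option[OutputStream] = None

  /** Subclasses put their benchmark cases here and print results to `out`. */
  def runSuite(out: PrintStream): Unit

  def main(args: Array[String]): Unit = {
    // Write a results file only when explicitly requested via the environment.
    val regenerate = sys.env.get("SPARK_GENERATE_BENCHMARK_FILES").contains("1")
    if (regenerate) {
      val name = getClass.getName.split('.').last.stripSuffix("$")
      val file = new File(s"benchmarks/$name-results.txt")
      Option(file.getParentFile).foreach(_.mkdirs())
      output = Option(new FileOutputStream(file))
    }
    // Fall back to stdout when no results file was requested.
    val out = output.map(o => new PrintStream(o)).getOrElse(System.out)
    try runSuite(out) finally {
      out.flush()
      output.foreach(_.close())
    }
  }
}

// A toy benchmark that times a simple summation.
object TinyBenchmark extends SketchBenchmarkBase {
  def runSuite(out: PrintStream): Unit = {
    val start = System.nanoTime()
    val sum = (1 to 1000000).map(_.toLong).sum
    val elapsedMs = (System.nanoTime() - start) / 1e6
    out.println(f"sum=$sum%d took $elapsedMs%.1f ms")
  }
}
```

Running `TinyBenchmark.main` without the environment variable prints to stdout; with the variable set to `1`, the same output goes to the results file instead.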
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22501
**[Test build #97642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97642/testReport)** for PR 22501 at commit [`64e5ede`](https://github.com/apache/spark/commit/64e5ede51fcc900d51256d421d86939b202f3d75).
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3306/
Test PASSed.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4150/
Test PASSed.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22501
Merged to master.
---
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22501#discussion_r223195081
--- Diff: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ---
@@ -48,15 +48,11 @@ abstract class BenchmarkBase {
if (!file.exists()) {
file.createNewFile()
}
- output = Some(new FileOutputStream(file))
+ output = Option(new FileOutputStream(file))
--- End diff --
This looks like an irrelevant piggyback change.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22501
I guess it's related to pip packaging, though.
```
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/jenkins/workspace/SparkPullRequestBuilder/python/setup.py", line 224, in <module>
'Programming Language :: Python :: Implementation :: PyPy']
File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/__init__.py", line 140, in setup
return distutils.core.setup(**attrs)
File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/distutils/core.py", line 148, in setup
dist.run_commands()
File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 38, in run
self.install_for_development()
File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 154, in install_for_development
self.process_distribution(None, self.dist, not self.no_deps)
File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 729, in process_distribution
self.install_egg_scripts(dist)
File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 189, in install_egg_scripts
script_text = strm.read()
File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2719: ordinal not in range(128)
```
It's from setup.py
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22501
**[Test build #97056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97056/testReport)** for PR 22501 at commit [`e6f39f3`](https://github.com/apache/spark/commit/e6f39f36b5d806f1afcea980ba43d544dadbe35f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97627/
Test FAILed.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22501
**[Test build #97644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97644/testReport)** for PR 22501 at commit [`64e5ede`](https://github.com/apache/spark/commit/64e5ede51fcc900d51256d421d86939b202f3d75).
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22501
Retest this please.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22501
**[Test build #97534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97534/testReport)** for PR 22501 at commit [`82e2367`](https://github.com/apache/spark/commit/82e2367a203ffc03dea9bf826a5085059e1391ed).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22501#discussion_r223196145
--- Diff: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ---
@@ -48,15 +48,11 @@ abstract class BenchmarkBase {
if (!file.exists()) {
file.createNewFile()
}
- output = Some(new FileOutputStream(file))
+ output = Option(new FileOutputStream(file))
--- End diff --
IIUC, @HyukjinKwon meant `when you need to touch this file`.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22501
Seems Jenkins is broken, cc @shaneknapp
```
Command "/tmp/tmp.JfFHaoRFPU/3.5/bin/python -c "import setuptools, tokenize;__file__='/home/jenkins/workspace/SparkPullRequestBuilder/python/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" develop --no-deps" failed with error code 1 in /home/jenkins/workspace/SparkPullRequestBuilder/python/
You are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Cleaning up temporary directory - /tmp/tmp.JfFHaoRFPU
[error] running /home/jenkins/workspace/SparkPullRequestBuilder/dev/run-pip-tests ; received return code 1
```
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22501
Hi, @wangyum . I ran the test on EC2 `r3.xlarge`, too. It looks more stable than this.
Could you review and merge https://github.com/wangyum/spark/pull/19 ?
---
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22501
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by shaneknapp <gi...@git.apache.org>.
Github user shaneknapp commented on the issue:
https://github.com/apache/spark/pull/22501
@cloud-fan -- pip isn't broken... the actual error is found right above what you cut and pasted:
` UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2719: ordinal not in range(128)`
i won't be able to look any deeper into this until tomorrow at the earliest.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/22501
Thanks, I found `0xc2` in `docker-image-tool.sh`. I will put my finding into #22782
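One way such a byte could be located (not necessarily how it was found here) is to scan a file for bytes outside the 7-bit ASCII range and report their offsets. A minimal sketch, where the object and method names are hypothetical and the file to check is passed as the first argument:

```scala
import java.nio.file.{Files, Paths}

object NonAsciiScan {
  /** Returns (offset, unsigned byte value) for every byte above 0x7f. */
  def nonAsciiOffsets(bytes: Array[Byte]): Seq[(Int, Int)] =
    bytes.zipWithIndex.collect {
      case (b, i) if (b & 0xff) > 0x7f => (i, b & 0xff)
    }.toSeq

  def main(args: Array[String]): Unit = {
    // args(0) is the path of the file to check.
    val bytes = Files.readAllBytes(Paths.get(args(0)))
    nonAsciiOffsets(bytes).foreach { case (offset, value) =>
      println(f"offset $offset%d: 0x$value%02x")
    }
  }
}
```

For example, a non-breaking space (U+00A0) is encoded in UTF-8 as the two bytes `0xc2 0xa0`, so a file containing one would report a `0xc2` at the corresponding offset.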
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4153/
Test PASSed.
---
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22501#discussion_r226769745
--- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
@@ -1,117 +1,145 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+================================================================================================
+parsing large select expressions
+================================================================================================
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
parsing large select: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 select expressions 2 / 4 0.0 2050147.0 1.0X
-100 select expressions 6 / 7 0.0 6123412.0 0.3X
-2500 select expressions 135 / 141 0.0 134623148.0 0.0X
+1 select expressions 2 / 4 0.0 1934953.0 1.0X
+100 select expressions 4 / 5 0.0 3659399.0 0.5X
+2500 select expressions 68 / 76 0.0 68278937.0 0.0X
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+================================================================================================
+many column field read and write
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
many column field r/w: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 cols x 100000 rows (read in-mem) 16 / 18 6.3 158.6 1.0X
-1 cols x 100000 rows (exec in-mem) 17 / 19 6.0 166.7 1.0X
-1 cols x 100000 rows (read parquet) 24 / 26 4.3 235.1 0.7X
-1 cols x 100000 rows (write parquet) 81 / 85 1.2 811.3 0.2X
-100 cols x 1000 rows (read in-mem) 17 / 19 6.0 166.2 1.0X
-100 cols x 1000 rows (exec in-mem) 25 / 27 4.0 249.2 0.6X
-100 cols x 1000 rows (read parquet) 23 / 25 4.4 226.0 0.7X
-100 cols x 1000 rows (write parquet) 83 / 87 1.2 831.0 0.2X
-2500 cols x 40 rows (read in-mem) 132 / 137 0.8 1322.9 0.1X
-2500 cols x 40 rows (exec in-mem) 326 / 330 0.3 3260.6 0.0X
-2500 cols x 40 rows (read parquet) 831 / 839 0.1 8305.8 0.0X
-2500 cols x 40 rows (write parquet) 237 / 245 0.4 2372.6 0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 cols x 100000 rows (read in-mem) 22 / 25 4.6 219.4 1.0X
+1 cols x 100000 rows (exec in-mem) 22 / 28 4.5 223.8 1.0X
+1 cols x 100000 rows (read parquet) 45 / 49 2.2 449.6 0.5X
+1 cols x 100000 rows (write parquet) 204 / 223 0.5 2044.4 0.1X
--- End diff --
For this part, right, @rdblue . I guess so.
After merging the EC2 result into @wangyum's PR, I'll compare the numbers one by one again.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/22501
Is [this](https://github.com/apache/spark/pull/22748#issuecomment-431512558) the oldest test failure related to this type of failure?
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96369/
Test FAILed.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/22501
I am looking at each commit, from newest to oldest, at https://github.com/apache/spark/commits/master
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97644/
Test FAILed.
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22501
**[Test build #97627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97627/testReport)** for PR 22501 at commit [`64e5ede`](https://github.com/apache/spark/commit/64e5ede51fcc900d51256d421d86939b202f3d75).
---
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22501
**[Test build #97627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97627/testReport)** for PR 22501 at commit [`64e5ede`](https://github.com/apache/spark/commit/64e5ede51fcc900d51256d421d86939b202f3d75).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/22501#discussion_r223202914
--- Diff: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ---
@@ -48,15 +48,11 @@ abstract class BenchmarkBase {
if (!file.exists()) {
file.createNewFile()
}
- output = Some(new FileOutputStream(file))
+ output = Option(new FileOutputStream(file))
--- End diff --
I was worried that I would forget about it later, so I changed it in this PR. Should I revert it?
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22501
**[Test build #96369 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96369/testReport)** for PR 22501 at commit [`f56b732`](https://github.com/apache/spark/commit/f56b73223fbf765e408d9aef6565a2318f4836e3).
* This patch **fails to generate documentation**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22501#discussion_r219724989
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/WideSchemaBenchmark.scala ---
@@ -17,22 +17,19 @@
package org.apache.spark.sql
-import java.io.{File, FileOutputStream, OutputStream}
+import java.io.File
-import org.scalatest.BeforeAndAfterEach
-
-import org.apache.spark.SparkFunSuite
-import org.apache.spark.sql.functions._
-import org.apache.spark.util.{Benchmark, Utils}
+import org.apache.spark.util.{Benchmark, BenchmarkBase => FileBenchmarkBase, Utils}
/**
* Benchmark for performance with very wide and nested DataFrames.
- * To run this:
- * build/sbt "sql/test-only *WideSchemaBenchmark"
- *
- * Results will be written to "sql/core/benchmarks/WideSchemaBenchmark-results.txt".
+ * To run this benchmark:
+ * 1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
+ * 2. build/sbt "sql/test:runMain <this class>"
+ * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ * Results will be written to "benchmarks/WideSchemaBenchmark-results.txt".
--- End diff --
Could you fix the doc generation failure?
---
---------------------------------------------------------------------
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22501#discussion_r226742168
--- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
@@ -1,117 +1,145 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+================================================================================================
+parsing large select expressions
+================================================================================================
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
parsing large select: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 select expressions 2 / 4 0.0 2050147.0 1.0X
-100 select expressions 6 / 7 0.0 6123412.0 0.3X
-2500 select expressions 135 / 141 0.0 134623148.0 0.0X
+1 select expressions 2 / 4 0.0 1934953.0 1.0X
+100 select expressions 4 / 5 0.0 3659399.0 0.5X
+2500 select expressions 68 / 76 0.0 68278937.0 0.0X
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+================================================================================================
+many column field read and write
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
many column field r/w: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 cols x 100000 rows (read in-mem) 16 / 18 6.3 158.6 1.0X
-1 cols x 100000 rows (exec in-mem) 17 / 19 6.0 166.7 1.0X
-1 cols x 100000 rows (read parquet) 24 / 26 4.3 235.1 0.7X
-1 cols x 100000 rows (write parquet) 81 / 85 1.2 811.3 0.2X
-100 cols x 1000 rows (read in-mem) 17 / 19 6.0 166.2 1.0X
-100 cols x 1000 rows (exec in-mem) 25 / 27 4.0 249.2 0.6X
-100 cols x 1000 rows (read parquet) 23 / 25 4.4 226.0 0.7X
-100 cols x 1000 rows (write parquet) 83 / 87 1.2 831.0 0.2X
-2500 cols x 40 rows (read in-mem) 132 / 137 0.8 1322.9 0.1X
-2500 cols x 40 rows (exec in-mem) 326 / 330 0.3 3260.6 0.0X
-2500 cols x 40 rows (read parquet) 831 / 839 0.1 8305.8 0.0X
-2500 cols x 40 rows (write parquet) 237 / 245 0.4 2372.6 0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 cols x 100000 rows (read in-mem) 22 / 25 4.6 219.4 1.0X
+1 cols x 100000 rows (exec in-mem) 22 / 28 4.5 223.8 1.0X
+1 cols x 100000 rows (read parquet) 45 / 49 2.2 449.6 0.5X
+1 cols x 100000 rows (write parquet) 204 / 223 0.5 2044.4 0.1X
--- End diff --
The following [EC2 result](https://github.com/wangyum/spark/pull/19) shows a ratio consistent with Spark 2.1.0. The result on Mac seems to be unstable for some unknown reason, as noted in https://github.com/apache/spark/pull/22501#discussion_r226440992.
```scala
1 cols x 100000 rows (read parquet) 61 / 70 1.6 610.2 0.6X
1 cols x 100000 rows (write parquet) 209 / 233 0.5 2086.1 0.2X
```
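For context, the `Relative` column in these tables appears to be the baseline (first) case's per-row time divided by each case's per-row time; the quoted Mac figures are consistent with that reading (for example, 219.4 / 449.6 rounds to 0.5X). A minimal Scala sketch of that arithmetic follows. The names `RelativeColumn`, `relative`, and `fmt` are hypothetical and used only for illustration; this is not Spark's `Benchmark` implementation.

```scala
import java.util.Locale

object RelativeColumn {
  // Relative speed of a case versus the baseline case, computed from
  // per-row times in nanoseconds (assumed reading of the "Relative" column).
  def relative(baselinePerRowNs: Double, perRowNs: Double): Double =
    baselinePerRowNs / perRowNs

  // Formats the ratio with one decimal place and an "X" suffix, locale-independently.
  def fmt(ratio: Double): String = "%.1fX".formatLocal(Locale.ROOT, ratio)

  def main(args: Array[String]): Unit = {
    // Per-row times taken from the quoted Mac results in this thread.
    val baseline = 219.4 // 1 cols x 100000 rows (read in-mem)
    println(fmt(relative(baseline, 449.6)))  // read parquet
    println(fmt(relative(baseline, 2044.4))) // write parquet
  }
}
```

Comparing ratios rather than absolute times is what makes results from different machines (Mac vs. EC2) roughly comparable.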
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22501
**[Test build #97644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97644/testReport)** for PR 22501 at commit [`64e5ede`](https://github.com/apache/spark/commit/64e5ede51fcc900d51256d421d86939b202f3d75).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4137/
Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/22501#discussion_r226520120
--- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
@@ -1,117 +1,145 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+================================================================================================
+parsing large select expressions
+================================================================================================
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
parsing large select: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 select expressions 2 / 4 0.0 2050147.0 1.0X
-100 select expressions 6 / 7 0.0 6123412.0 0.3X
-2500 select expressions 135 / 141 0.0 134623148.0 0.0X
+1 select expressions 2 / 4 0.0 1934953.0 1.0X
+100 select expressions 4 / 5 0.0 3659399.0 0.5X
+2500 select expressions 68 / 76 0.0 68278937.0 0.0X
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+================================================================================================
+many column field read and write
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
many column field r/w: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 cols x 100000 rows (read in-mem) 16 / 18 6.3 158.6 1.0X
-1 cols x 100000 rows (exec in-mem) 17 / 19 6.0 166.7 1.0X
-1 cols x 100000 rows (read parquet) 24 / 26 4.3 235.1 0.7X
-1 cols x 100000 rows (write parquet) 81 / 85 1.2 811.3 0.2X
-100 cols x 1000 rows (read in-mem) 17 / 19 6.0 166.2 1.0X
-100 cols x 1000 rows (exec in-mem) 25 / 27 4.0 249.2 0.6X
-100 cols x 1000 rows (read parquet) 23 / 25 4.4 226.0 0.7X
-100 cols x 1000 rows (write parquet) 83 / 87 1.2 831.0 0.2X
-2500 cols x 40 rows (read in-mem) 132 / 137 0.8 1322.9 0.1X
-2500 cols x 40 rows (exec in-mem) 326 / 330 0.3 3260.6 0.0X
-2500 cols x 40 rows (read parquet) 831 / 839 0.1 8305.8 0.0X
-2500 cols x 40 rows (write parquet) 237 / 245 0.4 2372.6 0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 cols x 100000 rows (read in-mem) 22 / 25 4.6 219.4 1.0X
+1 cols x 100000 rows (exec in-mem) 22 / 28 4.5 223.8 1.0X
+1 cols x 100000 rows (read parquet) 45 / 49 2.2 449.6 0.5X
+1 cols x 100000 rows (write parquet) 204 / 223 0.5 2044.4 0.1X
--- End diff --
This may be a Parquet issue. I found that binary write performance is a little worse after upgrading to Parquet 1.10.0: https://github.com/apache/parquet-mr/pull/505. I will verify it later.
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22501
**[Test build #97665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97665/testReport)** for PR 22501 at commit [`64e5ede`](https://github.com/apache/spark/commit/64e5ede51fcc900d51256d421d86939b202f3d75).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4089/
Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22501
**[Test build #97534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97534/testReport)** for PR 22501 at commit [`82e2367`](https://github.com/apache/spark/commit/82e2367a203ffc03dea9bf826a5085059e1391ed).
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22501
**[Test build #97665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97665/testReport)** for PR 22501 at commit [`64e5ede`](https://github.com/apache/spark/commit/64e5ede51fcc900d51256d421d86939b202f3d75).
---
---------------------------------------------------------------------
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22501#discussion_r226740901
--- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
@@ -1,117 +1,145 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+================================================================================================
+parsing large select expressions
+================================================================================================
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
parsing large select: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 select expressions 2 / 4 0.0 2050147.0 1.0X
-100 select expressions 6 / 7 0.0 6123412.0 0.3X
-2500 select expressions 135 / 141 0.0 134623148.0 0.0X
+1 select expressions 2 / 4 0.0 1934953.0 1.0X
+100 select expressions 4 / 5 0.0 3659399.0 0.5X
+2500 select expressions 68 / 76 0.0 68278937.0 0.0X
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+================================================================================================
+many column field read and write
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
many column field r/w: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 cols x 100000 rows (read in-mem) 16 / 18 6.3 158.6 1.0X
-1 cols x 100000 rows (exec in-mem) 17 / 19 6.0 166.7 1.0X
-1 cols x 100000 rows (read parquet) 24 / 26 4.3 235.1 0.7X
-1 cols x 100000 rows (write parquet) 81 / 85 1.2 811.3 0.2X
-100 cols x 1000 rows (read in-mem) 17 / 19 6.0 166.2 1.0X
-100 cols x 1000 rows (exec in-mem) 25 / 27 4.0 249.2 0.6X
-100 cols x 1000 rows (read parquet) 23 / 25 4.4 226.0 0.7X
-100 cols x 1000 rows (write parquet) 83 / 87 1.2 831.0 0.2X
-2500 cols x 40 rows (read in-mem) 132 / 137 0.8 1322.9 0.1X
-2500 cols x 40 rows (exec in-mem) 326 / 330 0.3 3260.6 0.0X
-2500 cols x 40 rows (read parquet) 831 / 839 0.1 8305.8 0.0X
-2500 cols x 40 rows (write parquet) 237 / 245 0.4 2372.6 0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 cols x 100000 rows (read in-mem) 22 / 25 4.6 219.4 1.0X
+1 cols x 100000 rows (exec in-mem) 22 / 28 4.5 223.8 1.0X
+1 cols x 100000 rows (read parquet) 45 / 49 2.2 449.6 0.5X
+1 cols x 100000 rows (write parquet) 204 / 223 0.5 2044.4 0.1X
+100 cols x 1000 rows (read in-mem) 26 / 28 3.9 255.8 0.9X
+100 cols x 1000 rows (exec in-mem) 32 / 35 3.1 319.3 0.7X
+100 cols x 1000 rows (read parquet) 45 / 52 2.2 445.9 0.5X
+100 cols x 1000 rows (write parquet) 275 / 536 0.4 2746.1 0.1X
+2500 cols x 40 rows (read in-mem) 261 / 434 0.4 2607.3 0.1X
+2500 cols x 40 rows (exec in-mem) 624 / 701 0.2 6240.5 0.0X
+2500 cols x 40 rows (read parquet) 196 / 301 0.5 1963.4 0.1X
+2500 cols x 40 rows (write parquet) 687 / 1049 0.1 6870.6 0.0X
--- End diff --
FYI, this large gap does not appear in the EC2 result.
---
---------------------------------------------------------------------
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22501#discussion_r226439834
--- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
@@ -1,117 +1,145 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+================================================================================================
+parsing large select expressions
+================================================================================================
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
parsing large select: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 select expressions 2 / 4 0.0 2050147.0 1.0X
-100 select expressions 6 / 7 0.0 6123412.0 0.3X
-2500 select expressions 135 / 141 0.0 134623148.0 0.0X
+1 select expressions 2 / 4 0.0 1934953.0 1.0X
+100 select expressions 4 / 5 0.0 3659399.0 0.5X
+2500 select expressions 68 / 76 0.0 68278937.0 0.0X
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+================================================================================================
+many column field read and write
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
many column field r/w: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 cols x 100000 rows (read in-mem) 16 / 18 6.3 158.6 1.0X
-1 cols x 100000 rows (exec in-mem) 17 / 19 6.0 166.7 1.0X
-1 cols x 100000 rows (read parquet) 24 / 26 4.3 235.1 0.7X
-1 cols x 100000 rows (write parquet) 81 / 85 1.2 811.3 0.2X
-100 cols x 1000 rows (read in-mem) 17 / 19 6.0 166.2 1.0X
-100 cols x 1000 rows (exec in-mem) 25 / 27 4.0 249.2 0.6X
-100 cols x 1000 rows (read parquet) 23 / 25 4.4 226.0 0.7X
-100 cols x 1000 rows (write parquet) 83 / 87 1.2 831.0 0.2X
-2500 cols x 40 rows (read in-mem) 132 / 137 0.8 1322.9 0.1X
-2500 cols x 40 rows (exec in-mem) 326 / 330 0.3 3260.6 0.0X
-2500 cols x 40 rows (read parquet) 831 / 839 0.1 8305.8 0.0X
-2500 cols x 40 rows (write parquet) 237 / 245 0.4 2372.6 0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 cols x 100000 rows (read in-mem) 22 / 25 4.6 219.4 1.0X
+1 cols x 100000 rows (exec in-mem) 22 / 28 4.5 223.8 1.0X
+1 cols x 100000 rows (read parquet) 45 / 49 2.2 449.6 0.5X
+1 cols x 100000 rows (write parquet) 204 / 223 0.5 2044.4 0.1X
--- End diff --
This might be a slight regression in the Parquet writer since Spark 2.1.0 (SPARK-17335).
cc @cloud-fan and @gatorsmile , @rdblue
---
---------------------------------------------------------------------
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22501#discussion_r226440992
--- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
@@ -1,117 +1,145 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+================================================================================================
+parsing large select expressions
+================================================================================================
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
parsing large select: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 select expressions 2 / 4 0.0 2050147.0 1.0X
-100 select expressions 6 / 7 0.0 6123412.0 0.3X
-2500 select expressions 135 / 141 0.0 134623148.0 0.0X
+1 select expressions 2 / 4 0.0 1934953.0 1.0X
+100 select expressions 4 / 5 0.0 3659399.0 0.5X
+2500 select expressions 68 / 76 0.0 68278937.0 0.0X
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+================================================================================================
+many column field read and write
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
many column field r/w: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 cols x 100000 rows (read in-mem) 16 / 18 6.3 158.6 1.0X
-1 cols x 100000 rows (exec in-mem) 17 / 19 6.0 166.7 1.0X
-1 cols x 100000 rows (read parquet) 24 / 26 4.3 235.1 0.7X
-1 cols x 100000 rows (write parquet) 81 / 85 1.2 811.3 0.2X
-100 cols x 1000 rows (read in-mem) 17 / 19 6.0 166.2 1.0X
-100 cols x 1000 rows (exec in-mem) 25 / 27 4.0 249.2 0.6X
-100 cols x 1000 rows (read parquet) 23 / 25 4.4 226.0 0.7X
-100 cols x 1000 rows (write parquet) 83 / 87 1.2 831.0 0.2X
-2500 cols x 40 rows (read in-mem) 132 / 137 0.8 1322.9 0.1X
-2500 cols x 40 rows (exec in-mem) 326 / 330 0.3 3260.6 0.0X
-2500 cols x 40 rows (read parquet) 831 / 839 0.1 8305.8 0.0X
-2500 cols x 40 rows (write parquet) 237 / 245 0.4 2372.6 0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 cols x 100000 rows (read in-mem) 22 / 25 4.6 219.4 1.0X
+1 cols x 100000 rows (exec in-mem) 22 / 28 4.5 223.8 1.0X
+1 cols x 100000 rows (read parquet) 45 / 49 2.2 449.6 0.5X
+1 cols x 100000 rows (write parquet) 204 / 223 0.5 2044.4 0.1X
+100 cols x 1000 rows (read in-mem) 26 / 28 3.9 255.8 0.9X
+100 cols x 1000 rows (exec in-mem) 32 / 35 3.1 319.3 0.7X
+100 cols x 1000 rows (read parquet) 45 / 52 2.2 445.9 0.5X
+100 cols x 1000 rows (write parquet) 275 / 536 0.4 2746.1 0.1X
+2500 cols x 40 rows (read in-mem) 261 / 434 0.4 2607.3 0.1X
+2500 cols x 40 rows (exec in-mem) 624 / 701 0.2 6240.5 0.0X
+2500 cols x 40 rows (read parquet) 196 / 301 0.5 1963.4 0.1X
+2500 cols x 40 rows (write parquet) 687 / 1049 0.1 6870.6 0.0X
--- End diff --
The difference between `best` and `average` is too large in lines 32 and 33.
I'll try to run this on EC2, too.
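One way to quantify how unstable a run is would be the ratio of average to best time per row of the table. The sketch below parses the `Best/Avg Time(ms)` cell and computes that ratio for a few rows copied from the quoted diff. The object name `BestAvgSpread`, its helpers, and the choice of rows are assumptions made for illustration; the thread does not say which table rows lines 32 and 33 correspond to.

```scala
object BestAvgSpread {
  // Parses a "Best / Avg" cell such as "275 / 536" into (best, avg) in milliseconds.
  def parseBestAvg(cell: String): (Double, Double) = {
    val parts = cell.split("/").map(_.trim)
    require(parts.length == 2, s"expected 'best / avg', got: $cell")
    (parts(0).toDouble, parts(1).toDouble)
  }

  // Ratio of average to best time; values close to 1.0 suggest a stable run.
  def spread(cell: String): Double = {
    val (best, avg) = parseBestAvg(cell)
    avg / best
  }

  def main(args: Array[String]): Unit = {
    // Rows copied from the Mac results quoted in this thread.
    val rows = Seq(
      "1 cols x 100000 rows (write parquet)" -> "204 / 223",
      "100 cols x 1000 rows (write parquet)" -> "275 / 536",
      "2500 cols x 40 rows (read in-mem)" -> "261 / 434",
      "2500 cols x 40 rows (write parquet)" -> "687 / 1049"
    )
    rows.foreach { case (name, cell) =>
      println(s"$name: avg/best = ${spread(cell)}")
    }
  }
}
```

A ratio near 2, as for `275 / 536`, means the average iteration took about twice as long as the best one, which usually points to noise on the machine rather than to the code under test.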
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22501
**[Test build #97642 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97642/testReport)** for PR 22501 at commit [`64e5ede`](https://github.com/apache/spark/commit/64e5ede51fcc900d51256d421d86939b202f3d75).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22501
Thank you, @wangyum and all!
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97056/
Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/22501#discussion_r226765772
--- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
@@ -1,117 +1,145 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+================================================================================================
+parsing large select expressions
+================================================================================================
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
parsing large select: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 select expressions 2 / 4 0.0 2050147.0 1.0X
-100 select expressions 6 / 7 0.0 6123412.0 0.3X
-2500 select expressions 135 / 141 0.0 134623148.0 0.0X
+1 select expressions 2 / 4 0.0 1934953.0 1.0X
+100 select expressions 4 / 5 0.0 3659399.0 0.5X
+2500 select expressions 68 / 76 0.0 68278937.0 0.0X
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+================================================================================================
+many column field read and write
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
many column field r/w: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 cols x 100000 rows (read in-mem) 16 / 18 6.3 158.6 1.0X
-1 cols x 100000 rows (exec in-mem) 17 / 19 6.0 166.7 1.0X
-1 cols x 100000 rows (read parquet) 24 / 26 4.3 235.1 0.7X
-1 cols x 100000 rows (write parquet) 81 / 85 1.2 811.3 0.2X
-100 cols x 1000 rows (read in-mem) 17 / 19 6.0 166.2 1.0X
-100 cols x 1000 rows (exec in-mem) 25 / 27 4.0 249.2 0.6X
-100 cols x 1000 rows (read parquet) 23 / 25 4.4 226.0 0.7X
-100 cols x 1000 rows (write parquet) 83 / 87 1.2 831.0 0.2X
-2500 cols x 40 rows (read in-mem) 132 / 137 0.8 1322.9 0.1X
-2500 cols x 40 rows (exec in-mem) 326 / 330 0.3 3260.6 0.0X
-2500 cols x 40 rows (read parquet) 831 / 839 0.1 8305.8 0.0X
-2500 cols x 40 rows (write parquet) 237 / 245 0.4 2372.6 0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 cols x 100000 rows (read in-mem) 22 / 25 4.6 219.4 1.0X
+1 cols x 100000 rows (exec in-mem) 22 / 28 4.5 223.8 1.0X
+1 cols x 100000 rows (read parquet) 45 / 49 2.2 449.6 0.5X
+1 cols x 100000 rows (write parquet) 204 / 223 0.5 2044.4 0.1X
--- End diff --
@dongjoon-hyun, so you are saying that it doesn't appear that there is a performance regression, right?
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22501
Retest this please.
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22501
Yup, I made a fix https://github.com/apache/spark/pull/22782
---
---------------------------------------------------------------------
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22501#discussion_r224985471
--- Diff: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ---
@@ -48,15 +48,11 @@ abstract class BenchmarkBase {
if (!file.exists()) {
file.createNewFile()
}
- output = Some(new FileOutputStream(file))
+ output = Option(new FileOutputStream(file))
--- End diff --
My point was that, from a cursory look, there's no point in checking for `null` below. If there's no chance that it becomes `null`, we can leave it as `Some` and remove the `null` check below.
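To illustrate the distinction being discussed, here is a minimal sketch, not the code from `BenchmarkBase`. The object name, helper names, and temp-file name are hypothetical. It shows that `Option(x)` turns `null` into `None`, that `Some(x)` wraps `null` as-is, and that a `new` expression either yields a non-null reference or throws, which is why `Some` should be enough here.

```scala
import java.io.{File, FileOutputStream, OutputStream}

object OptionVsSome {
  // Option.apply maps null to None; Some.apply wraps whatever it is given.
  def wrapWithOption(s: String): Option[String] = Option(s)
  def wrapWithSome(s: String): Option[String] = Some(s)

  // A `new` expression either returns a non-null reference or throws,
  // so Some(...) is sufficient and no later null check is needed.
  def open(file: File): Option[OutputStream] = Some(new FileOutputStream(file))

  def main(args: Array[String]): Unit = {
    val missing: String = null
    println(wrapWithOption(missing)) // None
    println(wrapWithSome(missing))   // Some(null)

    // Hypothetical temp file, used only to exercise open().
    val tmp = File.createTempFile("benchmark-output", ".txt")
    tmp.deleteOnExit()
    val out = open(tmp)
    println(out.isDefined)
    out.foreach(_.close())
  }
}
```

Under that assumption, `Option(...)` only helps when the wrapped expression can actually evaluate to `null`, for example a value returned from a Java API.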
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22501
retest this please
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/22501
Thanks. When the build was successful, this was part of the log from [this](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97378/consoleText):
```
copying pyspark/streaming/util.py -> pyspark-3.0.0.dev0/pyspark/streaming
Writing pyspark-3.0.0.dev0/setup.cfg
Creating tar archive
removing 'pyspark-3.0.0.dev0' (and everything under it)
Installing dist into virtual env
Obtaining file:///home/jenkins/workspace/SparkPullRequestBuilder/python
Collecting py4j==0.10.7 (from pyspark==3.0.0.dev0)
Downloading https://files.pythonhosted.org/packages/e3/53/c737818eb9a7dc32a7cd4f1396e787bd94200c3997c72c1dbe028587bd76/py4j-0.10.7-py2.py3-none-any.whl (197kB)
mkl-random 1.0.1 requires cython, which is not installed.
Installing collected packages: py4j, pyspark
Running setup.py develop for pyspark
Successfully installed py4j-0.10.7 pyspark
You are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Run basic sanity check on pip installed version with spark-submit
```
Now, we are seeing the following:
```
copying pyspark/streaming/util.py -> pyspark-3.0.0.dev0/pyspark/streaming
Writing pyspark-3.0.0.dev0/setup.cfg
Creating tar archive
removing 'pyspark-3.0.0.dev0' (and everything under it)
Installing dist into virtual env
Obtaining file:///home/jenkins/workspace/SparkPullRequestBuilder/python
Collecting py4j==0.10.7 (from pyspark==3.0.0.dev0)
Downloading https://files.pythonhosted.org/packages/e3/53/c737818eb9a7dc32a7cd4f1396e787bd94200c3997c72c1dbe028587bd76/py4j-0.10.7-py2.py3-none-any.whl (197kB)
mkl-random 1.0.1 requires cython, which is not installed.
Installing collected packages: py4j, pyspark
Running setup.py develop for pyspark
Complete output from command /tmp/tmp.EWtmCOYUBn/3.5/bin/python -c "import setuptools, tokenize;__file__='/home/jenkins/workspace/SparkPullRequestBuilder/python/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" develop --no-deps:
running develop
running egg_info
writing dependency_links to pyspark.egg-info/dependency_links.txt
writing pyspark.egg-info/PKG-INFO
writing requirements to pyspark.egg-info/requires.txt
writing top-level names to pyspark.egg-info/top_level.txt
Could not import pypandoc - required to package PySpark
package init file 'deps/bin/__init__.py' not found (or not a regular file)
package init file 'deps/jars/__init__.py' not found (or not a regular file)
package init file 'pyspark/python/pyspark/__init__.py' not found (or not a regular file)
package init file 'lib/__init__.py' not found (or not a regular file)
package init file 'deps/data/__init__.py' not found (or not a regular file)
package init file 'deps/licenses/__init__.py' not found (or not a regular file)
package init file 'deps/examples/__init__.py' not found (or not a regular file)
reading manifest file 'pyspark.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '*.py[cod]' found anywhere in distribution
warning: no previously-included files matching '__pycache__' found anywhere in distribution
warning: no previously-included files matching '.DS_Store' found anywhere in distribution
writing manifest file 'pyspark.egg-info/SOURCES.txt'
running build_ext
Creating /tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/site-packages/pyspark.egg-link (link to .)
Adding pyspark 3.0.0.dev0 to easy-install.pth file
Installing load-spark-env.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing spark-submit script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing spark-class.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing beeline.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing find-spark-home.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing run-example script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing spark-shell2.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing pyspark script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing sparkR script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing spark-sql script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing spark-submit.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing spark-shell script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing beeline script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing spark-submit2.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing find-spark-home script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing sparkR.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing run-example.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing sparkR2.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing spark-shell.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing spark-sql.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Installing spark-class2.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/jenkins/workspace/SparkPullRequestBuilder/python/setup.py", line 224, in <module>
'Programming Language :: Python :: Implementation :: PyPy']
File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/site-packages/setuptools/__init__.py", line 140, in setup
return distutils.core.setup(**attrs)
File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/distutils/core.py", line 148, in setup
dist.run_commands()
File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 38, in run
self.install_for_development()
File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 154, in install_for_development
self.process_distribution(None, self.dist, not self.no_deps)
File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 729, in process_distribution
self.install_egg_scripts(dist)
File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 189, in install_egg_scripts
script_text = strm.read()
File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2719: ordinal not in range(128)
----------------------------------------
```
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22501#discussion_r226442573
--- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
@@ -1,117 +1,145 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+================================================================================================
+parsing large select expressions
+================================================================================================
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
parsing large select: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 select expressions 2 / 4 0.0 2050147.0 1.0X
-100 select expressions 6 / 7 0.0 6123412.0 0.3X
-2500 select expressions 135 / 141 0.0 134623148.0 0.0X
+1 select expressions 2 / 4 0.0 1934953.0 1.0X
+100 select expressions 4 / 5 0.0 3659399.0 0.5X
+2500 select expressions 68 / 76 0.0 68278937.0 0.0X
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+================================================================================================
+many column field read and write
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
many column field r/w: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 cols x 100000 rows (read in-mem) 16 / 18 6.3 158.6 1.0X
-1 cols x 100000 rows (exec in-mem) 17 / 19 6.0 166.7 1.0X
-1 cols x 100000 rows (read parquet) 24 / 26 4.3 235.1 0.7X
-1 cols x 100000 rows (write parquet) 81 / 85 1.2 811.3 0.2X
-100 cols x 1000 rows (read in-mem) 17 / 19 6.0 166.2 1.0X
-100 cols x 1000 rows (exec in-mem) 25 / 27 4.0 249.2 0.6X
-100 cols x 1000 rows (read parquet) 23 / 25 4.4 226.0 0.7X
-100 cols x 1000 rows (write parquet) 83 / 87 1.2 831.0 0.2X
-2500 cols x 40 rows (read in-mem) 132 / 137 0.8 1322.9 0.1X
-2500 cols x 40 rows (exec in-mem) 326 / 330 0.3 3260.6 0.0X
-2500 cols x 40 rows (read parquet) 831 / 839 0.1 8305.8 0.0X
-2500 cols x 40 rows (write parquet) 237 / 245 0.4 2372.6 0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 cols x 100000 rows (read in-mem) 22 / 25 4.6 219.4 1.0X
+1 cols x 100000 rows (exec in-mem) 22 / 28 4.5 223.8 1.0X
+1 cols x 100000 rows (read parquet) 45 / 49 2.2 449.6 0.5X
+1 cols x 100000 rows (write parquet) 204 / 223 0.5 2044.4 0.1X
+100 cols x 1000 rows (read in-mem) 26 / 28 3.9 255.8 0.9X
+100 cols x 1000 rows (exec in-mem) 32 / 35 3.1 319.3 0.7X
+100 cols x 1000 rows (read parquet) 45 / 52 2.2 445.9 0.5X
+100 cols x 1000 rows (write parquet) 275 / 536 0.4 2746.1 0.1X
+2500 cols x 40 rows (read in-mem) 261 / 434 0.4 2607.3 0.1X
+2500 cols x 40 rows (exec in-mem) 624 / 701 0.2 6240.5 0.0X
+2500 cols x 40 rows (read parquet) 196 / 301 0.5 1963.4 0.1X
+2500 cols x 40 rows (write parquet) 687 / 1049 0.1 6870.6 0.0X
+
+
+================================================================================================
+wide shallowly nested struct field read and write
+================================================================================================
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
wide shallowly nested struct field r/w: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 wide x 100000 rows (read in-mem) 15 / 17 6.6 151.0 1.0X
-1 wide x 100000 rows (exec in-mem) 20 / 22 5.1 196.6 0.8X
-1 wide x 100000 rows (read parquet) 59 / 63 1.7 592.8 0.3X
-1 wide x 100000 rows (write parquet) 81 / 87 1.2 814.6 0.2X
-100 wide x 1000 rows (read in-mem) 21 / 25 4.8 208.7 0.7X
-100 wide x 1000 rows (exec in-mem) 72 / 81 1.4 718.5 0.2X
-100 wide x 1000 rows (read parquet) 75 / 85 1.3 752.6 0.2X
-100 wide x 1000 rows (write parquet) 88 / 95 1.1 876.7 0.2X
-2500 wide x 40 rows (read in-mem) 28 / 34 3.5 282.2 0.5X
-2500 wide x 40 rows (exec in-mem) 1269 / 1284 0.1 12688.1 0.0X
-2500 wide x 40 rows (read parquet) 549 / 578 0.2 5493.4 0.0X
-2500 wide x 40 rows (write parquet) 96 / 104 1.0 959.1 0.2X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 wide x 100000 rows (read in-mem) 23 / 42 4.4 226.2 1.0X
+1 wide x 100000 rows (exec in-mem) 29 / 53 3.5 288.5 0.8X
+1 wide x 100000 rows (read parquet) 93 / 102 1.1 928.2 0.2X
+1 wide x 100000 rows (write parquet) 201 / 222 0.5 2009.6 0.1X
+100 wide x 1000 rows (read in-mem) 42 / 55 2.4 421.8 0.5X
+100 wide x 1000 rows (exec in-mem) 55 / 113 1.8 547.0 0.4X
+100 wide x 1000 rows (read parquet) 139 / 263 0.7 1390.6 0.2X
+100 wide x 1000 rows (write parquet) 245 / 338 0.4 2450.9 0.1X
+2500 wide x 40 rows (read in-mem) 51 / 72 2.0 511.7 0.4X
+2500 wide x 40 rows (exec in-mem) 265 / 303 0.4 2654.8 0.1X
+2500 wide x 40 rows (read parquet) 1285 / 1339 0.1 12845.1 0.0X
+2500 wide x 40 rows (write parquet) 238 / 262 0.4 2378.8 0.1X
+
+================================================================================================
+deeply nested struct field read and write
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
deeply nested struct field r/w: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 deep x 100000 rows (read in-mem) 14 / 16 7.0 143.8 1.0X
-1 deep x 100000 rows (exec in-mem) 17 / 19 5.9 169.7 0.8X
-1 deep x 100000 rows (read parquet) 33 / 35 3.1 327.0 0.4X
-1 deep x 100000 rows (write parquet) 79 / 84 1.3 786.9 0.2X
-100 deep x 1000 rows (read in-mem) 21 / 24 4.7 211.3 0.7X
-100 deep x 1000 rows (exec in-mem) 221 / 235 0.5 2214.5 0.1X
-100 deep x 1000 rows (read parquet) 1928 / 1952 0.1 19277.1 0.0X
-100 deep x 1000 rows (write parquet) 91 / 96 1.1 909.5 0.2X
-250 deep x 400 rows (read in-mem) 57 / 61 1.8 567.1 0.3X
-250 deep x 400 rows (exec in-mem) 1329 / 1385 0.1 13291.8 0.0X
-250 deep x 400 rows (read parquet) 36563 / 36750 0.0 365630.2 0.0X
-250 deep x 400 rows (write parquet) 126 / 130 0.8 1262.0 0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 deep x 100000 rows (read in-mem) 20 / 24 5.1 197.9 1.0X
+1 deep x 100000 rows (exec in-mem) 23 / 28 4.4 227.8 0.9X
+1 deep x 100000 rows (read parquet) 50 / 58 2.0 500.1 0.4X
+1 deep x 100000 rows (write parquet) 195 / 219 0.5 1945.1 0.1X
+100 deep x 1000 rows (read in-mem) 39 / 57 2.5 393.1 0.5X
+100 deep x 1000 rows (exec in-mem) 480 / 556 0.2 4795.7 0.0X
+100 deep x 1000 rows (read parquet) 7943 / 7950 0.0 79427.5 0.0X
--- End diff --
Er, @wangyum, is this 4 times slower than before?
cc @dbtsai.
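As a rough check of the "4 times" figure, the sketch below divides the new best time by the old best time for the `100 deep x 1000 rows (read parquet)` row in the diff above (1928 ms before, 7943 ms after). The object and function names are hypothetical. Note that, per the headers in the diff, the two result sets were recorded on different JVM builds and CPUs, so the ratio by itself does not show a code-level regression.

```scala
object SlowdownCheck {
  // Ratio of new to old time; values above 1.0 mean the new run is slower.
  def slowdown(oldMs: Double, newMs: Double): Double = newMs / oldMs

  def main(args: Array[String]): Unit = {
    // Best times (ms) for "100 deep x 1000 rows (read parquet)" from the diff.
    val ratio = slowdown(oldMs = 1928.0, newMs = 7943.0)
    println(f"$ratio%.2fx")
  }
}
```

The ratio comes out at roughly 4.1, which matches the question above.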
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22501
**[Test build #96369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96369/testReport)** for PR 22501 at commit [`f56b732`](https://github.com/apache/spark/commit/f56b73223fbf765e408d9aef6565a2318f4836e3).
---
---------------------------------------------------------------------
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22501#discussion_r223220176
--- Diff: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ---
@@ -48,15 +48,11 @@ abstract class BenchmarkBase {
if (!file.exists()) {
file.createNewFile()
}
- output = Some(new FileOutputStream(file))
+ output = Option(new FileOutputStream(file))
--- End diff --
Why do you replace `Some` with `Option`? Are you worried that `new FileOutputStream(file)` becomes `null`?
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3751/
Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22501#discussion_r226516354
--- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
@@ -1,117 +1,145 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+================================================================================================
+parsing large select expressions
+================================================================================================
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
parsing large select: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 select expressions 2 / 4 0.0 2050147.0 1.0X
-100 select expressions 6 / 7 0.0 6123412.0 0.3X
-2500 select expressions 135 / 141 0.0 134623148.0 0.0X
+1 select expressions 2 / 4 0.0 1934953.0 1.0X
+100 select expressions 4 / 5 0.0 3659399.0 0.5X
+2500 select expressions 68 / 76 0.0 68278937.0 0.0X
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+================================================================================================
+many column field read and write
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
many column field r/w: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 cols x 100000 rows (read in-mem) 16 / 18 6.3 158.6 1.0X
-1 cols x 100000 rows (exec in-mem) 17 / 19 6.0 166.7 1.0X
-1 cols x 100000 rows (read parquet) 24 / 26 4.3 235.1 0.7X
-1 cols x 100000 rows (write parquet) 81 / 85 1.2 811.3 0.2X
-100 cols x 1000 rows (read in-mem) 17 / 19 6.0 166.2 1.0X
-100 cols x 1000 rows (exec in-mem) 25 / 27 4.0 249.2 0.6X
-100 cols x 1000 rows (read parquet) 23 / 25 4.4 226.0 0.7X
-100 cols x 1000 rows (write parquet) 83 / 87 1.2 831.0 0.2X
-2500 cols x 40 rows (read in-mem) 132 / 137 0.8 1322.9 0.1X
-2500 cols x 40 rows (exec in-mem) 326 / 330 0.3 3260.6 0.0X
-2500 cols x 40 rows (read parquet) 831 / 839 0.1 8305.8 0.0X
-2500 cols x 40 rows (write parquet) 237 / 245 0.4 2372.6 0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 cols x 100000 rows (read in-mem) 22 / 25 4.6 219.4 1.0X
+1 cols x 100000 rows (exec in-mem) 22 / 28 4.5 223.8 1.0X
+1 cols x 100000 rows (read parquet) 45 / 49 2.2 449.6 0.5X
+1 cols x 100000 rows (write parquet) 204 / 223 0.5 2044.4 0.1X
--- End diff --
I have no idea how this happens. Can you create a JIRA ticket to investigate this regression?
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22501
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4173/
Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22501
Thanks. It might be more related to external factors.
---
---------------------------------------------------------------------