Posted to reviews@spark.apache.org by wangyum <gi...@git.apache.org> on 2018/09/20 16:06:54 UTC

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

GitHub user wangyum opened a pull request:

    https://github.com/apache/spark/pull/22501

    [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use main method

    ## What changes were proposed in this pull request?
    
    Refactor `WideSchemaBenchmark` to use a main method.
    To generate the benchmark results:
    ```sh
    SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.WideSchemaBenchmark"
    ```
    
    ## How was this patch tested?
    
    manual tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-25492

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22501.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22501
    
----
commit f56b73223fbf765e408d9aef6565a2318f4836e3
Author: Yuming Wang <yu...@...>
Date:   2018-09-20T16:04:30Z

    Refactor WideSchemaBenchmark

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97534/
    Test PASSed.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    @cloud-fan After updating the results on EC2, almost all ratios and values look more stable and reasonable for now. The following two changes are noticeable, but they look like a Parquet writer improvement (rather than a regression).
    
    **1. Read/Write ratio is reversed (`0.8` -> `1.7`)**
    I'm not sure but Parquet writer for `deep
    ```diff
    - 128 x 8 deep x 1000 rows (read parquet)         69 /   74          1.4         693.9       0.2X
    - 128 x 8 deep x 1000 rows (write parquet)        78 /   83          1.3         777.7       0.2X
    + 128 x 8 deep x 1000 rows (read parquet)        351 /  379          0.3        3510.3       0.1X
    + 128 x 8 deep x 1000 rows (write parquet)       199 /  203          0.5        1988.3       0.2X
    ```
    
    **2. Read/Write ratio is changed noticeably (`4.6` -> `8.3`)**
    ```diff
    - 1024 x 11 deep x 100 rows (read parquet)        426 /  433          0.2        4263.7       0.0X
    - 1024 x 11 deep x 100 rows (write parquet)        91 /   98          1.1         913.5       0.1X
    + 1024 x 11 deep x 100 rows (read parquet)       2063 / 2078          0.0       20629.2       0.0X
    + 1024 x 11 deep x 100 rows (write parquet)       248 /  266          0.4        2475.1       0.1X
    ```
    
    Since this is the first attempt to track this and the previous result is too old, there are some obvious limitations in the comparison. From Spark 2.4.0 on, we can get a consistent comparison instead of results from different personal Macs.
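
The two ratios quoted above can be checked directly from the table rows. The following is a minimal Python sketch, assuming the ratio means best read time divided by best write time (the comment does not define it). Under that assumption, the quoted figures are reproduced when the quotient is truncated to one decimal place. The dictionary and function names are illustrative only.

```python
import math

# Best times in ms, copied from the benchmark rows quoted above:
# (read parquet, write parquet) for the old ("-") and new ("+") results.
BEST_MS = {
    "128 x 8 deep x 1000 rows": {"old": (69, 78), "new": (351, 199)},
    "1024 x 11 deep x 100 rows": {"old": (426, 91), "new": (2063, 248)},
}


def read_write_ratio(read_ms, write_ms):
    """Assumed definition: best read time divided by best write time."""
    return read_ms / write_ms


def truncate_1dp(value):
    """Truncate (not round) a positive value to one decimal place."""
    return math.floor(value * 10) / 10


ratios = {
    case: {run: truncate_1dp(read_write_ratio(*times)) for run, times in runs.items()}
    for case, runs in BEST_MS.items()
}

for case, runs in ratios.items():
    print(f"{case}: {runs['old']} -> {runs['new']}")
```

A ratio above 1 means the read took longer than the write for that case, which is why the first change is described as a reversal.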


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Thank you guys for refreshing the benchmarks and results! It's very helpful.
    
    If possible, can we post the perf regressions we found in the umbrella JIRA? Then people can see whether a perf regression is reasonable (if we have addressed it) or investigate how the regression was introduced.
    
    Thanks!


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Merged build finished. Test PASSed.


---

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

Posted by wangyum <gi...@git.apache.org>.

Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22501#discussion_r223195740
  
    --- Diff: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ---
    @@ -48,15 +48,11 @@ abstract class BenchmarkBase {
           if (!file.exists()) {
             file.createNewFile()
           }
    -      output = Some(new FileOutputStream(file))
    +      output = Option(new FileOutputStream(file))
    --- End diff --
    
    Changed here because of https://github.com/apache/spark/pull/22443#discussion_r221181428


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97642/
    Test FAILed.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    **[Test build #97056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97056/testReport)** for PR 22501 at commit [`e6f39f3`](https://github.com/apache/spark/commit/e6f39f36b5d806f1afcea980ba43d544dadbe35f).


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Merged build finished. Test FAILed.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97665/
    Test PASSed.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Merged build finished. Test FAILed.


---

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

Posted by wangyum <gi...@git.apache.org>.

Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22501#discussion_r219725654
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/WideSchemaBenchmark.scala ---
    @@ -17,22 +17,19 @@
     
     package org.apache.spark.sql
     
    -import java.io.{File, FileOutputStream, OutputStream}
    +import java.io.File
     
    -import org.scalatest.BeforeAndAfterEach
    -
    -import org.apache.spark.SparkFunSuite
    -import org.apache.spark.sql.functions._
    -import org.apache.spark.util.{Benchmark, Utils}
    +import org.apache.spark.util.{Benchmark, BenchmarkBase => FileBenchmarkBase, Utils}
     
     /**
      * Benchmark for performance with very wide and nested DataFrames.
    - * To run this:
    - *  build/sbt "sql/test-only *WideSchemaBenchmark"
    - *
    - * Results will be written to "sql/core/benchmarks/WideSchemaBenchmark-results.txt".
    + * To run this benchmark:
    + * 1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
    + * 2. build/sbt "sql/test:runMain <this class>"
    + * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
    + *    Results will be written to "benchmarks/WideSchemaBenchmark-results.txt".
    --- End diff --
    
    Thanks @dongjoon-hyun. Actually, I'm waiting for https://github.com/apache/spark/pull/22484. I want to move `withTempDir()` to `RunBenchmarkWithCodegen.scala`.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    **[Test build #97642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97642/testReport)** for PR 22501 at commit [`64e5ede`](https://github.com/apache/spark/commit/64e5ede51fcc900d51256d421d86939b202f3d75).


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3306/
    Test PASSed.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4150/
    Test PASSed.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Merged to master.


---

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22501#discussion_r223195081
  
    --- Diff: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ---
    @@ -48,15 +48,11 @@ abstract class BenchmarkBase {
           if (!file.exists()) {
             file.createNewFile()
           }
    -      output = Some(new FileOutputStream(file))
    +      output = Option(new FileOutputStream(file))
    --- End diff --
    
    This looks like an irrelevant piggyback change.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    I guess it's related to pip packaging, though.
    
    ```
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/home/jenkins/workspace/SparkPullRequestBuilder/python/setup.py", line 224, in <module>
            'Programming Language :: Python :: Implementation :: PyPy']
          File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/__init__.py", line 140, in setup
            return distutils.core.setup(**attrs)
          File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/distutils/core.py", line 148, in setup
            dist.run_commands()
          File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/distutils/dist.py", line 955, in run_commands
            self.run_command(cmd)
          File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/distutils/dist.py", line 974, in run_command
            cmd_obj.run()
          File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 38, in run
            self.install_for_development()
          File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 154, in install_for_development
            self.process_distribution(None, self.dist, not self.no_deps)
          File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 729, in process_distribution
            self.install_egg_scripts(dist)
          File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 189, in install_egg_scripts
            script_text = strm.read()
          File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
            return codecs.ascii_decode(input, self.errors)[0]
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2719: ordinal not in range(128)
        
    ```
    
    It's from setup.py
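
The final line of the traceback can be reproduced in isolation. This is a minimal sketch, assuming the offending byte came from a UTF-8 encoded non-ASCII character. The sample string and the choice of U+00A0 are hypothetical; the traceback only shows the byte `0xc2`, which in UTF-8 is the lead byte for code points U+0080 through U+00BF.

```python
# Hypothetical sample: U+00A0 (no-break space) is encoded in UTF-8
# as the two bytes 0xC2 0xA0.
raw = "usage:\u00a0build".encode("utf-8")


def describe_ascii_failure(data: bytes) -> str:
    """Return the decode error message, or 'ok' if data is pure ASCII."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as err:
        return str(err)
    return "ok"


message = describe_ascii_failure(raw)
print(message)
```

Decoding the same bytes with `raw.decode("utf-8")` succeeds, so a failure of this kind usually points to a reader that defaults to ASCII rather than to corrupt input.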


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    **[Test build #97056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97056/testReport)** for PR 22501 at commit [`e6f39f3`](https://github.com/apache/spark/commit/e6f39f36b5d806f1afcea980ba43d544dadbe35f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97627/
    Test FAILed.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    **[Test build #97644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97644/testReport)** for PR 22501 at commit [`64e5ede`](https://github.com/apache/spark/commit/64e5ede51fcc900d51256d421d86939b202f3d75).


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Retest this please.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    **[Test build #97534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97534/testReport)** for PR 22501 at commit [`82e2367`](https://github.com/apache/spark/commit/82e2367a203ffc03dea9bf826a5085059e1391ed).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22501#discussion_r223196145
  
    --- Diff: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ---
    @@ -48,15 +48,11 @@ abstract class BenchmarkBase {
           if (!file.exists()) {
             file.createNewFile()
           }
    -      output = Some(new FileOutputStream(file))
    +      output = Option(new FileOutputStream(file))
    --- End diff --
    
    IIUC, @HyukjinKwon meant `when you need to touch this file`.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Seems Jenkins is broken, cc @shaneknapp
    ```
    Command "/tmp/tmp.JfFHaoRFPU/3.5/bin/python -c "import setuptools, tokenize;__file__='/home/jenkins/workspace/SparkPullRequestBuilder/python/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" develop --no-deps" failed with error code 1 in /home/jenkins/workspace/SparkPullRequestBuilder/python/
    You are using pip version 10.0.1, however version 18.1 is available.
    You should consider upgrading via the 'pip install --upgrade pip' command.
    Cleaning up temporary directory - /tmp/tmp.JfFHaoRFPU
    [error] running /home/jenkins/workspace/SparkPullRequestBuilder/dev/run-pip-tests ; received return code 1
    ```


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Hi, @wangyum . I ran the test on EC2 `r3.xlarge`, too. It looks more stable than this.
    Could you review and merge https://github.com/wangyum/spark/pull/19 ?


---

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22501


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by shaneknapp <gi...@git.apache.org>.

Github user shaneknapp commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    @cloud-fan -- pip isn't broken... the actual error is right above what you cut and pasted:
    
    `    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2719: ordinal not in range(128)`
    
    I won't be able to look any deeper into this until tomorrow at the earliest.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Thanks, I found `0xc2` in `docker-image-tool.sh`. I will put my finding into #22782
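
One way to locate such bytes is to scan a file for anything outside the ASCII range. The following is a minimal sketch: `find_non_ascii` is a hypothetical helper, not part of Spark's tooling, and the demo writes its own temporary file rather than reading `docker-image-tool.sh`.

```python
import os
import tempfile


def find_non_ascii(path):
    """Return (line_number, column, byte_value) for every non-ASCII byte in the file."""
    hits = []
    with open(path, "rb") as f:  # read raw bytes so that no decoding can fail
        for line_no, line in enumerate(f, start=1):
            for col, byte in enumerate(line, start=1):
                if byte > 0x7F:
                    hits.append((line_no, col, byte))
    return hits


# Demo on a temporary file containing one UTF-8 no-break space (0xC2 0xA0).
with tempfile.NamedTemporaryFile("wb", suffix=".sh", delete=False) as tmp:
    tmp.write(b"#!/bin/sh\necho \xc2\xa0done\n")
try:
    hits = find_non_ascii(tmp.name)
finally:
    os.remove(tmp.name)

for line_no, col, byte in hits:
    print(f"line {line_no}, column {col}: {byte:#04x}")
```

Columns here count bytes rather than characters, which matches how the `position` in a `UnicodeDecodeError` is reported.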


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4153/
    Test PASSed.


---

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22501#discussion_r226769745
  
    --- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
    @@ -1,117 +1,145 @@
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +================================================================================================
    +parsing large select expressions
    +================================================================================================
     
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     parsing large select:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 select expressions                             2 /    4          0.0     2050147.0       1.0X
    -100 select expressions                           6 /    7          0.0     6123412.0       0.3X
    -2500 select expressions                        135 /  141          0.0   134623148.0       0.0X
    +1 select expressions                             2 /    4          0.0     1934953.0       1.0X
    +100 select expressions                           4 /    5          0.0     3659399.0       0.5X
    +2500 select expressions                         68 /   76          0.0    68278937.0       0.0X
     
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
     
    +================================================================================================
    +many column field read and write
    +================================================================================================
    +
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     many column field r/w:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 cols x 100000 rows (read in-mem)              16 /   18          6.3         158.6       1.0X
    -1 cols x 100000 rows (exec in-mem)              17 /   19          6.0         166.7       1.0X
    -1 cols x 100000 rows (read parquet)             24 /   26          4.3         235.1       0.7X
    -1 cols x 100000 rows (write parquet)            81 /   85          1.2         811.3       0.2X
    -100 cols x 1000 rows (read in-mem)              17 /   19          6.0         166.2       1.0X
    -100 cols x 1000 rows (exec in-mem)              25 /   27          4.0         249.2       0.6X
    -100 cols x 1000 rows (read parquet)             23 /   25          4.4         226.0       0.7X
    -100 cols x 1000 rows (write parquet)            83 /   87          1.2         831.0       0.2X
    -2500 cols x 40 rows (read in-mem)              132 /  137          0.8        1322.9       0.1X
    -2500 cols x 40 rows (exec in-mem)              326 /  330          0.3        3260.6       0.0X
    -2500 cols x 40 rows (read parquet)             831 /  839          0.1        8305.8       0.0X
    -2500 cols x 40 rows (write parquet)            237 /  245          0.4        2372.6       0.1X
    -
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +1 cols x 100000 rows (read in-mem)              22 /   25          4.6         219.4       1.0X
    +1 cols x 100000 rows (exec in-mem)              22 /   28          4.5         223.8       1.0X
    +1 cols x 100000 rows (read parquet)             45 /   49          2.2         449.6       0.5X
    +1 cols x 100000 rows (write parquet)           204 /  223          0.5        2044.4       0.1X
    --- End diff --
    
    For this part, right, @rdblue. I guess so.
    After merging the EC2 results into @wangyum's PR, I'll compare the numbers one by one once again.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Is [this](https://github.com/apache/spark/pull/22748#issuecomment-431512558) the oldest occurrence of this type of test failure?


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96369/
    Test FAILed.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    I am going through the commits at https://github.com/apache/spark/commits/master one by one, from the latest to the oldest.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97644/
    Test FAILed.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    **[Test build #97627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97627/testReport)** for PR 22501 at commit [`64e5ede`](https://github.com/apache/spark/commit/64e5ede51fcc900d51256d421d86939b202f3d75).


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    **[Test build #97627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97627/testReport)** for PR 22501 at commit [`64e5ede`](https://github.com/apache/spark/commit/64e5ede51fcc900d51256d421d86939b202f3d75).
     * This patch **fails PySpark pip packaging tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

Posted by wangyum <gi...@git.apache.org>.

Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22501#discussion_r223202914
  
    --- Diff: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ---
    @@ -48,15 +48,11 @@ abstract class BenchmarkBase {
           if (!file.exists()) {
             file.createNewFile()
           }
    -      output = Some(new FileOutputStream(file))
    +      output = Option(new FileOutputStream(file))
    --- End diff --
    
    I was worried that I would forget about it later, so I am changing it in this PR. Should I revert it?
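    
    For context on the `Some` to `Option` change in this diff, here is a minimal sketch of how the two constructors differ. The object name `OptionVsSome` and the temp file are illustrative and not part of the PR.

    ```scala
    import java.io.{File, FileOutputStream, OutputStream}

    object OptionVsSome {
      // Some(x) always wraps its argument, even when x is null.
      val wrappedNull: Option[OutputStream] = Some(null)

      // Option(x) maps a null argument to None.
      val safeNull: Option[OutputStream] = Option(null)

      def main(args: Array[String]): Unit = {
        println(wrappedNull.isDefined) // true: a Some holding null
        println(safeNull.isDefined)    // false: None

        // A constructor call never returns null, so for an expression like
        // `new FileOutputStream(file)` both forms give a defined Option.
        val file = File.createTempFile("benchmark-results", ".txt") // hypothetical temp file
        file.deleteOnExit()
        val viaSome: Option[OutputStream] = Some(new FileOutputStream(file))
        val viaOption: Option[OutputStream] = Option(new FileOutputStream(file))
        println(viaSome.isDefined && viaOption.isDefined) // true
        viaSome.foreach(_.close())
        viaOption.foreach(_.close())
      }
    }
    ```

    Because `new` never yields null, the change is stylistic for this line; `Option(...)` only behaves differently when the wrapped expression can be null.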


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    **[Test build #96369 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96369/testReport)** for PR 22501 at commit [`f56b732`](https://github.com/apache/spark/commit/f56b73223fbf765e408d9aef6565a2318f4836e3).
     * This patch **fails to generate documentation**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22501#discussion_r219724989
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/WideSchemaBenchmark.scala ---
    @@ -17,22 +17,19 @@
     
     package org.apache.spark.sql
     
    -import java.io.{File, FileOutputStream, OutputStream}
    +import java.io.File
     
    -import org.scalatest.BeforeAndAfterEach
    -
    -import org.apache.spark.SparkFunSuite
    -import org.apache.spark.sql.functions._
    -import org.apache.spark.util.{Benchmark, Utils}
    +import org.apache.spark.util.{Benchmark, BenchmarkBase => FileBenchmarkBase, Utils}
     
     /**
      * Benchmark for performance with very wide and nested DataFrames.
    - * To run this:
    - *  build/sbt "sql/test-only *WideSchemaBenchmark"
    - *
    - * Results will be written to "sql/core/benchmarks/WideSchemaBenchmark-results.txt".
    + * To run this benchmark:
    + * 1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
    + * 2. build/sbt "sql/test:runMain <this class>"
    + * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
    + *    Results will be written to "benchmarks/WideSchemaBenchmark-results.txt".
    --- End diff --
    
    Could you fix the doc generation failure?
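    
    The thread does not say what broke doc generation. One possible cause, stated here as an assumption, is that the angle-bracket placeholders such as `<this class>` in the Scaladoc comment are read as malformed HTML tags when Javadoc is generated from it. Below is a sketch of a comment that avoids this by wrapping the commands in a Scaladoc `{{{ ... }}}` code block; `ExampleBenchmark` is a hypothetical stand-in for the real class.

    ```scala
    /**
     * Benchmark for performance with very wide and nested DataFrames.
     * To run this benchmark:
     * {{{
     *   1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
     *   2. build/sbt "sql/test:runMain <this class>"
     *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
     *      Results will be written to "benchmarks/WideSchemaBenchmark-results.txt".
     * }}}
     */
    object ExampleBenchmark {
      // Hypothetical placeholder body; the real benchmark logic lives in the PR.
      def main(args: Array[String]): Unit =
        println("benchmark would run here")
    }
    ```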


---

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22501#discussion_r226742168
  
    --- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
    @@ -1,117 +1,145 @@
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +================================================================================================
    +parsing large select expressions
    +================================================================================================
     
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     parsing large select:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 select expressions                             2 /    4          0.0     2050147.0       1.0X
    -100 select expressions                           6 /    7          0.0     6123412.0       0.3X
    -2500 select expressions                        135 /  141          0.0   134623148.0       0.0X
    +1 select expressions                             2 /    4          0.0     1934953.0       1.0X
    +100 select expressions                           4 /    5          0.0     3659399.0       0.5X
    +2500 select expressions                         68 /   76          0.0    68278937.0       0.0X
     
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
     
    +================================================================================================
    +many column field read and write
    +================================================================================================
    +
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     many column field r/w:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 cols x 100000 rows (read in-mem)              16 /   18          6.3         158.6       1.0X
    -1 cols x 100000 rows (exec in-mem)              17 /   19          6.0         166.7       1.0X
    -1 cols x 100000 rows (read parquet)             24 /   26          4.3         235.1       0.7X
    -1 cols x 100000 rows (write parquet)            81 /   85          1.2         811.3       0.2X
    -100 cols x 1000 rows (read in-mem)              17 /   19          6.0         166.2       1.0X
    -100 cols x 1000 rows (exec in-mem)              25 /   27          4.0         249.2       0.6X
    -100 cols x 1000 rows (read parquet)             23 /   25          4.4         226.0       0.7X
    -100 cols x 1000 rows (write parquet)            83 /   87          1.2         831.0       0.2X
    -2500 cols x 40 rows (read in-mem)              132 /  137          0.8        1322.9       0.1X
    -2500 cols x 40 rows (exec in-mem)              326 /  330          0.3        3260.6       0.0X
    -2500 cols x 40 rows (read parquet)             831 /  839          0.1        8305.8       0.0X
    -2500 cols x 40 rows (write parquet)            237 /  245          0.4        2372.6       0.1X
    -
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +1 cols x 100000 rows (read in-mem)              22 /   25          4.6         219.4       1.0X
    +1 cols x 100000 rows (exec in-mem)              22 /   28          4.5         223.8       1.0X
    +1 cols x 100000 rows (read parquet)             45 /   49          2.2         449.6       0.5X
    +1 cols x 100000 rows (write parquet)           204 /  223          0.5        2044.4       0.1X
    --- End diff --
    
    The following [EC2 result](https://github.com/wangyum/spark/pull/19) shows a ratio consistent with Spark 2.1.0. The result on Mac seems to be unstable for some unknown reason, as in https://github.com/apache/spark/pull/22501#discussion_r226440992.
    ```scala
    1 cols x 100000 rows (read parquet)             61 /   70          1.6         610.2       0.6X
    1 cols x 100000 rows (write parquet)           209 /  233          0.5        2086.1       0.2X
    ```
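    
    The `Relative` column discussed here can be reproduced from the per-row times. The sketch below assumes `Relative` is the first row's time divided by each row's time, rounded to one decimal, which matches the Mac numbers in the diff. The names `Case` and `relative` are illustrative.

    ```scala
    object RelativeColumn {
      // One benchmark row: label and the "Per Row(ns)" value from the results file.
      final case class Case(name: String, perRowNs: Double)

      // Relative speed of each case against the first (baseline) case.
      def relative(cases: Seq[Case]): Seq[(String, Double)] = {
        val baseline = cases.head.perRowNs
        cases.map(c => c.name -> baseline / c.perRowNs)
      }

      // "1 cols x 100000 rows" rows of the new Mac result quoted in the diff.
      val mac: Seq[Case] = Seq(
        Case("read in-mem", 219.4),
        Case("exec in-mem", 223.8),
        Case("read parquet", 449.6),
        Case("write parquet", 2044.4))

      def main(args: Array[String]): Unit =
        relative(mac).foreach { case (name, r) =>
          println(f"$name%-15s $r%.1fX")
        }
    }
    ```

    For the Mac rows the ratios round to 1.0, 1.0, 0.5 and 0.1, the same values as in the `Relative` column of the diff.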


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    **[Test build #97644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97644/testReport)** for PR 22501 at commit [`64e5ede`](https://github.com/apache/spark/commit/64e5ede51fcc900d51256d421d86939b202f3d75).
     * This patch **fails PySpark pip packaging tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4137/
    Test PASSed.


---

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

Posted by wangyum <gi...@git.apache.org>.

Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22501#discussion_r226520120
  
    --- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
    @@ -1,117 +1,145 @@
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +================================================================================================
    +parsing large select expressions
    +================================================================================================
     
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     parsing large select:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 select expressions                             2 /    4          0.0     2050147.0       1.0X
    -100 select expressions                           6 /    7          0.0     6123412.0       0.3X
    -2500 select expressions                        135 /  141          0.0   134623148.0       0.0X
    +1 select expressions                             2 /    4          0.0     1934953.0       1.0X
    +100 select expressions                           4 /    5          0.0     3659399.0       0.5X
    +2500 select expressions                         68 /   76          0.0    68278937.0       0.0X
     
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
     
    +================================================================================================
    +many column field read and write
    +================================================================================================
    +
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     many column field r/w:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 cols x 100000 rows (read in-mem)              16 /   18          6.3         158.6       1.0X
    -1 cols x 100000 rows (exec in-mem)              17 /   19          6.0         166.7       1.0X
    -1 cols x 100000 rows (read parquet)             24 /   26          4.3         235.1       0.7X
    -1 cols x 100000 rows (write parquet)            81 /   85          1.2         811.3       0.2X
    -100 cols x 1000 rows (read in-mem)              17 /   19          6.0         166.2       1.0X
    -100 cols x 1000 rows (exec in-mem)              25 /   27          4.0         249.2       0.6X
    -100 cols x 1000 rows (read parquet)             23 /   25          4.4         226.0       0.7X
    -100 cols x 1000 rows (write parquet)            83 /   87          1.2         831.0       0.2X
    -2500 cols x 40 rows (read in-mem)              132 /  137          0.8        1322.9       0.1X
    -2500 cols x 40 rows (exec in-mem)              326 /  330          0.3        3260.6       0.0X
    -2500 cols x 40 rows (read parquet)             831 /  839          0.1        8305.8       0.0X
    -2500 cols x 40 rows (write parquet)            237 /  245          0.4        2372.6       0.1X
    -
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +1 cols x 100000 rows (read in-mem)              22 /   25          4.6         219.4       1.0X
    +1 cols x 100000 rows (exec in-mem)              22 /   28          4.5         223.8       1.0X
    +1 cols x 100000 rows (read parquet)             45 /   49          2.2         449.6       0.5X
    +1 cols x 100000 rows (write parquet)           204 /  223          0.5        2044.4       0.1X
    --- End diff --
    
    This may be a Parquet issue. I found that binary write performance is slightly worse after upgrading to Parquet 1.10.0: https://github.com/apache/parquet-mr/pull/505. I will verify it later.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    **[Test build #97665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97665/testReport)** for PR 22501 at commit [`64e5ede`](https://github.com/apache/spark/commit/64e5ede51fcc900d51256d421d86939b202f3d75).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4089/
    Test PASSed.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    **[Test build #97534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97534/testReport)** for PR 22501 at commit [`82e2367`](https://github.com/apache/spark/commit/82e2367a203ffc03dea9bf826a5085059e1391ed).


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Merged build finished. Test FAILed.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    **[Test build #97665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97665/testReport)** for PR 22501 at commit [`64e5ede`](https://github.com/apache/spark/commit/64e5ede51fcc900d51256d421d86939b202f3d75).


---

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22501#discussion_r226740901
  
    --- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
    @@ -1,117 +1,145 @@
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +================================================================================================
    +parsing large select expressions
    +================================================================================================
     
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     parsing large select:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 select expressions                             2 /    4          0.0     2050147.0       1.0X
    -100 select expressions                           6 /    7          0.0     6123412.0       0.3X
    -2500 select expressions                        135 /  141          0.0   134623148.0       0.0X
    +1 select expressions                             2 /    4          0.0     1934953.0       1.0X
    +100 select expressions                           4 /    5          0.0     3659399.0       0.5X
    +2500 select expressions                         68 /   76          0.0    68278937.0       0.0X
     
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
     
    +================================================================================================
    +many column field read and write
    +================================================================================================
    +
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     many column field r/w:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 cols x 100000 rows (read in-mem)              16 /   18          6.3         158.6       1.0X
    -1 cols x 100000 rows (exec in-mem)              17 /   19          6.0         166.7       1.0X
    -1 cols x 100000 rows (read parquet)             24 /   26          4.3         235.1       0.7X
    -1 cols x 100000 rows (write parquet)            81 /   85          1.2         811.3       0.2X
    -100 cols x 1000 rows (read in-mem)              17 /   19          6.0         166.2       1.0X
    -100 cols x 1000 rows (exec in-mem)              25 /   27          4.0         249.2       0.6X
    -100 cols x 1000 rows (read parquet)             23 /   25          4.4         226.0       0.7X
    -100 cols x 1000 rows (write parquet)            83 /   87          1.2         831.0       0.2X
    -2500 cols x 40 rows (read in-mem)              132 /  137          0.8        1322.9       0.1X
    -2500 cols x 40 rows (exec in-mem)              326 /  330          0.3        3260.6       0.0X
    -2500 cols x 40 rows (read parquet)             831 /  839          0.1        8305.8       0.0X
    -2500 cols x 40 rows (write parquet)            237 /  245          0.4        2372.6       0.1X
    -
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +1 cols x 100000 rows (read in-mem)              22 /   25          4.6         219.4       1.0X
    +1 cols x 100000 rows (exec in-mem)              22 /   28          4.5         223.8       1.0X
    +1 cols x 100000 rows (read parquet)             45 /   49          2.2         449.6       0.5X
    +1 cols x 100000 rows (write parquet)           204 /  223          0.5        2044.4       0.1X
    +100 cols x 1000 rows (read in-mem)              26 /   28          3.9         255.8       0.9X
    +100 cols x 1000 rows (exec in-mem)              32 /   35          3.1         319.3       0.7X
    +100 cols x 1000 rows (read parquet)             45 /   52          2.2         445.9       0.5X
    +100 cols x 1000 rows (write parquet)           275 /  536          0.4        2746.1       0.1X
    +2500 cols x 40 rows (read in-mem)              261 /  434          0.4        2607.3       0.1X
    +2500 cols x 40 rows (exec in-mem)              624 /  701          0.2        6240.5       0.0X
    +2500 cols x 40 rows (read parquet)             196 /  301          0.5        1963.4       0.1X
    +2500 cols x 40 rows (write parquet)            687 / 1049          0.1        6870.6       0.0X
    --- End diff --
    
    FYI, this large gap disappears in the EC2 result.
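    
    The comment does not say which gap is meant. One reading, taken here as an assumption, is the spread between Best and Avg time in the last rows of the diff. A rough sketch of flagging such rows follows; the `threshold` value and all names are hypothetical.

    ```scala
    object BestAvgSpread {
      // (case label, best ms, avg ms) copied from the added lines of the diff.
      val runs: Seq[(String, Int, Int)] = Seq(
        ("100 cols x 1000 rows (write parquet)", 275, 536),
        ("2500 cols x 40 rows (read in-mem)", 261, 434),
        ("2500 cols x 40 rows (exec in-mem)", 624, 701),
        ("2500 cols x 40 rows (read parquet)", 196, 301),
        ("2500 cols x 40 rows (write parquet)", 687, 1049))

      // Avg/Best ratio; values well above 1.0 suggest noisy iterations.
      def spread(best: Int, avg: Int): Double = avg.toDouble / best

      // Hypothetical threshold, chosen only for illustration.
      val threshold = 1.5

      // Labels of the cases whose spread exceeds the threshold.
      def unstable: Seq[String] =
        runs.collect { case (name, best, avg) if spread(best, avg) > threshold => name }

      def main(args: Array[String]): Unit = unstable.foreach(println)
    }
    ```

    With this threshold, every listed case except `2500 cols x 40 rows (exec in-mem)` is flagged.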


---

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22501#discussion_r226439834
  
    --- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
    @@ -1,117 +1,145 @@
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +================================================================================================
    +parsing large select expressions
    +================================================================================================
     
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     parsing large select:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 select expressions                             2 /    4          0.0     2050147.0       1.0X
    -100 select expressions                           6 /    7          0.0     6123412.0       0.3X
    -2500 select expressions                        135 /  141          0.0   134623148.0       0.0X
    +1 select expressions                             2 /    4          0.0     1934953.0       1.0X
    +100 select expressions                           4 /    5          0.0     3659399.0       0.5X
    +2500 select expressions                         68 /   76          0.0    68278937.0       0.0X
     
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
     
    +================================================================================================
    +many column field read and write
    +================================================================================================
    +
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     many column field r/w:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 cols x 100000 rows (read in-mem)              16 /   18          6.3         158.6       1.0X
    -1 cols x 100000 rows (exec in-mem)              17 /   19          6.0         166.7       1.0X
    -1 cols x 100000 rows (read parquet)             24 /   26          4.3         235.1       0.7X
    -1 cols x 100000 rows (write parquet)            81 /   85          1.2         811.3       0.2X
    -100 cols x 1000 rows (read in-mem)              17 /   19          6.0         166.2       1.0X
    -100 cols x 1000 rows (exec in-mem)              25 /   27          4.0         249.2       0.6X
    -100 cols x 1000 rows (read parquet)             23 /   25          4.4         226.0       0.7X
    -100 cols x 1000 rows (write parquet)            83 /   87          1.2         831.0       0.2X
    -2500 cols x 40 rows (read in-mem)              132 /  137          0.8        1322.9       0.1X
    -2500 cols x 40 rows (exec in-mem)              326 /  330          0.3        3260.6       0.0X
    -2500 cols x 40 rows (read parquet)             831 /  839          0.1        8305.8       0.0X
    -2500 cols x 40 rows (write parquet)            237 /  245          0.4        2372.6       0.1X
    -
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +1 cols x 100000 rows (read in-mem)              22 /   25          4.6         219.4       1.0X
    +1 cols x 100000 rows (exec in-mem)              22 /   28          4.5         223.8       1.0X
    +1 cols x 100000 rows (read parquet)             45 /   49          2.2         449.6       0.5X
    +1 cols x 100000 rows (write parquet)           204 /  223          0.5        2044.4       0.1X
    --- End diff --
    
    This might be a small regression in the Parquet writer since Spark 2.1.0 (SPARK-17335).
    
    cc @cloud-fan, @gatorsmile, and @rdblue
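    
    To size the suspected regression, the per-row times on the removed (`-`) and added (`+`) lines of the diff can be compared directly. This is only a sketch: the two results come from different machines and JVMs, so the ratio is indicative at best. The names below are illustrative.

    ```scala
    object SlowdownCheck {
      // "Per Row(ns)" for the "1 cols x 100000 rows" cases, copied from the diff:
      // old = removed lines (i7-4980HQ), current = added lines (i7-7820HQ).
      val old: Map[String, Double] = Map(
        "read in-mem" -> 158.6,
        "exec in-mem" -> 166.7,
        "read parquet" -> 235.1,
        "write parquet" -> 811.3)

      val current: Map[String, Double] = Map(
        "read in-mem" -> 219.4,
        "exec in-mem" -> 223.8,
        "read parquet" -> 449.6,
        "write parquet" -> 2044.4)

      // A ratio above 1.0 means the new run is slower for that case.
      def slowdown(name: String): Double = current(name) / old(name)

      def main(args: Array[String]): Unit =
        old.keys.toSeq.sorted.foreach { name =>
          println(f"$name%-15s ${slowdown(name)}%.2fx")
        }
    }
    ```

    On these numbers, write parquet slows down by roughly 2.5x and read parquet by about 1.9x, while the in-memory cases slow down by about 1.3x to 1.4x, which is why the Parquet rows stand out.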


---

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22501#discussion_r226440992
  
    --- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
    @@ -1,117 +1,145 @@
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +================================================================================================
    +parsing large select expressions
    +================================================================================================
     
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     parsing large select:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 select expressions                             2 /    4          0.0     2050147.0       1.0X
    -100 select expressions                           6 /    7          0.0     6123412.0       0.3X
    -2500 select expressions                        135 /  141          0.0   134623148.0       0.0X
    +1 select expressions                             2 /    4          0.0     1934953.0       1.0X
    +100 select expressions                           4 /    5          0.0     3659399.0       0.5X
    +2500 select expressions                         68 /   76          0.0    68278937.0       0.0X
     
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
     
    +================================================================================================
    +many column field read and write
    +================================================================================================
    +
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     many column field r/w:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 cols x 100000 rows (read in-mem)              16 /   18          6.3         158.6       1.0X
    -1 cols x 100000 rows (exec in-mem)              17 /   19          6.0         166.7       1.0X
    -1 cols x 100000 rows (read parquet)             24 /   26          4.3         235.1       0.7X
    -1 cols x 100000 rows (write parquet)            81 /   85          1.2         811.3       0.2X
    -100 cols x 1000 rows (read in-mem)              17 /   19          6.0         166.2       1.0X
    -100 cols x 1000 rows (exec in-mem)              25 /   27          4.0         249.2       0.6X
    -100 cols x 1000 rows (read parquet)             23 /   25          4.4         226.0       0.7X
    -100 cols x 1000 rows (write parquet)            83 /   87          1.2         831.0       0.2X
    -2500 cols x 40 rows (read in-mem)              132 /  137          0.8        1322.9       0.1X
    -2500 cols x 40 rows (exec in-mem)              326 /  330          0.3        3260.6       0.0X
    -2500 cols x 40 rows (read parquet)             831 /  839          0.1        8305.8       0.0X
    -2500 cols x 40 rows (write parquet)            237 /  245          0.4        2372.6       0.1X
    -
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +1 cols x 100000 rows (read in-mem)              22 /   25          4.6         219.4       1.0X
    +1 cols x 100000 rows (exec in-mem)              22 /   28          4.5         223.8       1.0X
    +1 cols x 100000 rows (read parquet)             45 /   49          2.2         449.6       0.5X
    +1 cols x 100000 rows (write parquet)           204 /  223          0.5        2044.4       0.1X
    +100 cols x 1000 rows (read in-mem)              26 /   28          3.9         255.8       0.9X
    +100 cols x 1000 rows (exec in-mem)              32 /   35          3.1         319.3       0.7X
    +100 cols x 1000 rows (read parquet)             45 /   52          2.2         445.9       0.5X
    +100 cols x 1000 rows (write parquet)           275 /  536          0.4        2746.1       0.1X
    +2500 cols x 40 rows (read in-mem)              261 /  434          0.4        2607.3       0.1X
    +2500 cols x 40 rows (exec in-mem)              624 /  701          0.2        6240.5       0.0X
    +2500 cols x 40 rows (read parquet)             196 /  301          0.5        1963.4       0.1X
    +2500 cols x 40 rows (write parquet)            687 / 1049          0.1        6870.6       0.0X
    --- End diff --
    
    The difference between `best` and `average` is too large on lines 32 and 33.
    I'll try to run this on EC2, too.
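
As a rough illustration of the spread mentioned above, the following minimal sketch computes the Avg/Best ratio for a few rows of the new results. The row labels and times are copied from the diff; the 1.5 cutoff and the helper name `noisy_rows` are assumptions made for illustration and are not part of Spark's benchmark tooling.

```python
# Best/Avg times (ms) copied from the "+" rows of the diff above.
rows = {
    "1 cols x 100000 rows (read in-mem)": (22, 25),
    "100 cols x 1000 rows (write parquet)": (275, 536),
    "2500 cols x 40 rows (read in-mem)": (261, 434),
    "2500 cols x 40 rows (read parquet)": (196, 301),
    "2500 cols x 40 rows (write parquet)": (687, 1049),
}

def noisy_rows(results, threshold=1.5):
    """Return rows whose average time is at least `threshold` times the best time.

    The threshold is a hypothetical cutoff chosen only for this example.
    """
    return {name: round(avg / best, 2)
            for name, (best, avg) in results.items()
            if avg / best >= threshold}

for name, ratio in noisy_rows(rows).items():
    print(f"{name}: avg/best = {ratio}")
```

A ratio close to 1.0 suggests a stable measurement; a ratio well above it suggests the run was disturbed, which is one reason to repeat the benchmark on a quieter machine.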


---

---------------------------------------------------------------------

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    **[Test build #97642 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97642/testReport)** for PR 22501 at commit [`64e5ede`](https://github.com/apache/spark/commit/64e5ede51fcc900d51256d421d86939b202f3d75).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.
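
An exit status of -9 is what Python's `subprocess` module reports when a child process is killed by signal 9 (SIGKILL) on POSIX systems, so this failure may mean the test process was killed rather than that a test failed. A minimal sketch of that convention, assuming a POSIX host; whether the Jenkins script derives its code this way is an assumption here.

```python
import signal
import subprocess
import sys

# Start a child process that would otherwise sleep for a while.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])

# Simulate an external kill, such as a timeout or an out-of-memory killer.
child.send_signal(signal.SIGKILL)
returncode = child.wait()

# On POSIX, a negative return code is the negated signal number.
print(returncode)
```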


---

---------------------------------------------------------------------

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Thank you, @wangyum and all!


---

---------------------------------------------------------------------

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97056/
    Test PASSed.


---

---------------------------------------------------------------------

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

Posted by rdblue <gi...@git.apache.org>.

Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22501#discussion_r226765772
  
    --- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
    @@ -1,117 +1,145 @@
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +================================================================================================
    +parsing large select expressions
    +================================================================================================
     
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     parsing large select:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 select expressions                             2 /    4          0.0     2050147.0       1.0X
    -100 select expressions                           6 /    7          0.0     6123412.0       0.3X
    -2500 select expressions                        135 /  141          0.0   134623148.0       0.0X
    +1 select expressions                             2 /    4          0.0     1934953.0       1.0X
    +100 select expressions                           4 /    5          0.0     3659399.0       0.5X
    +2500 select expressions                         68 /   76          0.0    68278937.0       0.0X
     
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
     
    +================================================================================================
    +many column field read and write
    +================================================================================================
    +
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     many column field r/w:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 cols x 100000 rows (read in-mem)              16 /   18          6.3         158.6       1.0X
    -1 cols x 100000 rows (exec in-mem)              17 /   19          6.0         166.7       1.0X
    -1 cols x 100000 rows (read parquet)             24 /   26          4.3         235.1       0.7X
    -1 cols x 100000 rows (write parquet)            81 /   85          1.2         811.3       0.2X
    -100 cols x 1000 rows (read in-mem)              17 /   19          6.0         166.2       1.0X
    -100 cols x 1000 rows (exec in-mem)              25 /   27          4.0         249.2       0.6X
    -100 cols x 1000 rows (read parquet)             23 /   25          4.4         226.0       0.7X
    -100 cols x 1000 rows (write parquet)            83 /   87          1.2         831.0       0.2X
    -2500 cols x 40 rows (read in-mem)              132 /  137          0.8        1322.9       0.1X
    -2500 cols x 40 rows (exec in-mem)              326 /  330          0.3        3260.6       0.0X
    -2500 cols x 40 rows (read parquet)             831 /  839          0.1        8305.8       0.0X
    -2500 cols x 40 rows (write parquet)            237 /  245          0.4        2372.6       0.1X
    -
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +1 cols x 100000 rows (read in-mem)              22 /   25          4.6         219.4       1.0X
    +1 cols x 100000 rows (exec in-mem)              22 /   28          4.5         223.8       1.0X
    +1 cols x 100000 rows (read parquet)             45 /   49          2.2         449.6       0.5X
    +1 cols x 100000 rows (write parquet)           204 /  223          0.5        2044.4       0.1X
    --- End diff --
    
    @dongjoon-hyun, so you are saying that it doesn't appear that there is a performance regression, right?


---

---------------------------------------------------------------------

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Retest this please.


---

---------------------------------------------------------------------

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Yup, I made a fix: https://github.com/apache/spark/pull/22782


---

---------------------------------------------------------------------

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22501#discussion_r224985471
  
    --- Diff: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ---
    @@ -48,15 +48,11 @@ abstract class BenchmarkBase {
           if (!file.exists()) {
             file.createNewFile()
           }
    -      output = Some(new FileOutputStream(file))
    +      output = Option(new FileOutputStream(file))
    --- End diff --
    
    My point was that, from a cursory look, there is no need for the `null` check below. If there is no chance that it becomes `null`, we can keep `Some` and remove the `null` check below.


---

---------------------------------------------------------------------

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    retest this please


---

---------------------------------------------------------------------

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Thanks. When the build was successful, this was part of the log from [this](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97378/consoleText):
    ```
    copying pyspark/streaming/util.py -> pyspark-3.0.0.dev0/pyspark/streaming
    Writing pyspark-3.0.0.dev0/setup.cfg
    Creating tar archive
    removing 'pyspark-3.0.0.dev0' (and everything under it)
    Installing dist into virtual env
    Obtaining file:///home/jenkins/workspace/SparkPullRequestBuilder/python
    Collecting py4j==0.10.7 (from pyspark==3.0.0.dev0)
      Downloading https://files.pythonhosted.org/packages/e3/53/c737818eb9a7dc32a7cd4f1396e787bd94200c3997c72c1dbe028587bd76/py4j-0.10.7-py2.py3-none-any.whl (197kB)
    mkl-random 1.0.1 requires cython, which is not installed.
    Installing collected packages: py4j, pyspark
      Running setup.py develop for pyspark
    Successfully installed py4j-0.10.7 pyspark
    You are using pip version 10.0.1, however version 18.1 is available.
    You should consider upgrading via the 'pip install --upgrade pip' command.
    Run basic sanity check on pip installed version with spark-submit
    ```
    
    Now we are seeing the following:
    ```
    copying pyspark/streaming/util.py -> pyspark-3.0.0.dev0/pyspark/streaming
    Writing pyspark-3.0.0.dev0/setup.cfg
    Creating tar archive
    removing 'pyspark-3.0.0.dev0' (and everything under it)
    Installing dist into virtual env
    Obtaining file:///home/jenkins/workspace/SparkPullRequestBuilder/python
    Collecting py4j==0.10.7 (from pyspark==3.0.0.dev0)
      Downloading https://files.pythonhosted.org/packages/e3/53/c737818eb9a7dc32a7cd4f1396e787bd94200c3997c72c1dbe028587bd76/py4j-0.10.7-py2.py3-none-any.whl (197kB)
    mkl-random 1.0.1 requires cython, which is not installed.
    Installing collected packages: py4j, pyspark
      Running setup.py develop for pyspark
        Complete output from command /tmp/tmp.EWtmCOYUBn/3.5/bin/python -c "import setuptools, tokenize;__file__='/home/jenkins/workspace/SparkPullRequestBuilder/python/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" develop --no-deps:
        running develop
        running egg_info
        writing dependency_links to pyspark.egg-info/dependency_links.txt
        writing pyspark.egg-info/PKG-INFO
        writing requirements to pyspark.egg-info/requires.txt
        writing top-level names to pyspark.egg-info/top_level.txt
        Could not import pypandoc - required to package PySpark
        package init file 'deps/bin/__init__.py' not found (or not a regular file)
        package init file 'deps/jars/__init__.py' not found (or not a regular file)
        package init file 'pyspark/python/pyspark/__init__.py' not found (or not a regular file)
        package init file 'lib/__init__.py' not found (or not a regular file)
        package init file 'deps/data/__init__.py' not found (or not a regular file)
        package init file 'deps/licenses/__init__.py' not found (or not a regular file)
        package init file 'deps/examples/__init__.py' not found (or not a regular file)
        reading manifest file 'pyspark.egg-info/SOURCES.txt'
        reading manifest template 'MANIFEST.in'
        warning: no previously-included files matching '*.py[cod]' found anywhere in distribution
        warning: no previously-included files matching '__pycache__' found anywhere in distribution
        warning: no previously-included files matching '.DS_Store' found anywhere in distribution
        writing manifest file 'pyspark.egg-info/SOURCES.txt'
        running build_ext
        Creating /tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/site-packages/pyspark.egg-link (link to .)
        Adding pyspark 3.0.0.dev0 to easy-install.pth file
        Installing load-spark-env.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing spark-submit script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing spark-class.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing beeline.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing find-spark-home.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing run-example script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing spark-shell2.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing pyspark script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing sparkR script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing spark-sql script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing spark-submit.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing spark-shell script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing beeline script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing spark-submit2.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing find-spark-home script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing sparkR.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing run-example.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing sparkR2.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing spark-shell.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing spark-sql.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Installing spark-class2.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/home/jenkins/workspace/SparkPullRequestBuilder/python/setup.py", line 224, in <module>
            'Programming Language :: Python :: Implementation :: PyPy']
          File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/site-packages/setuptools/__init__.py", line 140, in setup
            return distutils.core.setup(**attrs)
          File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/distutils/core.py", line 148, in setup
            dist.run_commands()
          File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/distutils/dist.py", line 955, in run_commands
            self.run_command(cmd)
          File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/distutils/dist.py", line 974, in run_command
            cmd_obj.run()
          File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 38, in run
            self.install_for_development()
          File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 154, in install_for_development
            self.process_distribution(None, self.dist, not self.no_deps)
          File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 729, in process_distribution
            self.install_egg_scripts(dist)
          File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 189, in install_egg_scripts
            script_text = strm.read()
          File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
            return codecs.ascii_decode(input, self.errors)[0]
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2719: ordinal not in range(128)
        
        ----------------------------------------
    ```
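
The traceback ends in a decode failure: byte 0xc2 is the lead byte of a two-byte UTF-8 sequence, and the ASCII codec rejects any byte above 0x7f. A minimal sketch reproducing the same class of error, assuming the script being read contains a UTF-8 encoded non-ASCII character. The sample bytes and the helper name `decodes_as` are made up for illustration and are not taken from the Spark scripts.

```python
# Hypothetical script content with one non-ASCII character:
# U+00A0 (no-break space) is encoded in UTF-8 as the bytes 0xc2 0xa0.
data = "#!/bin/sh\n# non-breaking\u00a0space\n".encode("utf-8")

def decodes_as(raw, encoding):
    """Return True if `raw` can be decoded with `encoding`, False otherwise."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError as err:
        print(err)  # reports the offending byte and its position
        return False

print(decodes_as(data, "ascii"))  # the ASCII codec cannot handle 0xc2
print(decodes_as(data, "utf-8"))  # the matching codec succeeds
```

If that assumption holds, the usual remedies are to run the install under a UTF-8 locale or to keep the installed scripts pure ASCII.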


---

---------------------------------------------------------------------

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22501#discussion_r226442573
  
    --- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
    @@ -1,117 +1,145 @@
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +================================================================================================
    +parsing large select expressions
    +================================================================================================
     
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     parsing large select:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 select expressions                             2 /    4          0.0     2050147.0       1.0X
    -100 select expressions                           6 /    7          0.0     6123412.0       0.3X
    -2500 select expressions                        135 /  141          0.0   134623148.0       0.0X
    +1 select expressions                             2 /    4          0.0     1934953.0       1.0X
    +100 select expressions                           4 /    5          0.0     3659399.0       0.5X
    +2500 select expressions                         68 /   76          0.0    68278937.0       0.0X
     
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
     
    +================================================================================================
    +many column field read and write
    +================================================================================================
    +
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     many column field r/w:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 cols x 100000 rows (read in-mem)              16 /   18          6.3         158.6       1.0X
    -1 cols x 100000 rows (exec in-mem)              17 /   19          6.0         166.7       1.0X
    -1 cols x 100000 rows (read parquet)             24 /   26          4.3         235.1       0.7X
    -1 cols x 100000 rows (write parquet)            81 /   85          1.2         811.3       0.2X
    -100 cols x 1000 rows (read in-mem)              17 /   19          6.0         166.2       1.0X
    -100 cols x 1000 rows (exec in-mem)              25 /   27          4.0         249.2       0.6X
    -100 cols x 1000 rows (read parquet)             23 /   25          4.4         226.0       0.7X
    -100 cols x 1000 rows (write parquet)            83 /   87          1.2         831.0       0.2X
    -2500 cols x 40 rows (read in-mem)              132 /  137          0.8        1322.9       0.1X
    -2500 cols x 40 rows (exec in-mem)              326 /  330          0.3        3260.6       0.0X
    -2500 cols x 40 rows (read parquet)             831 /  839          0.1        8305.8       0.0X
    -2500 cols x 40 rows (write parquet)            237 /  245          0.4        2372.6       0.1X
    -
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +1 cols x 100000 rows (read in-mem)              22 /   25          4.6         219.4       1.0X
    +1 cols x 100000 rows (exec in-mem)              22 /   28          4.5         223.8       1.0X
    +1 cols x 100000 rows (read parquet)             45 /   49          2.2         449.6       0.5X
    +1 cols x 100000 rows (write parquet)           204 /  223          0.5        2044.4       0.1X
    +100 cols x 1000 rows (read in-mem)              26 /   28          3.9         255.8       0.9X
    +100 cols x 1000 rows (exec in-mem)              32 /   35          3.1         319.3       0.7X
    +100 cols x 1000 rows (read parquet)             45 /   52          2.2         445.9       0.5X
    +100 cols x 1000 rows (write parquet)           275 /  536          0.4        2746.1       0.1X
    +2500 cols x 40 rows (read in-mem)              261 /  434          0.4        2607.3       0.1X
    +2500 cols x 40 rows (exec in-mem)              624 /  701          0.2        6240.5       0.0X
    +2500 cols x 40 rows (read parquet)             196 /  301          0.5        1963.4       0.1X
    +2500 cols x 40 rows (write parquet)            687 / 1049          0.1        6870.6       0.0X
    +
    +
    +================================================================================================
    +wide shallowly nested struct field read and write
    +================================================================================================
     
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     wide shallowly nested struct field r/w:  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 wide x 100000 rows (read in-mem)              15 /   17          6.6         151.0       1.0X
    -1 wide x 100000 rows (exec in-mem)              20 /   22          5.1         196.6       0.8X
    -1 wide x 100000 rows (read parquet)             59 /   63          1.7         592.8       0.3X
    -1 wide x 100000 rows (write parquet)            81 /   87          1.2         814.6       0.2X
    -100 wide x 1000 rows (read in-mem)              21 /   25          4.8         208.7       0.7X
    -100 wide x 1000 rows (exec in-mem)              72 /   81          1.4         718.5       0.2X
    -100 wide x 1000 rows (read parquet)             75 /   85          1.3         752.6       0.2X
    -100 wide x 1000 rows (write parquet)            88 /   95          1.1         876.7       0.2X
    -2500 wide x 40 rows (read in-mem)               28 /   34          3.5         282.2       0.5X
    -2500 wide x 40 rows (exec in-mem)             1269 / 1284          0.1       12688.1       0.0X
    -2500 wide x 40 rows (read parquet)             549 /  578          0.2        5493.4       0.0X
    -2500 wide x 40 rows (write parquet)             96 /  104          1.0         959.1       0.2X
    -
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +1 wide x 100000 rows (read in-mem)              23 /   42          4.4         226.2       1.0X
    +1 wide x 100000 rows (exec in-mem)              29 /   53          3.5         288.5       0.8X
    +1 wide x 100000 rows (read parquet)             93 /  102          1.1         928.2       0.2X
    +1 wide x 100000 rows (write parquet)           201 /  222          0.5        2009.6       0.1X
    +100 wide x 1000 rows (read in-mem)              42 /   55          2.4         421.8       0.5X
    +100 wide x 1000 rows (exec in-mem)              55 /  113          1.8         547.0       0.4X
    +100 wide x 1000 rows (read parquet)            139 /  263          0.7        1390.6       0.2X
    +100 wide x 1000 rows (write parquet)           245 /  338          0.4        2450.9       0.1X
    +2500 wide x 40 rows (read in-mem)               51 /   72          2.0         511.7       0.4X
    +2500 wide x 40 rows (exec in-mem)              265 /  303          0.4        2654.8       0.1X
    +2500 wide x 40 rows (read parquet)            1285 / 1339          0.1       12845.1       0.0X
    +2500 wide x 40 rows (write parquet)            238 /  262          0.4        2378.8       0.1X
     
    +
    +================================================================================================
    +deeply nested struct field read and write
    +================================================================================================
    +
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     deeply nested struct field r/w:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 deep x 100000 rows (read in-mem)              14 /   16          7.0         143.8       1.0X
    -1 deep x 100000 rows (exec in-mem)              17 /   19          5.9         169.7       0.8X
    -1 deep x 100000 rows (read parquet)             33 /   35          3.1         327.0       0.4X
    -1 deep x 100000 rows (write parquet)            79 /   84          1.3         786.9       0.2X
    -100 deep x 1000 rows (read in-mem)              21 /   24          4.7         211.3       0.7X
    -100 deep x 1000 rows (exec in-mem)             221 /  235          0.5        2214.5       0.1X
    -100 deep x 1000 rows (read parquet)           1928 / 1952          0.1       19277.1       0.0X
    -100 deep x 1000 rows (write parquet)            91 /   96          1.1         909.5       0.2X
    -250 deep x 400 rows (read in-mem)               57 /   61          1.8         567.1       0.3X
    -250 deep x 400 rows (exec in-mem)             1329 / 1385          0.1       13291.8       0.0X
    -250 deep x 400 rows (read parquet)          36563 / 36750          0.0      365630.2       0.0X
    -250 deep x 400 rows (write parquet)            126 /  130          0.8        1262.0       0.1X
    -
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +1 deep x 100000 rows (read in-mem)              20 /   24          5.1         197.9       1.0X
    +1 deep x 100000 rows (exec in-mem)              23 /   28          4.4         227.8       0.9X
    +1 deep x 100000 rows (read parquet)             50 /   58          2.0         500.1       0.4X
    +1 deep x 100000 rows (write parquet)           195 /  219          0.5        1945.1       0.1X
    +100 deep x 1000 rows (read in-mem)              39 /   57          2.5         393.1       0.5X
    +100 deep x 1000 rows (exec in-mem)             480 /  556          0.2        4795.7       0.0X
    +100 deep x 1000 rows (read parquet)           7943 / 7950          0.0       79427.5       0.0X
    --- End diff --
    
    Er, @wangyum, is this 4 times slower than before?
    
    cc @dbtsai.
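
To put a number on the question above, the best-time columns of the removed and added rows in the quoted diff can be divided directly. The following is a minimal sketch and not part of the PR: the object name `SlowdownCheck` and the helper `slowdown` are hypothetical, and the figures are copied from the diff. The two result sets carry different JVM and CPU headers (1.8.0_92-b14 on an i7-4980HQ versus 1.8.0_151-b12 on an i7-7820HQ), so the ratios compare two environments as well as two code versions.

```scala
object SlowdownCheck {
  // (benchmark case, old best ms, new best ms), copied from the quoted diff.
  val cases: Seq[(String, Double, Double)] = Seq(
    ("1 deep x 100000 rows (read in-mem)", 14.0, 20.0),
    ("1 deep x 100000 rows (write parquet)", 79.0, 195.0),
    ("100 deep x 1000 rows (exec in-mem)", 221.0, 480.0),
    ("100 deep x 1000 rows (read parquet)", 1928.0, 7943.0)
  )

  // Ratio of new best time to old best time; values above 1.0 mean slower.
  def slowdown(oldMs: Double, newMs: Double): Double = newMs / oldMs

  def main(args: Array[String]): Unit = {
    cases.foreach { case (name, oldMs, newMs) =>
      println(f"$name%-40s ${slowdown(oldMs, newMs)}%.2f")
    }
  }
}
```

For the 100-deep Parquet read the ratio is 7943 / 1928, roughly 4.1, which matches the "4 times" in the comment. The other rows come out between about 1.4 and 2.5.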


---


[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    **[Test build #96369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96369/testReport)** for PR 22501 at commit [`f56b732`](https://github.com/apache/spark/commit/f56b73223fbf765e408d9aef6565a2318f4836e3).


---


[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22501#discussion_r223220176
  
    --- Diff: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ---
    @@ -48,15 +48,11 @@ abstract class BenchmarkBase {
           if (!file.exists()) {
             file.createNewFile()
           }
    -      output = Some(new FileOutputStream(file))
    +      output = Option(new FileOutputStream(file))
    --- End diff --
    
    Why did you replace `Some` with `Option`? Are you worried that `new FileOutputStream(file)` could be `null`?
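
The difference behind this question can be shown in a few lines. This is a minimal sketch, not code from the PR; `SomeVsOption`, `wrapWithSome`, and `wrapWithOption` are hypothetical names used only for illustration. `Option.apply` maps `null` to `None`, while `Some.apply` wraps whatever it is given, including `null`. A `new` expression never evaluates to `null` on the JVM, so for a constructor call the two forms give the same result.

```scala
object SomeVsOption {
  // Some(...) wraps its argument unconditionally, even when it is null.
  def wrapWithSome(s: String): Option[String] = Some(s)

  // Option(...) converts null to None and any other value to Some(value).
  def wrapWithOption(s: String): Option[String] = Option(s)

  def main(args: Array[String]): Unit = {
    val missing: String = null

    println(wrapWithSome(missing).isDefined)   // true: the result is Some(null)
    println(wrapWithOption(missing).isDefined) // false: the result is None

    // A constructor call cannot return null, so both forms agree here.
    val a = Some(new StringBuilder("x"))
    val b = Option(new StringBuilder("x"))
    println(a.isDefined == b.isDefined)        // true
  }
}
```

`Option(...)` is the usual choice when wrapping a value from a Java API that may return `null`; for a freshly constructed object it only adds a redundant null check.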


---


[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3751/
    Test PASSed.


---


[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22501#discussion_r226516354
  
    --- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
    @@ -1,117 +1,145 @@
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +================================================================================================
    +parsing large select expressions
    +================================================================================================
     
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     parsing large select:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 select expressions                             2 /    4          0.0     2050147.0       1.0X
    -100 select expressions                           6 /    7          0.0     6123412.0       0.3X
    -2500 select expressions                        135 /  141          0.0   134623148.0       0.0X
    +1 select expressions                             2 /    4          0.0     1934953.0       1.0X
    +100 select expressions                           4 /    5          0.0     3659399.0       0.5X
    +2500 select expressions                         68 /   76          0.0    68278937.0       0.0X
     
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
     
    +================================================================================================
    +many column field read and write
    +================================================================================================
    +
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     many column field r/w:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -1 cols x 100000 rows (read in-mem)              16 /   18          6.3         158.6       1.0X
    -1 cols x 100000 rows (exec in-mem)              17 /   19          6.0         166.7       1.0X
    -1 cols x 100000 rows (read parquet)             24 /   26          4.3         235.1       0.7X
    -1 cols x 100000 rows (write parquet)            81 /   85          1.2         811.3       0.2X
    -100 cols x 1000 rows (read in-mem)              17 /   19          6.0         166.2       1.0X
    -100 cols x 1000 rows (exec in-mem)              25 /   27          4.0         249.2       0.6X
    -100 cols x 1000 rows (read parquet)             23 /   25          4.4         226.0       0.7X
    -100 cols x 1000 rows (write parquet)            83 /   87          1.2         831.0       0.2X
    -2500 cols x 40 rows (read in-mem)              132 /  137          0.8        1322.9       0.1X
    -2500 cols x 40 rows (exec in-mem)              326 /  330          0.3        3260.6       0.0X
    -2500 cols x 40 rows (read parquet)             831 /  839          0.1        8305.8       0.0X
    -2500 cols x 40 rows (write parquet)            237 /  245          0.4        2372.6       0.1X
    -
    -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    -Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    +1 cols x 100000 rows (read in-mem)              22 /   25          4.6         219.4       1.0X
    +1 cols x 100000 rows (exec in-mem)              22 /   28          4.5         223.8       1.0X
    +1 cols x 100000 rows (read parquet)             45 /   49          2.2         449.6       0.5X
    +1 cols x 100000 rows (write parquet)           204 /  223          0.5        2044.4       0.1X
    --- End diff --
    
    I have no idea how this happens. Can you create a JIRA ticket to investigate this regression?


---


[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4173/
    Test PASSed.


---


[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    Thanks. It might be related more to external factors.


---
