Posted to reviews@spark.apache.org by gengliangwang <gi...@git.apache.org> on 2018/11/07 15:17:57 UTC

[GitHub] spark pull request #22966: [SPARK-25965][SQL] Add avro read benchmark

GitHub user gengliangwang opened a pull request:

    https://github.com/apache/spark/pull/22966

    [SPARK-25965][SQL] Add avro read benchmark

    ## What changes were proposed in this pull request?
    
    Add a read benchmark for Avro, which has been missing for a while.
    The benchmark is similar to `DataSourceReadBenchmark` and `OrcReadBenchmark`.
    
    ## How was this patch tested?
    
    Manually ran the benchmark.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark avroReadBenchmark

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22966.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22966
    
----
commit 2713d6e83e5349ba8237a2c680665fb180d14e94
Author: Gengliang Wang <ge...@...>
Date:   2018-11-07T09:09:20Z

    add avro read benchmark

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22966#discussion_r232794232
  
    --- Diff: external/avro/src/test/scala/org/apache/spark/sql/execution/benchmark/AvroReadBenchmark.scala ---
    @@ -0,0 +1,226 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.Random
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Benchmark to measure Avro read performance.
    + * {{{
    + *   To run this benchmark:
    + *   1. without sbt: bin/spark-submit --class <this class>
    + *        --jars <catalyst test jar>,<core test jar>,<spark-avro jar> <avro test jar>
    + *   2. build/sbt "avro/test:runMain <this class>"
    + *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "avro/test:runMain <this class>"
    + *      Results will be written to "benchmarks/AvroReadBenchmark-results.txt".
    + * }}}
    + */
    +object AvroReadBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    @gengliangwang, what I mean is that we should not touch them in this PR (`Add avro read benchmark`).
    > Should I change them as well? Which also makes sense to me.
    
    This PR should be updated.


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5013/
    Test PASSed.


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98811/
    Test PASSed.


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    @dongjoon-hyun sure.


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22966: [SPARK-25965][SQL] Add avro read benchmark

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    @dongjoon-hyun 


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    Cool, could you introduce it to Spark? That would be very helpful :)
    @dbtsai, @jleach4 and @aokolnychyi


---



[GitHub] spark pull request #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22966#discussion_r232461509
  
    --- Diff: external/avro/benchmarks/AvroReadBenchmark-results.txt ---
    @@ -0,0 +1,122 @@
    +================================================================================================
    +SQL Single Numeric Column Scan
    +================================================================================================
    +
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
    +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
    +SQL Single TINYINT Column Scan:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +------------------------------------------------------------------------------------------------
    +Sum                                           2013 / 2071          7.8         128.0       1.0X
    +
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
    +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
    +SQL Single SMALLINT Column Scan:         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +------------------------------------------------------------------------------------------------
    +Sum                                           1955 / 1957          8.0         124.3       1.0X
    --- End diff --
    
    Actually, all of these integer types are processed as INT. The difference is caused by the JIT.
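
Background for the point above: Avro has no 8-bit or 16-bit integer type, so Spark's Avro data source stores `ByteType` (TINYINT) and `ShortType` (SMALLINT) columns as Avro `int`, the same physical type used for INT. The sketch below models that mapping for a few primitive types as plain strings; `AvroTypeMapping` and `avroPrimitiveFor` are hypothetical names used only for illustration, not Spark APIs, and logical or complex types are deliberately left out.

```scala
// Simplified, hypothetical model of how Spark SQL primitive types map to
// Avro primitive types when written by the Avro data source.
// TINYINT and SMALLINT have no narrower Avro counterpart, so they share `int` with INT.
object AvroTypeMapping {
  def avroPrimitiveFor(sqlType: String): Option[String] = sqlType match {
    case "TINYINT" | "SMALLINT" | "INT" => Some("int")
    case "BIGINT"                       => Some("long")
    case "FLOAT"                        => Some("float")
    case "DOUBLE"                       => Some("double")
    case "BOOLEAN"                      => Some("boolean")
    case "STRING"                       => Some("string")
    case "BINARY"                       => Some("bytes")
    case _                              => None // logical/complex types are out of scope here
  }

  def main(args: Array[String]): Unit =
    for (t <- Seq("TINYINT", "SMALLINT", "INT", "BIGINT"))
      println(s"$t -> ${avroPrimitiveFor(t).getOrElse("?")}")
}
```

Since the three integer columns end up with the same on-disk type, their scan paths are essentially identical, which is why small timing gaps between them are better attributed to run-to-run effects than to the type itself.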


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    **[Test build #98784 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98784/testReport)** for PR 22966 at commit [`58efbe8`](https://github.com/apache/spark/commit/58efbe870d30acbb990195223f6a9f9177e40d02).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    cc @jleach4 and @aokolnychyi 
    
    We have had great success using [jmh](http://openjdk.java.net/projects/code-tools/jmh/) for this type of benchmarking; the benchmarks can be written as unit tests. The framework handles JVM warm-up, computes latency and throughput, etc., and generates reports that can be consumed by Jenkins. We also use Jenkins to visualize performance trends over time, which is very useful for finding regressions.
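
To illustrate what such a framework automates, here is a hand-rolled sketch in plain Scala (not jmh's API) of the core idea: untimed warm-up runs so the JIT can compile the hot code, followed by timed runs reported as best/average latency and throughput. `MiniBench` and `measure` are hypothetical names; real jmh additionally forks JVMs, provides `Blackhole` to guard against dead-code elimination, and reports error bounds, none of which this sketch attempts.

```scala
object MiniBench {
  final case class Stats(bestMs: Double, avgMs: Double, opsPerSec: Double)

  // Results are written here so the JIT cannot treat the measured work as dead code.
  @volatile private var blackhole: Long = 0L

  /** Runs `body` `warmup` times untimed, then `iters` timed runs; returns best/avg time and throughput. */
  def measure(warmup: Int, iters: Int)(body: => Long): Stats = {
    require(iters > 0, "need at least one timed iteration")
    for (_ <- 0 until warmup) blackhole = body // warm-up: gives the JIT a chance to compile `body`
    val timesNs = Array.fill(iters) {
      val start = System.nanoTime()
      blackhole = body
      System.nanoTime() - start
    }
    val bestNs = timesNs.min.toDouble
    val avgNs = timesNs.sum.toDouble / iters
    Stats(bestMs = bestNs / 1e6, avgMs = avgNs / 1e6, opsPerSec = 1e9 / avgNs)
  }

  def main(args: Array[String]): Unit = {
    // Small integer values, loosely resembling a TINYINT column held as Int.
    val data = Array.tabulate(1 << 20)(i => i % 128)
    val s = measure(warmup = 5, iters = 10) {
      var acc = 0L
      var i = 0
      while (i < data.length) { acc += data(i); i += 1 }
      acc
    }
    println(f"best ${s.bestMs}%.2f ms, avg ${s.avgMs}%.2f ms, ${s.opsPerSec}%.1f runs/s")
  }
}
```

The volatile field is only a crude stand-in for a blackhole: it keeps results observable, but it is far less rigorous than what jmh provides, which is the main argument for adopting a dedicated framework.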
    
    



---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    Done, @dongjoon-hyun PTAL.


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98784/
    Test PASSed.


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    Oh, it finally closed successfully.


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    @dbtsai Great!
    I thought the benchmark in this PR was fairly simple, so I held off on submitting it for months.
    The framework you mentioned should also work for other data sources, right?


---



[GitHub] spark issue #22966: [SPARK-25965][SQL] Add avro read benchmark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22966: [SPARK-25965][SQL] Add avro read benchmark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    **[Test build #98554 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98554/testReport)** for PR 22966 at commit [`2713d6e`](https://github.com/apache/spark/commit/2713d6e83e5349ba8237a2c680665fb180d14e94).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    @dongjoon-hyun I think we can merge this one first.


---



[GitHub] spark pull request #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22966#discussion_r232272074
  
    --- Diff: external/avro/src/test/scala/org/apache/spark/sql/execution/benchmark/AvroReadBenchmark.scala ---
    @@ -0,0 +1,226 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.Random
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Benchmark to measure Avro read performance.
    + * {{{
    + *   To run this benchmark:
    + *   1. without sbt: bin/spark-submit --class <this class>
    + *        --jars <catalyst test jar>,<core test jar>,<spark-avro jar> <avro test jar>
    + *   2. build/sbt "avro/test:runMain <this class>"
    + *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "avro/test:runMain <this class>"
    + *      Results will be written to "benchmarks/AvroReadBenchmark-results.txt".
    + * }}}
    + */
    +object AvroReadBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    This follows `DataSourceReadBenchmark` and `OrcReadBenchmark`.
    Should I change them as well?
    



---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    jmh is a framework for writing benchmarks; it can generate standardized reports that Jenkins can consume.
    
    Here is an example: https://github.com/pvillega/jmh-scala-test/blob/master/src/main/scala/com/perevillega/JMHTest.scala


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    **[Test build #98811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98811/testReport)** for PR 22966 at commit [`e6b73f1`](https://github.com/apache/spark/commit/e6b73f120b784cc548505e70802b8ec821e4a04b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    **[Test build #98811 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98811/testReport)** for PR 22966 at commit [`e6b73f1`](https://github.com/apache/spark/commit/e6b73f120b784cc548505e70802b8ec821e4a04b).


---



[GitHub] spark pull request #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22966#discussion_r232550388
  
    --- Diff: external/avro/src/test/scala/org/apache/spark/sql/execution/benchmark/AvroReadBenchmark.scala ---
    @@ -0,0 +1,226 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.Random
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Benchmark to measure Avro read performance.
    + * {{{
    + *   To run this benchmark:
    + *   1. without sbt: bin/spark-submit --class <this class>
    + *        --jars <catalyst test jar>,<core test jar>,<spark-avro jar> <avro test jar>
    + *   2. build/sbt "avro/test:runMain <this class>"
    + *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "avro/test:runMain <this class>"
    + *      Results will be written to "benchmarks/AvroReadBenchmark-results.txt".
    + * }}}
    + */
    +object AvroReadBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    @dongjoon-hyun OK, then I think this one is ready.


---



[GitHub] spark pull request #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22966#discussion_r232456599
  
    --- Diff: external/avro/benchmarks/AvroReadBenchmark-results.txt ---
    @@ -0,0 +1,122 @@
    +================================================================================================
    +SQL Single Numeric Column Scan
    +================================================================================================
    +
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
    +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
    +SQL Single TINYINT Column Scan:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +------------------------------------------------------------------------------------------------
    +Sum                                           2013 / 2071          7.8         128.0       1.0X
    +
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
    +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
    +SQL Single SMALLINT Column Scan:         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +------------------------------------------------------------------------------------------------
    +Sum                                           1955 / 1957          8.0         124.3       1.0X
    --- End diff --
    
    Just curious: why is the `Rate` for `SMALLINT` higher than for `TINYINT` and `INT`?
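
In Spark's benchmark output, `Rate(M/s)` and `Per Row(ns)` both appear to be derived from the best time rather than the average, so a lower best time alone produces a higher rate. The sketch below reproduces that arithmetic. The row count of 15 * 1024 * 1024 is an assumption (it is not stated in this thread), chosen because it is consistent with the figures in the results file; `BenchmarkColumns` is a hypothetical helper, not part of Spark.

```scala
object BenchmarkColumns {
  /** Rows per microsecond, i.e. millions of rows per second, computed from the best time. */
  def rateMPerSec(rows: Long, bestMs: Double): Double = rows / (bestMs * 1000.0)

  /** Nanoseconds per row, computed from the best time (equivalently 1000 / rate). */
  def perRowNs(rows: Long, bestMs: Double): Double = bestMs * 1e6 / rows

  def main(args: Array[String]): Unit = {
    val rows = 15L * 1024 * 1024 // assumed row count, not stated in the thread
    for ((name, best) <- Seq("TINYINT" -> 2013.0, "SMALLINT" -> 1955.0))
      println(f"$name%-9s rate=${rateMPerSec(rows, best)}%.1f M/s per-row=${perRowNs(rows, best)}%.1f ns")
  }
}
```

With that row count, the TINYINT best time of 2013 ms gives about 7.8 M/s and 128.0 ns per row, and the SMALLINT best time of 1955 ms gives about 8.0 M/s and 124.3 ns per row, matching the table. The higher SMALLINT rate therefore just reflects a faster best run.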


---



[GitHub] spark pull request #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22966#discussion_r232155430
  
    --- Diff: external/avro/src/test/scala/org/apache/spark/sql/execution/benchmark/AvroReadBenchmark.scala ---
    @@ -0,0 +1,226 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.Random
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Benchmark to measure Avro read performance.
    + * {{{
    + *   To run this benchmark:
    + *   1. without sbt: bin/spark-submit --class <this class>
    + *        --jars <catalyst test jar>,<core test jar>,<spark-avro jar> <avro test jar>
    + *   2. build/sbt "avro/test:runMain <this class>"
    + *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "avro/test:runMain <this class>"
    + *      Results will be written to "benchmarks/AvroReadBenchmark-results.txt".
    + * }}}
    + */
    +object AvroReadBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    @gengliangwang, can we use `SqlBasedBenchmark` for consistency?


---



[GitHub] spark pull request #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22966#discussion_r233097082
  
    --- Diff: external/avro/src/test/scala/org/apache/spark/sql/execution/benchmark/AvroReadBenchmark.scala ---
    @@ -0,0 +1,226 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.Random
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Benchmark to measure Avro read performance.
    + * {{{
    + *   To run this benchmark:
    + *   1. without sbt: bin/spark-submit --class <this class>
    + *        --jars <catalyst test jar>,<core test jar>,<spark-avro jar> <avro test jar>
    + *   2. build/sbt "avro/test:runMain <this class>"
    + *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "avro/test:runMain <this class>"
    + *      Results will be written to "benchmarks/AvroReadBenchmark-results.txt".
    + * }}}
    + */
    +object AvroReadBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    I see. I have updated this one.


---



[GitHub] spark pull request #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22966#discussion_r232794390
  
    --- Diff: external/avro/src/test/scala/org/apache/spark/sql/execution/benchmark/AvroReadBenchmark.scala ---
    @@ -0,0 +1,226 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.Random
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Benchmark to measure Avro read performance.
    + * {{{
    + *   To run this benchmark:
    + *   1. without sbt: bin/spark-submit --class <this class>
    + *        --jars <catalyst test jar>,<core test jar>,<spark-avro jar> <avro test jar>
    + *   2. build/sbt "avro/test:runMain <this class>"
    + *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "avro/test:runMain <this class>"
    + *      Results will be written to "benchmarks/AvroReadBenchmark-results.txt".
    + * }}}
    + */
    +object AvroReadBenchmark extends BenchmarkBase with SQLHelper {
    +  val conf = new SparkConf()
    +  conf.set("spark.sql.avro.compression.codec", "snappy")
    --- End diff --
    
    Ditto.


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    Thank you, @gengliangwang. Could you close this? The patch is merged, but the change from https://github.com/apache/spark/pull/22966#discussion_r233585890 causes a conflict.


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22966: [PARK-25965][SQL] Add avro read benchmark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98554/


---



[GitHub] spark pull request #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22966#discussion_r232372133
  
    --- Diff: external/avro/src/test/scala/org/apache/spark/sql/execution/benchmark/AvroReadBenchmark.scala ---
    @@ -0,0 +1,226 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.Random
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Benchmark to measure Avro read performance.
    + * {{{
    + *   To run this benchmark:
    + *   1. without sbt: bin/spark-submit --class <this class>
    + *        --jars <catalyst test jar>,<core test jar><spark-avro jar> <avro test jar>
    + *   2. build/sbt "avro/test:runMain <this class>"
    + *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "avro/test:runMain <this class>"
    + *      Results will be written to "benchmarks/AvroReadBenchmark-results.txt".
    + * }}}
    + */
    +object AvroReadBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    Nope, not in this PR; it's beyond the scope of this PR. Let's consider that later in another PR.


---



[GitHub] spark pull request #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22966#discussion_r233585890
  
    --- Diff: external/avro/src/test/scala/org/apache/spark/sql/execution/benchmark/AvroReadBenchmark.scala ---
    @@ -0,0 +1,216 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.Random
    +
    +import org.apache.spark.benchmark.Benchmark
    +import org.apache.spark.sql.DataFrame
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Benchmark to measure Avro read performance.
    + * {{{
    + *   To run this benchmark:
    + *   1. without sbt: bin/spark-submit --class <this class>
    + *        --jars <catalyst test jar>,<core test jar><spark-avro jar> <avro test jar>
    --- End diff --
    
    Oh, there is a missing comma here (after `<core test jar>`). Let me try to fix it during merging.
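    `spark-submit` expects `--jars` to be a single comma-separated list. A missing comma therefore turns `<core test jar><spark-avro jar>` into one bogus path. The space-separated `<avro test jar>` is presumably the application jar that `spark-submit` takes positionally. The minimal sketch below builds the argument by joining the paths rather than typing it by hand. The jar file names are hypothetical placeholders.

```scala
object JarsArgSketch {
  // Builds the value for spark-submit's --jars option: one comma-separated list.
  def jarsArg(jars: Seq[String]): String = {
    // A path that itself contains a comma would be split apart by spark-submit.
    require(jars.forall(j => !j.contains(",")), "jar paths must not contain commas")
    jars.mkString(",")
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical jar file names, for illustration only.
    val jars = Seq("spark-catalyst-tests.jar", "spark-core-tests.jar", "spark-avro.jar")
    println(s"--jars ${jarsArg(jars)}")
  }
}
```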


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    I also think having a performance trend would be useful. I'll be glad to help with this effort.


---



[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4994/


---



[GitHub] spark issue #22966: [PARK-25965][SQL] Add avro read benchmark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22966: [PARK-25965][SQL] Add avro read benchmark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4818/


---



[GitHub] spark issue #22966: [PARK-25965][SQL] Add avro read benchmark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22966
  
    **[Test build #98554 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98554/testReport)** for PR 22966 at commit [`2713d6e`](https://github.com/apache/spark/commit/2713d6e83e5349ba8237a2c680665fb180d14e94).


---



[GitHub] spark pull request #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22966


---



[GitHub] spark pull request #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22966#discussion_r232155608
  
    --- Diff: external/avro/src/test/scala/org/apache/spark/sql/execution/benchmark/AvroReadBenchmark.scala ---
    @@ -0,0 +1,226 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.Random
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Benchmark to measure Avro read performance.
    + * {{{
    + *   To run this benchmark:
    + *   1. without sbt: bin/spark-submit --class <this class>
    + *        --jars <catalyst test jar>,<core test jar><spark-avro jar> <avro test jar>
    + *   2. build/sbt "avro/test:runMain <this class>"
    + *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "avro/test:runMain <this class>"
    + *      Results will be written to "benchmarks/AvroReadBenchmark-results.txt".
    + * }}}
    + */
    +object AvroReadBenchmark extends BenchmarkBase with SQLHelper {
    +  val conf = new SparkConf()
    +  conf.set("spark.sql.avro.compression.codec", "snappy")
    --- End diff --
    
    Since this is the default value, I think we can remove lines 41-49.
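    The reasoning is that explicitly setting a key to its default leaves the resolved configuration unchanged. The hypothetical sketch below models this with a plain Map overlaid on defaults, so it runs without Spark; the real benchmark uses `SparkConf`. Only the Avro codec default (`snappy`, as stated in the comment above) is modeled.

```scala
object ConfDefaultsSketch {
  // Hypothetical model of config resolution: explicit settings overlay the defaults.
  val defaults: Map[String, String] = Map("spark.sql.avro.compression.codec" -> "snappy")

  def effective(explicit: Map[String, String]): Map[String, String] = defaults ++ explicit

  def main(args: Array[String]): Unit = {
    val withSet = effective(Map("spark.sql.avro.compression.codec" -> "snappy"))
    val withoutSet = effective(Map.empty)
    // Setting a key to its default value does not change the resolved configuration.
    println(withSet == withoutSet)
  }
}
```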


---
