Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/11/14 21:54:22 UTC

[GitHub] [spark] viirya opened a new pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

viirya opened a new pull request #30379:
URL: https://github.com/apache/spark/pull/30379


   
   ### What changes were proposed in this pull request?
   
   This patch adds a benchmark, `SubExprEliminationBenchmark`, for benchmarking the subexpression elimination feature.
   
   ### Why are the changes needed?
   
   We need a benchmark for the subexpression elimination feature to evaluate changes such as #30341.
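
To illustrate what the benchmark is meant to measure, here is a minimal sketch in plain Scala. It is a toy model, not Spark's Catalyst implementation: `Expr`, `Lit`, `Add`, `Expensive`, and `SubExprSketch` are hypothetical names introduced only for this example, and `Expensive` merely stands in for a costly expression such as `from_json`. The sketch counts how often the costly expression is evaluated when several output columns share it.

```scala
import scala.collection.mutable

// Toy expression tree (hypothetical; not Spark's Catalyst classes).
sealed trait Expr
final case class Lit(value: Int) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr
// Stands in for an expensive expression such as from_json.
final case class Expensive(child: Expr) extends Expr

object SubExprSketch {
  /** Evaluates each expression independently; returns (results, Expensive evaluations). */
  def evalNaive(exprs: Seq[Expr]): (Seq[Int], Int) = {
    var calls = 0
    def eval(e: Expr): Int = e match {
      case Lit(v) => v
      case Add(l, r) => eval(l) + eval(r)
      case Expensive(c) =>
        calls += 1
        eval(c) * 2
    }
    val results = exprs.map(eval)
    (results, calls)
  }

  /** Caches structurally equal subexpressions so each one is evaluated only once. */
  def evalShared(exprs: Seq[Expr]): (Seq[Int], Int) = {
    var calls = 0
    val cache = mutable.Map.empty[Expr, Int]
    def eval(e: Expr): Int = cache.get(e) match {
      case Some(v) => v
      case None =>
        val v = e match {
          case Lit(x) => x
          case Add(l, r) => eval(l) + eval(r)
          case Expensive(c) =>
            calls += 1
            eval(c) * 2
        }
        cache(e) = v
        v
    }
    val results = exprs.map(eval)
    (results, calls)
  }

  def main(args: Array[String]): Unit = {
    val shared = Expensive(Lit(21))
    val exprs = Seq(Add(shared, Lit(1)), Add(shared, Lit(2)), Add(shared, Lit(3)))
    println(evalNaive(exprs))  // the shared expression is evaluated once per column
    println(evalShared(exprs)) // the shared expression is evaluated once in total
  }
}
```

Both functions return the same values; only the number of evaluations of the shared expression differs, which is the effect the benchmark tries to quantify on real queries.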
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, dev only.
   
   ### How was this patch tested?
   
   Unit test.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727293087








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727293839


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131099/
   Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727294445


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35705/
   




[GitHub] [spark] viirya commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727285058


   > @maropu That's a good point. Initially, I thought of three steps: 1) add a benchmark with the baseline first, 2) merge #30341, 3) update this benchmark in another PR. But your suggestion also looks good because this PR already has (3).
   > 
   > > This will be merged after #30341 finished?
   
   I think we should merge this first, and then update this benchmark in #30341?




[GitHub] [spark] maropu commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727281979


   @dongjoon-hyun Yea, but on second thought, merging this first looks fine, too.




[GitHub] [spark] dongjoon-hyun commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727271068


   Thank you so much for this additional work, @viirya !




[GitHub] [spark] dongjoon-hyun commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727329518


   Since the last commit only changes comments and this PR already passed the tests, I merged this. Thanks~




[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727481924


   **[Test build #131100 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131100/testReport)** for PR 30379 at commit [`4f73ec4`](https://github.com/apache/spark/commit/4f73ec4759e649dfb48b0c69b479c0c680adb487).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] maropu commented on a change in pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30379:
URL: https://github.com/apache/spark/pull/30379#discussion_r523473622



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SubExprEliminationBenchmark.scala
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * The benchmarks aims to measure performance of the queries where there are subexpression
+ * elimination or not.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar>,
+ *        <spark catalyst test jar> <spark sql test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/SubExprEliminationBenchmark-results.txt".
+ * }}}
+ */
+

Review comment:
       nit: The other benchmark files don't seem to have a blank line between the Scaladoc comment and the object.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala
##########
@@ -66,4 +68,26 @@ trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
       ds.write.format("noop").mode(Overwrite).save()
     }
   }
+
+  protected def prepareDataInfo(benchmark: Benchmark): Unit = {
+    // scalastyle:off println
+    benchmark.out.println("Preparing data for benchmarking ...")
+    // scalastyle:on println
+  }
+
+  /**
+   * Prepares a table with wide row for benchmarking. The table will be written into
+   * the given path.
+   */
+  protected  def writeWideRow(path: String, rowsNum: Int, numCols: Int): StructType = {
+    val fields = Seq.tabulate(numCols)(i => StructField(s"col$i", IntegerType))
+    val schema = StructType(fields)
+
+    spark.range(rowsNum)
+      .select(Seq.tabulate(numCols)(i => lit(i).as(s"col$i")): _*)
+      .write.json(path)
+
+    schema
+  }
+

Review comment:
       nit: remove this.
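
As a rough illustration of the data that `writeWideRow` prepares, the following plain-Scala sketch builds one JSON line of the shape the diff implies: every row holds the constants `0` until `numCols` under the names `col0`, `col1`, and so on. `WideRowSketch` and `jsonLine` are hypothetical names, and the exact text produced by Spark's JSON writer is assumed here rather than verified.

```scala
object WideRowSketch {
  /** Builds one JSON line of the shape {"col0":0,"col1":1,...}. */
  def jsonLine(numCols: Int): String =
    (0 until numCols)
      .map(i => s""""col$i":$i""")
      .mkString("{", ",", "}")

  def main(args: Array[String]): Unit = {
    // The benchmark uses 1000 columns; 3 keeps the output readable.
    println(jsonLine(3))
  }
}
```

The benchmark later reads the written files back with `spark.read.text`, so each such line arrives as a single string column named `value`, which is what `from_json('value, schema)` parses.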






[GitHub] [spark] AmplabJenkins commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727294448








[GitHub] [spark] SparkQA removed a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727284961








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30379:
URL: https://github.com/apache/spark/pull/30379#discussion_r523468297



##########
File path: sql/core/benchmarks/SubExprEliminationBenchmark-results.txt
##########
@@ -0,0 +1,15 @@
+================================================================================================
+Benchmark for performance of subexpression elimination
+================================================================================================
+
+Preparing data for benchmarking ...
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.6

Review comment:
       Could you run the benchmark with Java 11 once more?

##########
File path: sql/core/benchmarks/SubExprEliminationBenchmark-results.txt
##########
@@ -0,0 +1,15 @@
+================================================================================================
+Benchmark for performance of subexpression elimination
+================================================================================================
+
+Preparing data for benchmarking ...
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.6

Review comment:
       Could you run the benchmark with Java 11 once more so that we have both result files?






[GitHub] [spark] dongjoon-hyun commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727282415


   Your idea is better and correct because there is no conf yet~ 😄 
   > Yea, but on second thought, merging this first looks fine, too.




[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727271125


   **[Test build #131099 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131099/testReport)** for PR 30379 at commit [`90ded6b`](https://github.com/apache/spark/commit/90ded6b35a6db1232f389073789083834a335574).




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30379:
URL: https://github.com/apache/spark/pull/30379#discussion_r523672856



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SubExprEliminationBenchmark.scala
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * The benchmarks aims to measure performance of the queries where there are subexpression
+ * elimination or not.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar>,
+ *        <spark catalyst test jar> <spark sql test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/SubExprEliminationBenchmark-results.txt".
+ * }}}
+ */
+object SubExprEliminationBenchmark extends SqlBasedBenchmark {
+  import spark.implicits._
+
+  def withFromJson(rowsNum: Int, numIters: Int): Unit = {
+    val benchmark = new Benchmark("from_json as subExpr", rowsNum, output = output)
+
+    withTempPath { path =>
+      prepareDataInfo(benchmark)
+      val numCols = 1000
+      val schema = writeWideRow(path.getAbsolutePath, rowsNum, numCols)
+
+      val cols = (0 until numCols).map { idx =>
+        from_json('value, schema).getField(s"col$idx")
+      }
+
+      benchmark.addCase("subexpressionElimination off, codegen on", numIters) { _ =>
+        withSQLConf(
+          SQLConf.SUBEXPRESSION_ELIMINATION_ENABLED.key -> "false",
+          SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
+          SQLConf.CODEGEN_FACTORY_MODE.key -> "CODEGEN_ONLY",
+          SQLConf.JSON_EXPRESSION_OPTIMIZATION.key -> "false") {
+          val df = spark.read
+            .text(path.getAbsolutePath)
+            .select(cols: _*)
+          df.collect()
+        }
+      }
+
+      benchmark.addCase("subexpressionElimination off, codegen off", numIters) { _ =>
+        withSQLConf(
+          SQLConf.SUBEXPRESSION_ELIMINATION_ENABLED.key -> "false",
+          SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false",
+          SQLConf.CODEGEN_FACTORY_MODE.key -> "NO_CODEGEN",
+          SQLConf.JSON_EXPRESSION_OPTIMIZATION.key -> "false") {
+          val df = spark.read
+            .text(path.getAbsolutePath)
+            .select(cols: _*)
+          df.collect()
+        }
+      }
+
+      // We only benchmark subexpression performance under codegen/non-codegen, so disabling
+      // json optimization.

Review comment:
       Oh, this seems to be moved together to line 52.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727515493








[GitHub] [spark] viirya commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727270996


   cc @dongjoon-hyun @maropu 




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30379:
URL: https://github.com/apache/spark/pull/30379#discussion_r523672856



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SubExprEliminationBenchmark.scala
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * The benchmarks aims to measure performance of the queries where there are subexpression
+ * elimination or not.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar>,
+ *        <spark catalyst test jar> <spark sql test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/SubExprEliminationBenchmark-results.txt".
+ * }}}
+ */
+object SubExprEliminationBenchmark extends SqlBasedBenchmark {
+  import spark.implicits._
+
+  def withFromJson(rowsNum: Int, numIters: Int): Unit = {
+    val benchmark = new Benchmark("from_json as subExpr", rowsNum, output = output)
+
+    withTempPath { path =>
+      prepareDataInfo(benchmark)
+      val numCols = 1000
+      val schema = writeWideRow(path.getAbsolutePath, rowsNum, numCols)
+
+      val cols = (0 until numCols).map { idx =>
+        from_json('value, schema).getField(s"col$idx")
+      }
+
+      benchmark.addCase("subexpressionElimination off, codegen on", numIters) { _ =>
+        withSQLConf(
+          SQLConf.SUBEXPRESSION_ELIMINATION_ENABLED.key -> "false",
+          SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
+          SQLConf.CODEGEN_FACTORY_MODE.key -> "CODEGEN_ONLY",
+          SQLConf.JSON_EXPRESSION_OPTIMIZATION.key -> "false") {
+          val df = spark.read
+            .text(path.getAbsolutePath)
+            .select(cols: _*)
+          df.collect()
+        }
+      }
+
+      benchmark.addCase("subexpressionElimination off, codegen off", numIters) { _ =>
+        withSQLConf(
+          SQLConf.SUBEXPRESSION_ELIMINATION_ENABLED.key -> "false",
+          SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false",
+          SQLConf.CODEGEN_FACTORY_MODE.key -> "NO_CODEGEN",
+          SQLConf.JSON_EXPRESSION_OPTIMIZATION.key -> "false") {
+          val df = spark.read
+            .text(path.getAbsolutePath)
+            .select(cols: _*)
+          df.collect()
+        }
+      }
+
+      // We only benchmark subexpression performance under codegen/non-codegen, so disabling
+      // json optimization.

Review comment:
       Oh, it seems that we need to move this comment block to line 52.
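
For context on why the comment belongs next to the first `withSQLConf` call: each benchmark case wraps its query in `withSQLConf`, so the listed settings (including the disabled JSON optimization) apply only while that case runs. Below is a minimal sketch of that set-then-restore pattern in plain Scala. `ConfSketch`, `withConf`, and the key `json.optimization` are hypothetical stand-ins backed by an in-memory map, not Spark's `SQLConf` or its real keys.

```scala
import scala.collection.mutable

object ConfSketch {
  // Hypothetical in-memory stand-in for a session configuration.
  private val conf = mutable.Map.empty[String, String]

  def get(key: String): Option[String] = conf.get(key)

  /** Applies the given settings, runs `body`, then restores the previous values. */
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    // Remember what each key held before, if anything.
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(old)) => conf(k) = old
      case (k, None) => conf.remove(k)
    }
  }

  def main(args: Array[String]): Unit = {
    val inside = withConf("json.optimization" -> "false") {
      get("json.optimization")
    }
    println(inside)                   // value seen while the block runs
    println(get("json.optimization")) // value after the block has finished
  }
}
```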






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30379:
URL: https://github.com/apache/spark/pull/30379#discussion_r523487480



##########
File path: sql/core/benchmarks/SubExprEliminationBenchmark-results.txt
##########
@@ -7,9 +7,9 @@ OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.6
 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
 from_json as subExpr:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 -------------------------------------------------------------------------------------------------------------------------
-subexpressionElimination on, codegen on             2303           2543         238          0.0    23029833.1       1.0X
-subexpressionElimination on, codegen off           23107          23520         427          0.0   231069443.3       0.1X
-subexpressionElimination off, codegen on           23363          23848         421          0.0   233634044.9       0.1X
-subexpressionElimination off, codegen off          22997          23355         438          0.0   229974135.0       0.1X
+subexpressionElimination off, codegen on           24841          25365         803          0.0   248412787.5       1.0X
+subexpressionElimination off, codegen off          25344          26205         941          0.0   253442656.5       1.0X
+subexpressionElimination on, codegen on             2883           3019         119          0.0    28833086.8       8.6X

Review comment:
       Nice. It's clearly `8.6x`. :)
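
For readers checking the arithmetic, here is a minimal sketch of how the `Relative` column in the table above can be reproduced. It assumes the first case added is the baseline and that `Relative` is the baseline's best time divided by each case's best time, which is consistent with the numbers shown. `RelativeSpeedup` and `relative` are hypothetical names for illustration, not part of Spark's benchmark utilities.

```scala
import java.util.Locale

// Sketch only: `RelativeSpeedup` and `relative` are hypothetical helpers,
// not part of Spark's Benchmark class.
object RelativeSpeedup {
  // Best times (ms) copied from the updated results table, in the order
  // the cases were added to the benchmark.
  val bestMs: Seq[(String, Double)] = Seq(
    "subexpressionElimination off, codegen on" -> 24841.0,
    "subexpressionElimination off, codegen off" -> 25344.0,
    "subexpressionElimination on, codegen on" -> 2883.0
  )

  // Assumption: the first case is the baseline and
  // Relative = baselineBest / caseBest, formatted to one decimal place.
  def relative(cases: Seq[(String, Double)]): Seq[(String, String)] = {
    val baseline = cases.head._2
    cases.map { case (name, best) =>
      name -> "%.1fX".formatLocal(Locale.ROOT, baseline / best)
    }
  }

  def main(args: Array[String]): Unit =
    relative(bestMs).foreach { case (name, rel) => println(s"$name: $rel") }
}
```

Under that assumption, putting the "off" cases first is what makes the "on, codegen on" row read directly as the speedup.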






[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727275058


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35702/
   




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30379:
URL: https://github.com/apache/spark/pull/30379#discussion_r523477798



##########
File path: sql/core/benchmarks/SubExprEliminationBenchmark-results.txt
##########
@@ -0,0 +1,15 @@
+================================================================================================
+Benchmark for performance of subexpression elimination
+================================================================================================
+
+Preparing data for benchmarking ...
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.6
+Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
+from_json as subExpr:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-------------------------------------------------------------------------------------------------------------------------
+subexpressionElimination on, codegen on             2303           2543         238          0.0    23029833.1       1.0X
+subexpressionElimination on, codegen off           23107          23520         427          0.0   231069443.3       0.1X
+subexpressionElimination off, codegen on           23363          23848         421          0.0   233634044.9       0.1X
+subexpressionElimination off, codegen off          22997          23355         438          0.0   229974135.0       0.1X

Review comment:
       If we are going to merge this first, we need to merge a subset of this PR. Only the last two.
   ```
   subexpressionElimination off, codegen on           23363          23848         421          0.0   233634044.9       0.1X
   subexpressionElimination off, codegen off          22997          23355         438          0.0   229974135.0       0.1X
   ```






[GitHub] [spark] AmplabJenkins commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727515493








[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727291002


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35704/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727484748








[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727277904


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35702/
   




[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727293715


   **[Test build #131099 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131099/testReport)** for PR 30379 at commit [`90ded6b`](https://github.com/apache/spark/commit/90ded6b35a6db1232f389073789083834a335574).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727515354


   **[Test build #131102 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131102/testReport)** for PR 30379 at commit [`f9ef4ea`](https://github.com/apache/spark/commit/f9ef4eadb84afd0802f0d3284e6109528aca269d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727292169


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35705/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727277910








[GitHub] [spark] dongjoon-hyun closed pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #30379:
URL: https://github.com/apache/spark/pull/30379


   




[GitHub] [spark] SparkQA removed a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727287912


   **[Test build #131102 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131102/testReport)** for PR 30379 at commit [`f9ef4ea`](https://github.com/apache/spark/commit/f9ef4eadb84afd0802f0d3284e6109528aca269d).




[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727288272


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35703/
   




[GitHub] [spark] SparkQA removed a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727271125


   **[Test build #131099 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131099/testReport)** for PR 30379 at commit [`90ded6b`](https://github.com/apache/spark/commit/90ded6b35a6db1232f389073789083834a335574).




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30379:
URL: https://github.com/apache/spark/pull/30379#discussion_r523468628



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SubExprEliminationBenchmark.scala
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * The benchmarks aims to measure performance of the queries where there are subexpression
+ * elimination or not.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar>,
+ *        <spark catalyst test jar> <spark sql test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/SubExprEliminationBenchmark-results.txt".
+ * }}}
+ */
+
+object SubExprEliminationBenchmark extends SqlBasedBenchmark {
+  import spark.implicits._
+
+  def withFromJson(rowsNum: Int, numIters: Int): Unit = {
+    val benchmark = new Benchmark("from_json as subExpr", rowsNum, output = output)
+
+    withTempPath { path =>
+      prepareDataInfo(benchmark)
+      val numCols = 1000
+      val schema = writeWideRow(path.getAbsolutePath, rowsNum, numCols)
+
+      val cols = (0 until numCols).map { idx =>
+        from_json('value, schema).getField(s"col$idx")
+      }
+
+      // We only benchmark subexpression performance under codegen/non-codegen, so disabling
+      // json optimization.
+      benchmark.addCase("subexpressionElimination on, codegen on", numIters) { _ =>
+        withSQLConf(
+            SQLConf.SUBEXPRESSION_ELIMINATION_ENABLED.key -> "true",
+            SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
+            SQLConf.CODEGEN_FACTORY_MODE.key -> "CODEGEN_ONLY",
+            SQLConf.JSON_EXPRESSION_OPTIMIZATION.key -> "false") {
+          val df = spark.read
+            .text(path.getAbsolutePath)
+            .select(cols: _*)
+          df.collect()
+        }
+      }
+
+      benchmark.addCase("subexpressionElimination on, codegen off", numIters) { _ =>
+        withSQLConf(
+          SQLConf.SUBEXPRESSION_ELIMINATION_ENABLED.key -> "true",
+          SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false",
+          SQLConf.CODEGEN_FACTORY_MODE.key -> "NO_CODEGEN",
+          SQLConf.JSON_EXPRESSION_OPTIMIZATION.key -> "false") {
+          val df = spark.read
+            .text(path.getAbsolutePath)
+            .select(cols: _*)
+          df.collect()
+        }
+      }
+
+      benchmark.addCase("subexpressionElimination off, codegen on", numIters) { _ =>

Review comment:
       Could you move lines 82 ~ 106 to before line 54?
   Then the "off" cases become the baseline, and we can easily say that your improvement is `xx times`.
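
To illustrate the ordering being requested, here is a minimal sketch that enumerates the four cases with the "off" settings first, so the first case serves as the baseline. It is not the benchmark's actual code: `BenchCase` and `caseMatrix` are hypothetical names, and the map keys are the `SQLConf` constant names from the diff used as plain strings rather than real configuration keys.

```scala
// Sketch only: BenchCase and caseMatrix are hypothetical, and the map keys are
// the SQLConf constant names from the diff, used here as plain strings.
object CaseOrdering {
  final case class BenchCase(name: String, conf: Map[String, String])

  private def onOff(enabled: Boolean): String = if (enabled) "on" else "off"

  // Cases are produced in the order the settings are listed, so passing the
  // "off" setting first makes the unoptimized run the first (baseline) case.
  def caseMatrix(subExprSettings: Seq[Boolean], codegenSettings: Seq[Boolean]): Seq[BenchCase] =
    for {
      subExpr <- subExprSettings
      codegen <- codegenSettings
    } yield BenchCase(
      s"subexpressionElimination ${onOff(subExpr)}, codegen ${onOff(codegen)}",
      Map(
        "SUBEXPRESSION_ELIMINATION_ENABLED" -> subExpr.toString,
        "WHOLESTAGE_CODEGEN_ENABLED" -> codegen.toString,
        "CODEGEN_FACTORY_MODE" -> (if (codegen) "CODEGEN_ONLY" else "NO_CODEGEN"),
        // Disabled in every case so only subexpression elimination is compared.
        "JSON_EXPRESSION_OPTIMIZATION" -> "false"
      )
    )

  def main(args: Array[String]): Unit =
    caseMatrix(Seq(false, true), Seq(true, false)).foreach(c => println(c.name))
}
```

In the real benchmark each entry would correspond to one `benchmark.addCase(...)` call wrapped in `withSQLConf(...)`, as in the diff above.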






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30379:
URL: https://github.com/apache/spark/pull/30379#discussion_r523672856



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SubExprEliminationBenchmark.scala
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * The benchmarks aims to measure performance of the queries where there are subexpression
+ * elimination or not.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar>,
+ *        <spark catalyst test jar> <spark sql test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/SubExprEliminationBenchmark-results.txt".
+ * }}}
+ */
+object SubExprEliminationBenchmark extends SqlBasedBenchmark {
+  import spark.implicits._
+
+  def withFromJson(rowsNum: Int, numIters: Int): Unit = {
+    val benchmark = new Benchmark("from_json as subExpr", rowsNum, output = output)
+
+    withTempPath { path =>
+      prepareDataInfo(benchmark)
+      val numCols = 1000
+      val schema = writeWideRow(path.getAbsolutePath, rowsNum, numCols)
+
+      val cols = (0 until numCols).map { idx =>
+        from_json('value, schema).getField(s"col$idx")
+      }
+
+      benchmark.addCase("subexpressionElimination off, codegen on", numIters) { _ =>
+        withSQLConf(
+          SQLConf.SUBEXPRESSION_ELIMINATION_ENABLED.key -> "false",
+          SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
+          SQLConf.CODEGEN_FACTORY_MODE.key -> "CODEGEN_ONLY",
+          SQLConf.JSON_EXPRESSION_OPTIMIZATION.key -> "false") {
+          val df = spark.read
+            .text(path.getAbsolutePath)
+            .select(cols: _*)
+          df.collect()
+        }
+      }
+
+      benchmark.addCase("subexpressionElimination off, codegen off", numIters) { _ =>
+        withSQLConf(
+          SQLConf.SUBEXPRESSION_ELIMINATION_ENABLED.key -> "false",
+          SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false",
+          SQLConf.CODEGEN_FACTORY_MODE.key -> "NO_CODEGEN",
+          SQLConf.JSON_EXPRESSION_OPTIMIZATION.key -> "false") {
+          val df = spark.read
+            .text(path.getAbsolutePath)
+            .select(cols: _*)
+          df.collect()
+        }
+      }
+
+      // We only benchmark subexpression performance under codegen/non-codegen, so disabling
+      // json optimization.

Review comment:
       Oh, it seems that we need to move this comment block to line 52 because it's a global comment for all four runs.






[GitHub] [spark] dongjoon-hyun commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727280631


   @maropu That's a good point. Initially, I had three steps in mind: 1) we add a benchmark with the baseline first, 2) merge #30341, 3) update this benchmark in another PR. But your suggestion also looks good because this PR already has (3).
   > This will be merged after #30341 finished?




[GitHub] [spark] viirya edited a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
viirya edited a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727285058


   > @maropu That's a good point. Initially, I had three steps in mind: 1) we add a benchmark with the baseline first, 2) merge #30341, 3) update this benchmark in another PR. But your suggestion also looks good because this PR already has (3).
   > 
   > > This will be merged after #30341 finished?
   
   I think we should merge this first and then in #30341 we can update this benchmark?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727277910








[GitHub] [spark] viirya commented on a change in pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30379:
URL: https://github.com/apache/spark/pull/30379#discussion_r523517002



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SubExprEliminationBenchmark.scala
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * The benchmarks aims to measure performance of the queries where there are subexpression
+ * elimination or not.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar>,
+ *        <spark catalyst test jar> <spark sql test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/SubExprEliminationBenchmark-results.txt".
+ * }}}
+ */
+

Review comment:
       Actually, this was copied from `JsonBenchmark`, so I have now removed this blank line from both `JsonBenchmark` and `SubExprEliminationBenchmark`.








[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727514912


   **[Test build #131101 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131101/testReport)** for PR 30379 at commit [`8295ba2`](https://github.com/apache/spark/commit/8295ba2bee50d6aaff108194b1afb13fbfddafad).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30379:
URL: https://github.com/apache/spark/pull/30379#discussion_r523670660



##########
File path: sql/core/benchmarks/SubExprEliminationBenchmark-jdk11-results.txt
##########
@@ -0,0 +1,15 @@
+================================================================================================
+Benchmark for performance of subexpression elimination
+================================================================================================
+
+Preparing data for benchmarking ...
+OpenJDK 64-Bit Server VM 11.0.9+11 on Mac OS X 10.15.6
+Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
+from_json as subExpr:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-------------------------------------------------------------------------------------------------------------------------
+subexpressionElimination off, codegen on           26809          27731         898          0.0   268094225.4       1.0X
+subexpressionElimination off, codegen off          25117          26612        1357          0.0   251166638.4       1.1X
+subexpressionElimination on, codegen on             2582           2906         282          0.0    25819408.7      10.4X

Review comment:
       Wow. It's faster in Java 11?
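
       For anyone reading the table: the `Relative` column appears consistent with dividing the first case's best time by each case's best time. Below is a minimal plain-Scala sketch of that arithmetic, using the `Best Time(ms)` values copied from the JDK 11 results above. The object and method names (`RelativeColumn`, `relative`) are hypothetical and exist only for illustration; this is not the code Spark's `Benchmark` class uses.

```scala
object RelativeColumn {
  // Best Time (ms) per case, copied from the JDK 11 results table above.
  val bestMs: Seq[(String, Double)] = Seq(
    "subexpressionElimination off, codegen on" -> 26809.0,
    "subexpressionElimination off, codegen off" -> 25117.0,
    "subexpressionElimination on, codegen on" -> 2582.0
  )

  // Speed-up of a case relative to the baseline (first) case,
  // formatted with one decimal place like the table's Relative column.
  def relative(baselineMs: Double, caseMs: Double): String =
    "%.1fX".formatLocal(java.util.Locale.ROOT, baselineMs / caseMs)

  def main(args: Array[String]): Unit = {
    val baseline = bestMs.head._2
    bestMs.foreach { case (name, ms) =>
      println(f"$name%-45s ${relative(baseline, ms)}")
    }
  }
}
```

       Running `RelativeColumn.main` prints one line per case; the ratios are relative to whichever case is listed first, so reordering the cases changes the baseline.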








[GitHub] [spark] viirya commented on a change in pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30379:
URL: https://github.com/apache/spark/pull/30379#discussion_r523490332



##########
File path: sql/core/benchmarks/SubExprEliminationBenchmark-results.txt
##########
@@ -0,0 +1,15 @@
+================================================================================================
+Benchmark for performance of subexpression elimination
+================================================================================================
+
+Preparing data for benchmarking ...
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.6
+Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
+from_json as subExpr:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-------------------------------------------------------------------------------------------------------------------------
+subexpressionElimination on, codegen on             2303           2543         238          0.0    23029833.1       1.0X
+subexpressionElimination on, codegen off           23107          23520         427          0.0   231069443.3       0.1X
+subexpressionElimination off, codegen on           23363          23848         421          0.0   233634044.9       0.1X
+subexpressionElimination off, codegen off          22997          23355         438          0.0   229974135.0       0.1X

Review comment:
       Why? We don't need `subexpressionElimination on` cases?
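
       To make the four rows of the table concrete, here is a small plain-Scala sketch that enumerates the on/off matrix and the settings each case would run under. The setting values mirror the `withSQLConf` blocks in the benchmark source quoted in this thread; the map keys are the `SQLConf` constant names used there as plain labels, not the actual configuration key strings. `BenchmarkCaseMatrix` and `BenchCase` are hypothetical names, and the sketch does not depend on Spark.

```scala
object BenchmarkCaseMatrix {
  // One benchmark case: its display name plus the settings it would run under.
  final case class BenchCase(name: String, settings: Map[String, String])

  private def onOff(enabled: Boolean): String = if (enabled) "on" else "off"

  // Enumerate every combination of subexpression elimination and codegen.
  def cases: Seq[BenchCase] =
    for {
      subExpr <- Seq(true, false)
      codegen <- Seq(true, false)
    } yield BenchCase(
      s"subexpressionElimination ${onOff(subExpr)}, codegen ${onOff(codegen)}",
      Map(
        // Keys are SQLConf constant names used as labels (assumption: not real key strings).
        "SUBEXPRESSION_ELIMINATION_ENABLED" -> subExpr.toString,
        "WHOLESTAGE_CODEGEN_ENABLED" -> codegen.toString,
        "CODEGEN_FACTORY_MODE" -> (if (codegen) "CODEGEN_ONLY" else "NO_CODEGEN"),
        // JSON expression optimization stays off so only subexpression elimination varies.
        "JSON_EXPRESSION_OPTIMIZATION" -> "false"
      )
    )

  def main(args: Array[String]): Unit =
    cases.foreach(c => println(s"${c.name}: ${c.settings}"))
}
```

       In the real benchmark, each entry would presumably be passed to `benchmark.addCase` and wrapped in `withSQLConf`, as the quoted source does for the individual cases.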








[GitHub] [spark] viirya commented on a change in pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30379:
URL: https://github.com/apache/spark/pull/30379#discussion_r523676597



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SubExprEliminationBenchmark.scala
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * The benchmarks aims to measure performance of the queries where there are subexpression
+ * elimination or not.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar>,
+ *        <spark catalyst test jar> <spark sql test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/SubExprEliminationBenchmark-results.txt".
+ * }}}
+ */
+object SubExprEliminationBenchmark extends SqlBasedBenchmark {
+  import spark.implicits._
+
+  def withFromJson(rowsNum: Int, numIters: Int): Unit = {
+    val benchmark = new Benchmark("from_json as subExpr", rowsNum, output = output)
+
+    withTempPath { path =>
+      prepareDataInfo(benchmark)
+      val numCols = 1000
+      val schema = writeWideRow(path.getAbsolutePath, rowsNum, numCols)
+
+      val cols = (0 until numCols).map { idx =>
+        from_json('value, schema).getField(s"col$idx")
+      }
+
+      benchmark.addCase("subexpressionElimination off, codegen on", numIters) { _ =>
+        withSQLConf(
+          SQLConf.SUBEXPRESSION_ELIMINATION_ENABLED.key -> "false",
+          SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
+          SQLConf.CODEGEN_FACTORY_MODE.key -> "CODEGEN_ONLY",
+          SQLConf.JSON_EXPRESSION_OPTIMIZATION.key -> "false") {
+          val df = spark.read
+            .text(path.getAbsolutePath)
+            .select(cols: _*)
+          df.collect()
+        }
+      }
+
+      benchmark.addCase("subexpressionElimination off, codegen off", numIters) { _ =>
+        withSQLConf(
+          SQLConf.SUBEXPRESSION_ELIMINATION_ENABLED.key -> "false",
+          SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false",
+          SQLConf.CODEGEN_FACTORY_MODE.key -> "NO_CODEGEN",
+          SQLConf.JSON_EXPRESSION_OPTIMIZATION.key -> "false") {
+          val df = spark.read
+            .text(path.getAbsolutePath)
+            .select(cols: _*)
+          df.collect()
+        }
+      }
+
+      // We only benchmark subexpression performance under codegen/non-codegen, so disabling
+      // json optimization.

Review comment:
       Ah, right. I forgot it.










[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727293084


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35704/
   




[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727290918


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35703/
   






[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727287205


   **[Test build #131101 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131101/testReport)** for PR 30379 at commit [`8295ba2`](https://github.com/apache/spark/commit/8295ba2bee50d6aaff108194b1afb13fbfddafad).




[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727284961


   **[Test build #131100 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131100/testReport)** for PR 30379 at commit [`4f73ec4`](https://github.com/apache/spark/commit/4f73ec4759e649dfb48b0c69b479c0c680adb487).








[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727287912


   **[Test build #131102 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131102/testReport)** for PR 30379 at commit [`f9ef4ea`](https://github.com/apache/spark/commit/f9ef4eadb84afd0802f0d3284e6109528aca269d).

