Posted to commits@spark.apache.org by sr...@apache.org on 2022/06/22 23:39:19 UTC

[spark] branch master updated: [SPARK-39545][SQL] Override `concat` method for `ExpressionSet` in Scala 2.13 to improve the performance

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new a4a83a31ed3 [SPARK-39545][SQL] Override `concat` method for `ExpressionSet` in Scala 2.13 to improve the performance
a4a83a31ed3 is described below

commit a4a83a31ed355c85097bce284eac05dbfd06d039
Author: yangjie01 <ya...@baidu.com>
AuthorDate: Wed Jun 22 18:39:07 2022 -0500

    [SPARK-39545][SQL] Override `concat` method for `ExpressionSet` in Scala 2.13 to improve the performance
    
    ### What changes were proposed in this pull request?
    The `++` method of `ExpressionSet` on the master branch is slightly slower than on branch-3.3 when built with Scala 2.13, so this PR overrides the `concat` method of `ExpressionSet` for Scala 2.13.
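    
    For illustration only (not part of this patch): a minimal sketch of why a clone-and-add `concat` can be cheaper than a generic rebuild. `ToyCanonicalSet`, `concatByRebuild`, `concatByClone` and the lower-casing "canonicalization" are hypothetical stand-ins for `ExpressionSet`. It is an assumption here, not something stated in this PR, that the inherited Scala 2.13 `concat` builds its result by re-adding the elements of both operands.
    
    ```scala
    import scala.collection.mutable
    
    object ToyCanonicalSet {
      // Counts how often the (hypothetical) canonicalization step runs.
      var canonicalizeCalls = 0
    
      // Stand-in for expression canonicalization: lower-casing a string.
      def canonicalize(s: String): String = {
        canonicalizeCalls += 1
        s.toLowerCase
      }
    }
    
    // Hypothetical stand-in for ExpressionSet: keeps canonical forms for
    // de-duplication and the originals for iteration.
    final class ToyCanonicalSet private (
        private val canonical: mutable.Set[String],
        private val originals: mutable.ArrayBuffer[String]) {
    
      def this() = this(mutable.HashSet.empty[String], mutable.ArrayBuffer.empty[String])
    
      def add(s: String): Unit = {
        // mutable.Set#add returns true only if the element was not present yet.
        if (canonical.add(ToyCanonicalSet.canonicalize(s))) originals += s
      }
    
      def iterator: Iterator[String] = originals.iterator
    
      def size: Int = originals.size
    
      def copy(): ToyCanonicalSet = new ToyCanonicalSet(canonical.clone(), originals.clone())
    
      // Assumed shape of a generic concat: start empty, re-add both operands.
      def concatByRebuild(that: IterableOnce[String]): ToyCanonicalSet = {
        val result = new ToyCanonicalSet()
        iterator.foreach(result.add)
        that.iterator.foreach(result.add)
        result
      }
    
      // Shape of the override in this patch: clone, then add only `that`.
      def concatByClone(that: IterableOnce[String]): ToyCanonicalSet = {
        val result = copy()
        that.iterator.foreach(result.add)
        result
      }
    }
    
    object ConcatSketch {
      def main(args: Array[String]): Unit = {
        val left = new ToyCanonicalSet()
        Seq("A + 1", "B").foreach(left.add)
        val right = Seq("a + 1", "C")
    
        ToyCanonicalSet.canonicalizeCalls = 0
        val rebuilt = left.concatByRebuild(right)
        println(s"rebuild: size=${rebuilt.size}, canonicalize calls=${ToyCanonicalSet.canonicalizeCalls}")
    
        ToyCanonicalSet.canonicalizeCalls = 0
        val cloned = left.concatByClone(right)
        println(s"clone:   size=${cloned.size}, canonicalize calls=${ToyCanonicalSet.canonicalizeCalls}")
      }
    }
    ```
    
    In this toy model both variants produce the same elements, but the clone-based variant canonicalizes only the elements of the right-hand operand.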
    
    ### Why are the changes needed?
    To improve the performance of `ExpressionSet ++` under Scala 2.13.
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    
    - Pass GA
    - Manual test 1:
    
    Run the following microbenchmark with Scala 2.13:
    
    ```scala
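        // Assumed context: the body of a Spark benchmark, e.g. inside a `BenchmarkBase`
        // subclass (which provides `output`), with the imports below in scope.
        import org.apache.spark.benchmark.Benchmark
        import org.apache.spark.sql.catalyst.dsl.expressions._ // enables `aUpper + 1`
        import org.apache.spark.sql.catalyst.expressions._
        import org.apache.spark.sql.types.IntegerType
    
        val valuesPerIteration = 100000
        val benchmark = new Benchmark("Test ExpressionSet ++ ", valuesPerIteration, output = output)
        val aUpper = AttributeReference("A", IntegerType)(exprId = ExprId(1))
        // Both sets hold the same deterministic expression plus a non-deterministic one.
        val initialSet = ExpressionSet(aUpper + 1 :: Rand(0) :: Nil)
        val setToAddWithSameDeterministicExpression = ExpressionSet(aUpper + 1 :: Rand(0) :: Nil)
    
        benchmark.addCase("Test ++") { _: Int =>
          // Each iteration performs one `++`, which delegates to `concat` in Scala 2.13.
          for (_ <- 0L until valuesPerIteration) {
            initialSet ++ setToAddWithSameDeterministicExpression
          }
        }
    
        benchmark.run()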
    ```
    
    **branch-3.3 result:**
    
    ```
    OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 4.14.0_1-0-0-45
    Intel(R) Xeon(R) Gold 6XXXC CPU  2.60GHz
    Test ExpressionSet ++ :                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------------------------------
    Test ++                                              14             16           4          7.2         139.1       1.0X
    ```
    
    **master result before this PR:**
    
    ```
    OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 4.14.0_1-0-0-45
    Intel(R) Xeon(R) Gold 6XXXC CPU  2.60GHz
    Test ExpressionSet ++ :                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------------------------------
    Test ++                                              16             19           5          6.1         163.9       1.0X
    ```
    
    **master result after this PR:**
    
    ```
    OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 4.14.0_1-0-0-45
    Intel(R) Xeon(R) Gold 6XXXC CPU  2.60GHz
    Test ExpressionSet ++ :                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------------------------------
    Test ++                                              12             13           3          8.6         115.7       1.0X
    
    ```
    
    - Manual test 2:
    
    ```
    dev/change-scala-version.sh 2.13
    mvn clean install -pl sql/core -am -DskipTests -Pscala-2.13
    mvn test -pl sql/catalyst -Pscala-2.13
    mvn test -pl sql/core -Pscala-2.13
    ```
    
    ```
    Run completed in 10 minutes, 40 seconds.
    Total number of tests run: 6584
    Suites: completed 285, aborted 0
    Tests: succeeded 6584, failed 0, canceled 0, ignored 5, pending 0
    All tests passed.
    ```
    
    ```
    Run completed in 1 hour, 27 minutes, 16 seconds.
    Total number of tests run: 11745
    Suites: completed 520, aborted 0
    Tests: succeeded 11745, failed 0, canceled 7, ignored 57, pending 0
    All tests passed.
    ```
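    
    As a rough reading of the three `Per Row(ns)` figures from Manual test 1 (an added illustration, not part of the original test run), the ratios between the runs can be computed as below. `SpeedupSketch` and `ratio` are hypothetical names. Given the reported Stdev of 3 to 5 ms against best times of 12 to 16 ms, the ratios are indicative only.
    
    ```scala
    object SpeedupSketch {
      // Per Row(ns) values copied from the three benchmark tables in Manual test 1.
      val branch33Ns = 139.1
      val masterBeforeNs = 163.9
      val masterAfterNs = 115.7
    
      // How many times longer one run takes per row than another.
      def ratio(slowerNs: Double, fasterNs: Double): Double = slowerNs / fasterNs
    
      def main(args: Array[String]): Unit = {
        println(f"master before this PR vs branch-3.3:   ${ratio(masterBeforeNs, branch33Ns)}%.2fx time per row")
        println(f"master before vs master after this PR: ${ratio(masterBeforeNs, masterAfterNs)}%.2fx time per row")
        println(f"branch-3.3 vs master after this PR:    ${ratio(branch33Ns, masterAfterNs)}%.2fx time per row")
      }
    }
    ```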
    
    Closes #36942 from LuciferYang/ExpressionSet.
    
    Authored-by: yangjie01 <ya...@baidu.com>
    Signed-off-by: Sean Owen <sr...@gmail.com>
---
 .../org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala   | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/sql/catalyst/src/main/scala-2.13/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala b/sql/catalyst/src/main/scala-2.13/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
index e38deedec6d..a615223ef79 100644
--- a/sql/catalyst/src/main/scala-2.13/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
+++ b/sql/catalyst/src/main/scala-2.13/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
@@ -132,6 +132,12 @@ class ExpressionSet protected(
     newSet
   }
 
+  override def concat(that: IterableOnce[Expression]): ExpressionSet = {
+    val newSet = clone()
+    that.iterator.foreach(newSet.add)
+    newSet
+  }
+
   override def --(that: IterableOnce[Expression]): ExpressionSet = {
     val newSet = clone()
     that.iterator.foreach(newSet.remove)

