Posted to reviews@spark.apache.org by KyleLi1985 <gi...@git.apache.org> on 2018/12/10 08:10:37 UTC

[GitHub] spark pull request #23271: [SPARK-26318][SQL] Enhance function merge perform...

GitHub user KyleLi1985 opened a pull request:

    https://github.com/apache/spark/pull/23271

    [SPARK-26318][SQL] Enhance function merge performance in Row

    ## What changes were proposed in this pull request?
    Improve the performance of the merge function in Row.
    
    For example, calling Row.merge 100000000 times for each of the two calls below, with the input
          val row1 = Row("name", "work", 2314, "null", 1, "")
          val row2 = Row(1, true, "name", null, "2010-10-22", 34, "location", "situation")
          val row3 = Row.fromSeq(Seq(row1, row2))
          val rows = Seq(row1, row2, row3)
          Row.merge(row1)
          Row.merge(rows: _*)
    takes 108458 milliseconds and 158356 milliseconds, respectively.
    
    With this commit, the same calls take only 24967 milliseconds and 34035 milliseconds.
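
The pull request does not include the harness behind the timings quoted above, so the following is only a minimal sketch of how such a measurement might be taken. `MergeTiming`, `timeMillis` and `mergeStandIn` are hypothetical names introduced here; `mergeStandIn` works on plain `Seq[Any]` values in place of `Row` so that the example runs without Spark on the classpath, and the iteration count is deliberately much smaller than 100000000.

```scala
object MergeTiming {
  // Runs `body` `iterations` times and returns the elapsed wall-clock time in milliseconds.
  def timeMillis(iterations: Int)(body: => Any): Long = {
    val start = System.nanoTime()
    var i = 0
    while (i < iterations) {
      body
      i += 1
    }
    (System.nanoTime() - start) / 1000000L
  }

  // Hypothetical stand-in for Row.merge (an assumption, not Spark's API):
  // concatenates the input sequences in order.
  def mergeStandIn(rows: Seq[Any]*): Seq[Any] = rows.flatten

  def main(args: Array[String]): Unit = {
    val row1: Seq[Any] = Seq("name", "work", 2314, "null", 1, "")
    val row2: Seq[Any] = Seq(1, true, "name", null, "2010-10-22", 34, "location", "situation")
    val iterations = 1000000 // assumption: reduced from the count used in the description

    // Warm-up pass so that JIT compilation is less likely to dominate the measurement.
    timeMillis(iterations)(mergeStandIn(row1, row2))

    val elapsed = timeMillis(iterations)(mergeStandIn(row1, row2))
    println(s"$iterations merges took $elapsed ms")
  }
}
```

A loop like this is sensitive to JIT warm-up and garbage collection, so a dedicated benchmarking tool would likely give more trustworthy numbers than a single timed run.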
    
    ## How was this patch tested?
    Unit test
    Accuracy test


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/KyleLi1985/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23271.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23271
    
----
commit 93c4af42d556b3779f6d56ffdf606c1132f8ef47
Author: 李亮 <li...@...>
Date:   2018-12-10T08:05:40Z

    Enhance function merge performance in Row

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23271
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23271
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5916/
    Test PASSed.


---



[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...

Posted by dongjoon-hyun <gi...@git.apache.org>.
GitHub user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/23271
  
    ok to test


---



[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23271
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23271
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99901/
    Test PASSed.


---



[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23271
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23271
  
    **[Test build #99901 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99901/testReport)** for PR 23271 at commit [`93c4af4`](https://github.com/apache/spark/commit/93c4af42d556b3779f6d56ffdf606c1132f8ef47).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #23271: [SPARK-26318][SQL] Enhance function merge perform...

Posted by dongjoon-hyun <gi...@git.apache.org>.
GitHub user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23271#discussion_r240115920
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala ---
    @@ -58,8 +58,21 @@ object Row {
        * Merge multiple rows into a single row, one after another.
        */
       def merge(rows: Row*): Row = {
    -    // TODO: Improve the performance of this if used in performance critical part.
    -    new GenericRow(rows.flatMap(_.toSeq).toArray)
    +    val size = rows.size
    +    var number = 0
    +    for (i <- 0 until size) {
    +      number = number + rows(i).size
    +    }
    +    val container = Array.ofDim[Any](number)
    +    var n = 0
    +    for (i <- 0 until size) {
    +      val subSize = rows(i).size
    +      for (j <- 0 until subSize) {
    +        container(n) = rows(i)(j)
    +        n = n + 1
    +      }
    +    }
    +    new GenericRow(container)
    --- End diff --
    
    @KyleLi1985, do you have a real use case for this improvement?
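
To make the diff above easier to compare against the line it replaces, here is a minimal sketch of the same two-pass idea (count the total size, allocate once, then copy) next to the `flatMap` baseline. It is not the code from the pull request: `MergeSketch`, `mergeFlatMap` and `mergePreallocated` are hypothetical names, and plain `Seq[Any]` values stand in for `Row` so that the example runs without Spark. The sketch also traverses with `foreach` instead of `rows(i)`, on the assumption that callers may pass a linked `List` through `: _*`, where indexed access is linear-time.

```scala
object MergeSketch {
  // Baseline, mirroring the removed line: flatten every input, then copy into an array.
  def mergeFlatMap(rows: Seq[Any]*): Array[Any] =
    rows.flatMap(_.toSeq).toArray

  // Two-pass variant: size the output once, then copy values straight into it.
  def mergePreallocated(rows: Seq[Any]*): Array[Any] = {
    // First pass: total number of values across all inputs.
    var total = 0
    rows.foreach(r => total += r.size)

    // Second pass: copy each value into its final slot.
    val container = new Array[Any](total)
    var n = 0
    rows.foreach { r =>
      r.foreach { v =>
        container(n) = v
        n += 1
      }
    }
    container
  }
}
```

Both functions should return the same values in the same order; the second avoids the intermediate collection that `flatMap` builds before `toArray` copies it.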


---



[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23271
  
    **[Test build #99901 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99901/testReport)** for PR 23271 at commit [`93c4af4`](https://github.com/apache/spark/commit/93c4af42d556b3779f6d56ffdf606c1132f8ef47).


---



[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...

Posted by dongjoon-hyun <gi...@git.apache.org>.
GitHub user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/23271
  
    cc @rxin and @cloud-fan 


---
