Posted to reviews@spark.apache.org by KyleLi1985 <gi...@git.apache.org> on 2018/12/10 08:10:37 UTC
[GitHub] spark pull request #23271: [SPARK-26318][SQL] Enhance function merge perform...
GitHub user KyleLi1985 opened a pull request:
https://github.com/apache/spark/pull/23271
[SPARK-26318][SQL] Enhance function merge performance in Row
## What changes were proposed in this pull request?
Improve the performance of `Row.merge`.
For example, consider calling `Row.merge` 100000000 times on the following input:
val row1 = Row("name", "work", 2314, "null", 1, "")
val row2 = Row(1, true, "name", null, "2010-10-22", 34, "location", "situation")
val row3 = Row.fromSeq(Seq(row1, row2))
val rows = Seq(row1, row2, row3)
Row.merge(row1)
Row.merge(rows: _*)
Before this change, the two calls take 108458 ms and 158356 ms respectively.
With this commit, they take only 24967 ms and 34035 ms.
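
The PR does not include the timing code behind the numbers above. A minimal sketch of how such a measurement could be taken is shown below. It is an illustration only: `MergeTimingSketch`, `timeMillis`, and `mergeFlatMap` are hypothetical names, and plain `Seq[Any]` values stand in for Spark rows so that the example runs without Spark on the classpath. A simple loop like this is sensitive to JIT warm-up, so a harness such as JMH would likely give more reliable figures.

```scala
object MergeTimingSketch {
  // Hypothetical helper: evaluates `body` `iterations` times and returns
  // the elapsed wall-clock time in milliseconds.
  def timeMillis(iterations: Int)(body: => Any): Long = {
    val start = System.nanoTime()
    var sink: Any = null // hold the result so the call is not trivially discarded
    var i = 0
    while (i < iterations) {
      sink = body
      i += 1
    }
    (System.nanoTime() - start) / 1000000L
  }

  // Stand-in for the pre-change Row.merge, operating on plain sequences
  // rather than Spark rows (an assumption made to avoid a Spark dependency).
  def mergeFlatMap(rows: Seq[Any]*): Seq[Any] =
    rows.flatMap(_.toSeq).toArray.toSeq

  def main(args: Array[String]): Unit = {
    val row1 = Seq[Any]("name", "work", 2314, "null", 1, "")
    val row2 = Seq[Any](1, true, "name", null, "2010-10-22", 34, "location", "situation")
    // Far fewer iterations than the 100000000 quoted above, to keep the run short.
    val elapsed = timeMillis(1000000)(mergeFlatMap(row1, row2))
    println(s"mergeFlatMap x 1000000: $elapsed ms")
  }
}
```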
## How was this patch tested?
Unit test
Accuracy test
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/KyleLi1985/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/23271.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #23271
----
commit 93c4af42d556b3779f6d56ffdf606c1132f8ef47
Author: 李亮 <li...@...>
Date: 2018-12-10T08:05:40Z
Enhance function merge performance in Row
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23271
Merged build finished. Test PASSed.
---
[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23271
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5916/
---
[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/23271
ok to test
---
[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23271
Can one of the admins verify this patch?
---
[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23271
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99901/
---
[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23271
Merged build finished. Test PASSed.
---
[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23271
**[Test build #99901 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99901/testReport)** for PR 23271 at commit [`93c4af4`](https://github.com/apache/spark/commit/93c4af42d556b3779f6d56ffdf606c1132f8ef47).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #23271: [SPARK-26318][SQL] Enhance function merge perform...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/23271#discussion_r240115920
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala ---
@@ -58,8 +58,21 @@ object Row {
    * Merge multiple rows into a single row, one after another.
    */
   def merge(rows: Row*): Row = {
-    // TODO: Improve the performance of this if used in performance critical part.
-    new GenericRow(rows.flatMap(_.toSeq).toArray)
+    val size = rows.size
+    var number = 0
+    for (i <- 0 until size) {
+      number = number + rows(i).size
+    }
+    val container = Array.ofDim[Any](number)
+    var n = 0
+    for (i <- 0 until size) {
+      val subSize = rows(i).size
+      for (j <- 0 until subSize) {
+        container(n) = rows(i)(j)
+        n = n + 1
+      }
+    }
+    new GenericRow(container)
--- End diff --
@KyleLi1985, do you have a real use case for this improvement?
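
For readers comparing the two versions in the diff above, the following sketch places both strategies side by side on plain `Seq[Any]` values. It is not the patch itself: `MergeStrategiesSketch`, `mergeFlatMap`, and `mergePreallocated` are hypothetical names, Spark's `Row` and `GenericRow` are deliberately left out so the example is self-contained, and the second method uses iterators and `while` loops instead of the indexed `for` loops in the diff so that it does not depend on constant-time indexed access.

```scala
object MergeStrategiesSketch {
  // Strategy of the removed lines: flatten all rows, then copy into an array.
  def mergeFlatMap(rows: Seq[Any]*): Array[Any] =
    rows.flatMap(_.toSeq).toArray

  // Strategy in the spirit of the added lines: compute the total size once,
  // allocate the array once, then fill it in order.
  def mergePreallocated(rows: Seq[Any]*): Array[Any] = {
    var total = 0
    val sizes = rows.iterator
    while (sizes.hasNext) total += sizes.next().size

    val container = new Array[Any](total)
    var n = 0
    val it = rows.iterator
    while (it.hasNext) {
      val values = it.next().iterator
      while (values.hasNext) {
        container(n) = values.next()
        n += 1
      }
    }
    container
  }

  def main(args: Array[String]): Unit = {
    val row1 = Seq[Any]("name", "work", 2314, "null", 1, "")
    val row2 = Seq[Any](1, true, "name", null, "2010-10-22", 34, "location", "situation")
    // Both strategies should yield the same values in the same order.
    val same = mergeFlatMap(row1, row2).toSeq == mergePreallocated(row1, row2).toSeq
    println(s"results match: $same")
  }
}
```

Whichever variant is preferred, a check like the one in `main` is a cheap way to confirm that an optimized merge preserves element order and `null` values.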
---
[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23271
Can one of the admins verify this patch?
---
[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23271
Can one of the admins verify this patch?
---
[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23271
**[Test build #99901 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99901/testReport)** for PR 23271 at commit [`93c4af4`](https://github.com/apache/spark/commit/93c4af42d556b3779f6d56ffdf606c1132f8ef47).
---
[GitHub] spark issue #23271: [SPARK-26318][SQL] Enhance function merge performance in...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/23271
cc @rxin and @cloud-fan
---