You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by yucai <gi...@git.apache.org> on 2018/08/10 05:26:07 UTC
[GitHub] spark pull request #22066: [SPARK-25084][SQL] "distribute by" on multiple co...
GitHub user yucai opened a pull request:
https://github.com/apache/spark/pull/22066
[SPARK-25084][SQL] "distribute by" on multiple columns may lead to codegen issue
## What changes were proposed in this pull request?
"distribute by" on multiple columns may lead to codegen issue
## How was this patch tested?
UTs.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yucai/spark SPARK-25084
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22066.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22066
----
commit 8ee56bbfaacdd64b1712d72650a39939ca3b13f2
Author: yucai <yy...@...>
Date: 2018-08-10T05:19:43Z
"distribute by" on multiple columns may lead to codegen issue
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22066
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22066
**[Test build #94543 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94543/testReport)** for PR 22066 at commit [`8ee56bb`](https://github.com/apache/spark/commit/8ee56bbfaacdd64b1712d72650a39939ca3b13f2).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22066
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/22066
@LantaoJin I realized the initial way had some issue, so I marked it as WIP to refine and add test. It is different from your original implementation, so I would like to use this one.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [WIP][SPARK-25084][SQL] "distribute by" on multiple colu...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22066
can you add a test first?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [WIP][SPARK-25084][SQL] "distribute by" on multiple colu...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22066
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/22066
@cloud-fan @jerryshao sure, I will do it.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns m...
Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/22066
@cloud-fan @gatorsmile PR has been ready, kindly help review.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/22066
@cloud-fan Jira and 1st is from this one. It is critical to our 2.3 migration.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22066
**[Test build #94557 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94557/testReport)** for PR 22066 at commit [`931fa28`](https://github.com/apache/spark/commit/931fa28861f15ef1c31a51787f3bd59f2284de89).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Posted by LantaoJin <gi...@git.apache.org>.
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22066
Thank you @yucai . New PR #22077 for branch-2.3. Cc: @cloud-fan @jerryshao
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22066
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22066: [SPARK-25084][SQL] "distribute by" on multiple co...
Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on a diff in the pull request:
https://github.com/apache/spark/pull/22066#discussion_r209426886
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
@@ -404,21 +404,26 @@ abstract class HashExpression[E] extends Expression {
input: String,
result: String,
fields: Array[StructField]): String = {
+ val tmpInput = ctx.freshName("input")
val fieldsHash = fields.zipWithIndex.map { case (field, index) =>
- nullSafeElementHash(input, index.toString, field.nullable, field.dataType, result, ctx)
+ nullSafeElementHash(tmpInput, index.toString, field.nullable, field.dataType, result, ctx)
}
val hashResultType = CodeGenerator.javaType(dataType)
- ctx.splitExpressions(
+ val code = ctx.splitExpressions(
expressions = fieldsHash,
funcName = "computeHashForStruct",
- arguments = Seq("InternalRow" -> input, hashResultType -> result),
+ arguments = Seq("InternalRow" -> tmpInput, hashResultType -> result),
returnType = hashResultType,
makeSplitFunction = body =>
s"""
|$body
|return $result;
""".stripMargin,
foldFunctions = _.map(funcCall => s"$result = $funcCall;").mkString("\n"))
+ s"""
+ |final InternalRow $tmpInput = $input;
--- End diff --
Yes, very agree, we can improve this in the future.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns m...
Posted by LantaoJin <gi...@git.apache.org>.
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22066
Since you refactor your code copying from #22067 . Would you mind just use that?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22066
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94564/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22066
**[Test build #94564 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94564/testReport)** for PR 22066 at commit [`5da6d4d`](https://github.com/apache/spark/commit/5da6d4d437c6eb2bdf7a64c031b7c9281a5a8b83).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22066: [SPARK-25084][SQL] "distribute by" on multiple co...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22066#discussion_r209426589
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
@@ -404,21 +404,26 @@ abstract class HashExpression[E] extends Expression {
input: String,
result: String,
fields: Array[StructField]): String = {
+ val tmpInput = ctx.freshName("input")
val fieldsHash = fields.zipWithIndex.map { case (field, index) =>
- nullSafeElementHash(input, index.toString, field.nullable, field.dataType, result, ctx)
+ nullSafeElementHash(tmpInput, index.toString, field.nullable, field.dataType, result, ctx)
}
val hashResultType = CodeGenerator.javaType(dataType)
- ctx.splitExpressions(
+ val code = ctx.splitExpressions(
expressions = fieldsHash,
funcName = "computeHashForStruct",
- arguments = Seq("InternalRow" -> input, hashResultType -> result),
+ arguments = Seq("InternalRow" -> tmpInput, hashResultType -> result),
returnType = hashResultType,
makeSplitFunction = body =>
s"""
|$body
|return $result;
""".stripMargin,
foldFunctions = _.map(funcCall => s"$result = $funcCall;").mkString("\n"))
+ s"""
+ |final InternalRow $tmpInput = $input;
--- End diff --
Note: we can avoid creating a new variable if the `input` is already a variable. I think this can be done after we fully adopt the new codegen infra from @viirya
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22066: [SPARK-25084][SQL] "distribute by" on multiple co...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22066#discussion_r209275505
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2831,4 +2831,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
}
}
}
+
+ test("SPARK-25084: 'distribute by' on multiple columns may lead to codegen issue") {
+ withView("spark_25084") {
+ val count = 1000
--- End diff --
sorry, disregard the comment, thanks
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22066
**[Test build #94564 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94564/testReport)** for PR 22066 at commit [`5da6d4d`](https://github.com/apache/spark/commit/5da6d4d437c6eb2bdf7a64c031b7c9281a5a8b83).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [WIP][SPARK-25084][SQL] "distribute by" on multiple colu...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22066
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94543/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22066
**[Test build #94557 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94557/testReport)** for PR 22066 at commit [`931fa28`](https://github.com/apache/spark/commit/931fa28861f15ef1c31a51787f3bd59f2284de89).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [WIP][SPARK-25084][SQL] "distribute by" on multiple colu...
Posted by LantaoJin <gi...@git.apache.org>.
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22066
I offer other fix way. #22067
It doesn't need "input" as a global variable (If distribute by random)
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22066: [SPARK-25084][SQL] "distribute by" on multiple co...
Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on a diff in the pull request:
https://github.com/apache/spark/pull/22066#discussion_r209267827
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
@@ -778,21 +783,22 @@ case class HiveHash(children: Seq[Expression]) extends HashExpression[Int] {
input: String,
result: String,
fields: Array[StructField]): String = {
+ val tmpInput = ctx.freshName("input")
--- End diff --
Seems like `HiveHash` cannot be triggered in the normal way. Because Spark uses `Murmur3Hash`.
But this function does have this issue. You can hack to test in this way.
In `HashPartitioning`:
```
def partitionIdExpression: Expression = Pmod(new Murmur3Hash(expressions), Literal(numPartitions))
```
to
```
def partitionIdExpression: Expression = Pmod(new HiveHash(expressions), Literal(numPartitions))
```
Then run tests:
```
val df = spark.range(1000)
val columns = (0 until 400).map{ i => s"id as id$i" }
val distributeExprs = (0 until 100).map(c => s"id$c").mkString(",")
df.selectExpr(columns : _*).createTempView("test")
spark.sql(s"select * from test distribute by ($distributeExprs)").count()
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/22066
@cloud-fan , yeah, I will include it in 2.3.2.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [WIP][SPARK-25084][SQL] "distribute by" on multiple colu...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22066
**[Test build #94543 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94543/testReport)** for PR 22066 at commit [`8ee56bb`](https://github.com/apache/spark/commit/8ee56bbfaacdd64b1712d72650a39939ca3b13f2).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22066
thanks, merging to master!
@yucai can you send a new PR for 2.3? We did a lot of codegen changes in 2.4 so it's safer to make sure we pass all the tests in 2.3.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22066: [SPARK-25084][SQL] "distribute by" on multiple co...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22066#discussion_r209260954
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2831,4 +2831,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
}
}
}
+
+ test("SPARK-25084: 'distribute by' on multiple columns may lead to codegen issue") {
+ withView("spark_25084") {
+ val count = 1000
--- End diff --
nit: we can probably inline in the next line
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [WIP][SPARK-25084][SQL] "distribute by" on multiple colu...
Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/22066
@cloud-fan I am refining and adding tests.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22066
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22066: [SPARK-25084][SQL] "distribute by" on multiple co...
Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on a diff in the pull request:
https://github.com/apache/spark/pull/22066#discussion_r209265323
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2831,4 +2831,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
}
}
}
+
+ test("SPARK-25084: 'distribute by' on multiple columns may lead to codegen issue") {
+ withView("spark_25084") {
+ val count = 1000
--- End diff --
How to inline? We still use it in the assert.
```
assert(
spark.sql(s"select * from spark_25084 distribute by ($distributeExprs)").count()
=== count)
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/22066
> @viirya is that effort going on? I can help with the work if you want. Thanks.
@mgaido91 Yeah, I'm still working on it. One of the PRs #21537 is still waiting for review.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22066
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94557/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22066
cc @jerryshao this is a regression for 2.3, we should have it in 2.3.2
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22066: [SPARK-25084][SQL] "distribute by" on multiple co...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/22066#discussion_r209259163
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
@@ -778,21 +783,22 @@ case class HiveHash(children: Seq[Expression]) extends HashExpression[Int] {
input: String,
result: String,
fields: Array[StructField]): String = {
+ val tmpInput = ctx.freshName("input")
--- End diff --
Does Hive have this issue?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/22066
@cloud-fan Synced with @LantaoJin he will help port to 2.3 soon and I will review it.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22066: [SPARK-25084][SQL] "distribute by" on multiple co...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22066
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22066
This PR looks identical to #22067 , which one is the first PR?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org