You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2017/10/29 11:34:53 UTC
[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...
GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/19603
[SPARK-22385][SQL] MapObjects should not access list element by index
## What changes were proposed in this pull request?
This issue was discovered and investigated by Ohad Raviv and Sean Owen in https://issues.apache.org/jira/browse/SPARK-21657. The input data of `MapObjects` may be a `List` which has O(n) complexity for accessing by index. When converting input data to catalyst array, `MapObjects` gets element by index in each loop, and results to bad performance.
This PR fixes this issue by accessing elements via Iterator.
## How was this patch tested?
using the test script in https://issues.apache.org/jira/browse/SPARK-21657
```
val BASE = 100000000
val N = 100000
val df = sc.parallelize(List(("1234567890", (BASE to (BASE+N)).map(x => (x.toString, (x+1).toString, (x+2).toString, (x+3).toString)).toList ))).toDF("c1", "c_arr")
spark.time(df.queryExecution.toRdd.foreach(_ => ()))
```
We can see 50x speed up.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark map-objects
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19603.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19603
----
commit 6cb4fca89e83172407114037d3a447ae6d941f0a
Author: Wenchen Fan <we...@databricks.com>
Date: 2017-10-29T11:27:21Z
MapObjects should not access list element by index
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19603
**[Test build #83197 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83197/testReport)** for PR 19603 at commit [`d09d9bd`](https://github.com/apache/spark/commit/d09d9bd10331ebd8992e1d7930236162c53ee37e).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19603
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83196/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19603
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83197/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19603
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/19603#discussion_r147581528
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -591,18 +591,40 @@ case class MapObjects private(
case _ => inputData.dataType
}
- val (getLength, getLoopVar) = inputDataType match {
+ val (getLength, prepareLoop, getLoopVar) = inputDataType match {
case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
--- End diff --
Oops, right, not reading this clearly.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19603#discussion_r147587911
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -591,18 +591,40 @@ case class MapObjects private(
case _ => inputData.dataType
}
- val (getLength, getLoopVar) = inputDataType match {
+ val (getLength, prepareLoop, getLoopVar) = inputDataType match {
case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
- s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
+ val it = ctx.freshName("it")
+ (
+ s"${genInputData.value}.size()",
--- End diff --
otherwise we need a re-sizable array to keep result, which is a lot of change and doesn't have a clear win(re-sizing is expensive).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19603
cc @srowen @viirya @kiszk
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19603#discussion_r147581519
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -591,18 +591,40 @@ case class MapObjects private(
case _ => inputData.dataType
}
- val (getLength, getLoopVar) = inputDataType match {
+ val (getLength, prepareLoop, getLoopVar) = inputDataType match {
case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
- s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
+ val it = ctx.freshName("it")
+ (
+ s"${genInputData.value}.size()",
+ s"scala.collection.Iterator $it = ${genInputData.value}.toIterator();",
--- End diff --
I think it is better to leave a comment for code readers.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19603
**[Test build #83197 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83197/testReport)** for PR 19603 at commit [`d09d9bd`](https://github.com/apache/spark/commit/d09d9bd10331ebd8992e1d7930236162c53ee37e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19603
thanks for the review, merging to master!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/19603#discussion_r147579883
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -591,18 +591,40 @@ case class MapObjects private(
case _ => inputData.dataType
}
- val (getLength, getLoopVar) = inputDataType match {
+ val (getLength, prepareLoop, getLoopVar) = inputDataType match {
case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
--- End diff --
Also not sure if you want to change this here, but think this is more direct as `cls.isInstanceOf[Seq[_]]`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19603#discussion_r147580056
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -591,18 +591,40 @@ case class MapObjects private(
case _ => inputData.dataType
}
- val (getLength, getLoopVar) = inputDataType match {
+ val (getLength, prepareLoop, getLoopVar) = inputDataType match {
case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
- s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
+ val it = ctx.freshName("it")
--- End diff --
That's not a big deal, we are traversing the collection from start to end, so either Iterator or accessing by index won't have much difference, may not worth to create 2 branches here.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19603
**[Test build #83196 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83196/testReport)** for PR 19603 at commit [`83607a3`](https://github.com/apache/spark/commit/83607a3d727b9a600271ba61e0c2976fc3c125c1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19603#discussion_r147589361
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -591,18 +591,40 @@ case class MapObjects private(
case _ => inputData.dataType
}
- val (getLength, getLoopVar) = inputDataType match {
+ val (getLength, prepareLoop, getLoopVar) = inputDataType match {
case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
- s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
+ val it = ctx.freshName("it")
+ (
+ s"${genInputData.value}.size()",
--- End diff --
I see. got it.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/19603#discussion_r147579666
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -591,18 +591,40 @@ case class MapObjects private(
case _ => inputData.dataType
}
- val (getLength, getLoopVar) = inputDataType match {
+ val (getLength, prepareLoop, getLoopVar) = inputDataType match {
case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
- s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
+ val it = ctx.freshName("it")
+ (
+ s"${genInputData.value}.size()",
+ s"scala.collection.Iterator $it = ${genInputData.value}.toIterator();",
+ s"$it.next()"
+ )
case ObjectType(cls) if cls.isArray =>
- s"${genInputData.value}.length" -> s"${genInputData.value}[$loopIndex]"
+ (
+ s"${genInputData.value}.length",
+ "",
+ s"${genInputData.value}[$loopIndex]"
+ )
case ObjectType(cls) if classOf[java.util.List[_]].isAssignableFrom(cls) =>
--- End diff --
Not sure you want to change this here, or I could be missing something, but really any `java.util.Collection` has `iterator()` and `size()`. It need not be restrict to `java.util.List`.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19603
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19603
Good catch! LGTM
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19603#discussion_r147579950
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -591,18 +591,40 @@ case class MapObjects private(
case _ => inputData.dataType
}
- val (getLength, getLoopVar) = inputDataType match {
+ val (getLength, prepareLoop, getLoopVar) = inputDataType match {
case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
--- End diff --
you can use `seqObj.isInstanceOf[Seq[_]]`, but not a `Class`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19603#discussion_r147589355
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -591,6 +591,9 @@ case class MapObjects private(
case _ => inputData.dataType
}
+ // `MapObjects` generates a while loop to traverse the elements of the input collection. We
+ // need to take care of Seq and List because they may have O(n) complexity for indexed accessing
+ // like `list.get(1)`. Here we use Iterator to travers Seq and List.
--- End diff --
nit: travers -> traverse
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19603#discussion_r147580091
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -591,18 +591,40 @@ case class MapObjects private(
case _ => inputData.dataType
}
- val (getLength, getLoopVar) = inputDataType match {
+ val (getLength, prepareLoop, getLoopVar) = inputDataType match {
case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
- s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
+ val it = ctx.freshName("it")
+ (
+ s"${genInputData.value}.size()",
+ s"scala.collection.Iterator $it = ${genInputData.value}.toIterator();",
+ s"$it.next()"
+ )
case ObjectType(cls) if cls.isArray =>
- s"${genInputData.value}.length" -> s"${genInputData.value}[$loopIndex]"
+ (
+ s"${genInputData.value}.length",
+ "",
+ s"${genInputData.value}[$loopIndex]"
+ )
case ObjectType(cls) if classOf[java.util.List[_]].isAssignableFrom(cls) =>
--- End diff --
The legal external typea for catalyst array are scala `Seq` and java `List`, so we don't need to match a more general type here.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19603
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19603
**[Test build #83188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83188/testReport)** for PR 19603 at commit [`6cb4fca`](https://github.com/apache/spark/commit/6cb4fca89e83172407114037d3a447ae6d941f0a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19603#discussion_r147581601
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -591,18 +591,40 @@ case class MapObjects private(
case _ => inputData.dataType
}
- val (getLength, getLoopVar) = inputDataType match {
+ val (getLength, prepareLoop, getLoopVar) = inputDataType match {
case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
- s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
+ val it = ctx.freshName("it")
+ (
+ s"${genInputData.value}.size()",
--- End diff --
Is it OK with us for `size()` to take `O(n)` in some implementations of `Seq`?
Since this is out of the loop, it is not the worst, but it is not good. It would be better to iterate a loop using `hasNext()`.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19603
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83188/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/19603#discussion_r147579873
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -591,18 +591,40 @@ case class MapObjects private(
case _ => inputData.dataType
}
- val (getLength, getLoopVar) = inputDataType match {
+ val (getLength, prepareLoop, getLoopVar) = inputDataType match {
case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
- s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
+ val it = ctx.freshName("it")
--- End diff --
This would cause some types of `Seq` to be traversed by iterator when they support constant-time random access. That's not so bad. But I think you could make a rule for `IndexedSeq` that takes precedence that retains the old behavior.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19603
**[Test build #83196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83196/testReport)** for PR 19603 at commit [`83607a3`](https://github.com/apache/spark/commit/83607a3d727b9a600271ba61e0c2976fc3c125c1).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19603
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19603#discussion_r147581548
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -591,18 +591,40 @@ case class MapObjects private(
case _ => inputData.dataType
}
- val (getLength, getLoopVar) = inputDataType match {
+ val (getLength, prepareLoop, getLoopVar) = inputDataType match {
case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
- s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
+ val it = ctx.freshName("it")
--- End diff --
I agreed. Currently the branches are a bit complicated.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19603
**[Test build #83188 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83188/testReport)** for PR 19603 at commit [`6cb4fca`](https://github.com/apache/spark/commit/6cb4fca89e83172407114037d3a447ae6d941f0a).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org