You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2017/10/29 11:34:53 UTC

[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/19603

    [SPARK-22385][SQL] MapObjects should not access list element by index

    ## What changes were proposed in this pull request?
    
    This issue was discovered and investigated by Ohad Raviv and Sean Owen in https://issues.apache.org/jira/browse/SPARK-21657. The input data of `MapObjects` may be a `List` which has O(n) complexity for accessing by index. When converting input data to catalyst array, `MapObjects` gets element by index in each loop, and results to bad performance.
    
    This PR fixes this issue by accessing elements via Iterator.
    
    ## How was this patch tested?
    
    using the test script in https://issues.apache.org/jira/browse/SPARK-21657 
    ```
    val BASE = 100000000
    val N = 100000
    val df = sc.parallelize(List(("1234567890", (BASE to (BASE+N)).map(x => (x.toString, (x+1).toString, (x+2).toString, (x+3).toString)).toList ))).toDF("c1", "c_arr")
    spark.time(df.queryExecution.toRdd.foreach(_ => ()))
    ```
    
    We can see 50x speed up.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark map-objects

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19603.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19603
    
----
commit 6cb4fca89e83172407114037d3a447ae6d941f0a
Author: Wenchen Fan <we...@databricks.com>
Date:   2017-10-29T11:27:21Z

    MapObjects should not access list element by index

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19603
  
    **[Test build #83197 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83197/testReport)** for PR 19603 at commit [`d09d9bd`](https://github.com/apache/spark/commit/d09d9bd10331ebd8992e1d7930236162c53ee37e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19603
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83196/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19603
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83197/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19603
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19603#discussion_r147581528
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -591,18 +591,40 @@ case class MapObjects private(
           case _ => inputData.dataType
         }
     
    -    val (getLength, getLoopVar) = inputDataType match {
    +    val (getLength, prepareLoop, getLoopVar) = inputDataType match {
           case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
    --- End diff --
    
    Oops, right, not reading this clearly.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19603#discussion_r147587911
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -591,18 +591,40 @@ case class MapObjects private(
           case _ => inputData.dataType
         }
     
    -    val (getLength, getLoopVar) = inputDataType match {
    +    val (getLength, prepareLoop, getLoopVar) = inputDataType match {
           case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
    -        s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
    +        val it = ctx.freshName("it")
    +        (
    +          s"${genInputData.value}.size()",
    --- End diff --
    
    otherwise we need a re-sizable array to keep result, which is a lot of change and doesn't have a clear win(re-sizing is expensive).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19603
  
    cc @srowen @viirya @kiszk 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19603#discussion_r147581519
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -591,18 +591,40 @@ case class MapObjects private(
           case _ => inputData.dataType
         }
     
    -    val (getLength, getLoopVar) = inputDataType match {
    +    val (getLength, prepareLoop, getLoopVar) = inputDataType match {
           case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
    -        s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
    +        val it = ctx.freshName("it")
    +        (
    +          s"${genInputData.value}.size()",
    +          s"scala.collection.Iterator $it = ${genInputData.value}.toIterator();",
    --- End diff --
    
    I think it is better to leave a comment for code readers.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19603
  
    **[Test build #83197 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83197/testReport)** for PR 19603 at commit [`d09d9bd`](https://github.com/apache/spark/commit/d09d9bd10331ebd8992e1d7930236162c53ee37e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19603
  
    thanks for the review, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19603#discussion_r147579883
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -591,18 +591,40 @@ case class MapObjects private(
           case _ => inputData.dataType
         }
     
    -    val (getLength, getLoopVar) = inputDataType match {
    +    val (getLength, prepareLoop, getLoopVar) = inputDataType match {
           case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
    --- End diff --
    
    Also not sure if you want to change this here, but think this is more direct as `cls.isInstanceOf[Seq[_]]`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19603#discussion_r147580056
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -591,18 +591,40 @@ case class MapObjects private(
           case _ => inputData.dataType
         }
     
    -    val (getLength, getLoopVar) = inputDataType match {
    +    val (getLength, prepareLoop, getLoopVar) = inputDataType match {
           case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
    -        s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
    +        val it = ctx.freshName("it")
    --- End diff --
    
    That's not a big deal, we are traversing the collection from start to end, so either Iterator or accessing by index won't have much difference, may not worth to create 2 branches here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19603
  
    **[Test build #83196 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83196/testReport)** for PR 19603 at commit [`83607a3`](https://github.com/apache/spark/commit/83607a3d727b9a600271ba61e0c2976fc3c125c1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19603#discussion_r147589361
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -591,18 +591,40 @@ case class MapObjects private(
           case _ => inputData.dataType
         }
     
    -    val (getLength, getLoopVar) = inputDataType match {
    +    val (getLength, prepareLoop, getLoopVar) = inputDataType match {
           case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
    -        s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
    +        val it = ctx.freshName("it")
    +        (
    +          s"${genInputData.value}.size()",
    --- End diff --
    
    I see. got it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19603#discussion_r147579666
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -591,18 +591,40 @@ case class MapObjects private(
           case _ => inputData.dataType
         }
     
    -    val (getLength, getLoopVar) = inputDataType match {
    +    val (getLength, prepareLoop, getLoopVar) = inputDataType match {
           case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
    -        s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
    +        val it = ctx.freshName("it")
    +        (
    +          s"${genInputData.value}.size()",
    +          s"scala.collection.Iterator $it = ${genInputData.value}.toIterator();",
    +          s"$it.next()"
    +        )
           case ObjectType(cls) if cls.isArray =>
    -        s"${genInputData.value}.length" -> s"${genInputData.value}[$loopIndex]"
    +        (
    +          s"${genInputData.value}.length",
    +          "",
    +          s"${genInputData.value}[$loopIndex]"
    +        )
           case ObjectType(cls) if classOf[java.util.List[_]].isAssignableFrom(cls) =>
    --- End diff --
    
    Not sure you want to change this here, or I could be missing something, but really any `java.util.Collection` has `iterator()` and `size()`.  It need not be restrict to `java.util.List`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19603
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19603
  
    Good catch! LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19603#discussion_r147579950
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -591,18 +591,40 @@ case class MapObjects private(
           case _ => inputData.dataType
         }
     
    -    val (getLength, getLoopVar) = inputDataType match {
    +    val (getLength, prepareLoop, getLoopVar) = inputDataType match {
           case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
    --- End diff --
    
    you can use `seqObj.isInstanceOf[Seq[_]]`, but not a `Class`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19603#discussion_r147589355
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -591,6 +591,9 @@ case class MapObjects private(
           case _ => inputData.dataType
         }
     
    +    // `MapObjects` generates a while loop to traverse the elements of the input collection. We
    +    // need to take care of Seq and List because they may have O(n) complexity for indexed accessing
    +    // like `list.get(1)`. Here we use Iterator to travers Seq and List.
    --- End diff --
    
    nit: travers -> traverse


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19603#discussion_r147580091
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -591,18 +591,40 @@ case class MapObjects private(
           case _ => inputData.dataType
         }
     
    -    val (getLength, getLoopVar) = inputDataType match {
    +    val (getLength, prepareLoop, getLoopVar) = inputDataType match {
           case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
    -        s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
    +        val it = ctx.freshName("it")
    +        (
    +          s"${genInputData.value}.size()",
    +          s"scala.collection.Iterator $it = ${genInputData.value}.toIterator();",
    +          s"$it.next()"
    +        )
           case ObjectType(cls) if cls.isArray =>
    -        s"${genInputData.value}.length" -> s"${genInputData.value}[$loopIndex]"
    +        (
    +          s"${genInputData.value}.length",
    +          "",
    +          s"${genInputData.value}[$loopIndex]"
    +        )
           case ObjectType(cls) if classOf[java.util.List[_]].isAssignableFrom(cls) =>
    --- End diff --
    
    The legal external typea for catalyst array are scala `Seq` and java `List`, so we don't need to match a more general type here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19603


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19603
  
    **[Test build #83188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83188/testReport)** for PR 19603 at commit [`6cb4fca`](https://github.com/apache/spark/commit/6cb4fca89e83172407114037d3a447ae6d941f0a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19603#discussion_r147581601
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -591,18 +591,40 @@ case class MapObjects private(
           case _ => inputData.dataType
         }
     
    -    val (getLength, getLoopVar) = inputDataType match {
    +    val (getLength, prepareLoop, getLoopVar) = inputDataType match {
           case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
    -        s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
    +        val it = ctx.freshName("it")
    +        (
    +          s"${genInputData.value}.size()",
    --- End diff --
    
    Is it OK with us for `size()` to take `O(n)` in some implementations of `Seq`?
    Since this is out of the loop, it is not the worst, but it is not good. It would be better to iterate a loop using `hasNext()`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19603
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83188/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19603#discussion_r147579873
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -591,18 +591,40 @@ case class MapObjects private(
           case _ => inputData.dataType
         }
     
    -    val (getLength, getLoopVar) = inputDataType match {
    +    val (getLength, prepareLoop, getLoopVar) = inputDataType match {
           case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
    -        s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
    +        val it = ctx.freshName("it")
    --- End diff --
    
    This would cause some types of `Seq` to be traversed by iterator when they support constant-time random access. That's not so bad. But I think you could make a rule for `IndexedSeq` that takes precedence that retains the old behavior. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19603
  
    **[Test build #83196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83196/testReport)** for PR 19603 at commit [`83607a3`](https://github.com/apache/spark/commit/83607a3d727b9a600271ba61e0c2976fc3c125c1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19603
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19603#discussion_r147581548
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -591,18 +591,40 @@ case class MapObjects private(
           case _ => inputData.dataType
         }
     
    -    val (getLength, getLoopVar) = inputDataType match {
    +    val (getLength, prepareLoop, getLoopVar) = inputDataType match {
           case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
    -        s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
    +        val it = ctx.freshName("it")
    --- End diff --
    
    I agreed. Currently the branches are a bit complicated.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19603
  
    **[Test build #83188 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83188/testReport)** for PR 19603 at commit [`6cb4fca`](https://github.com/apache/spark/commit/6cb4fca89e83172407114037d3a447ae6d941f0a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org