You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by maropu <gi...@git.apache.org> on 2017/08/16 07:44:58 UTC

[GitHub] spark pull request #18959: [SPARK-18394][SQL] Make an Attribute.toSeq output...

GitHub user maropu opened a pull request:

    https://github.com/apache/spark/pull/18959

    [SPARK-18394][SQL] Make an Attribute.toSeq output order consistent 

    ## What changes were proposed in this pull request?
    This pr sorted output attributes on their name and exprId in `Attribute.toSeq` to make the order consistent.  If the order is different, spark possibly generates different code and then misses cache in `CodeGenerator`, e.g., `GenerateColumnAccessor` generates code depending on an input attribute order.
    
    ## How was this patch tested?
    Added tests in `AttributeSetSuite` and manually checked if the cache worked well in the given query of the JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-18394

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18959.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18959
    
----
commit 3201f0a4a6ee9421dbe50e6e3195a73d35d3ef20
Author: Takeshi Yamamuro <ya...@apache.org>
Date:   2017-08-16T02:49:17Z

    Keep a deterministic output order in Attribute.toSeq

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    **[Test build #80763 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80763/testReport)** for PR 18959 at commit [`b33fde8`](https://github.com/apache/spark/commit/b33fde86ecd6a0be5f4a55c408ab10e0ac44101a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18959: [SPARK-18394][SQL] Make an Attribute.toSeq output...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18959#discussion_r133383172
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala ---
    @@ -121,7 +121,12 @@ class AttributeSet private (val baseSet: Set[AttributeEquals])
     
       // We must force toSeq to not be strict otherwise we end up with a [[Stream]] that captures all
       // sorts of things in its closure.
    -  override def toSeq: Seq[Attribute] = baseSet.map(_.a).toArray.toSeq
    +  override def toSeq: Seq[Attribute] = {
    +    // We need to keep a deterministic output order for `baseSet` because this affects a variable
    +    // order in generated code (e.g., `GenerateColumnAccessor`).
    +    // See SPARK-18394 for details.
    +    baseSet.map(_.a).toArray.sortBy { a => (a.name, a.exprId.id) }
    --- End diff --
    
    Yea, as you suggested, I initially did so. But, I just kept [the original code](https://github.com/apache/spark/commit/c4787a3690a9ed3b8b2c6c294fc4a6915436b6f7#diff-75576f0ec7f9d8b5032000245217d233R105) cuz I was afraid this change wrongly affected the others. cc: @marmbrus


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    Merging to master. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    **[Test build #80781 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80781/testReport)** for PR 18959 at commit [`973402b`](https://github.com/apache/spark/commit/973402bd822c05f8895405fbcaf918edbaad9d23).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an Attribute.toSeq output order ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an Attribute.toSeq output order ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    In title and description, `Attribute.toSeq` seems to be `AttributeSet.toSeq`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    **[Test build #80781 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80781/testReport)** for PR 18959 at commit [`973402b`](https://github.com/apache/spark/commit/973402bd822c05f8895405fbcaf918edbaad9d23).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80736/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80763/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    **[Test build #80770 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80770/testReport)** for PR 18959 at commit [`973402b`](https://github.com/apache/spark/commit/973402bd822c05f8895405fbcaf918edbaad9d23).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80761/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80781/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80770/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an Attribute.toSeq output order ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    **[Test build #80736 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80736/testReport)** for PR 18959 at commit [`eba844e`](https://github.com/apache/spark/commit/eba844e5fa5ee1f4faa59770112a61e8f5493830).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq out...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18959#discussion_r133628398
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruningSuite.scala ---
    @@ -162,7 +162,13 @@ class PruningSuite extends HiveComparisonTest with BeforeAndAfter {
           }.head
     
           assert(actualOutputColumns === expectedOutputColumns, "Output columns mismatch")
    -      assert(actualScannedColumns === expectedScannedColumns, "Scanned columns mismatch")
    +
    +      // Scanned columns in `HiveTableScanExec` are generated by the `pruneFilterProject` method
    +      // in `SparkPlanner` that internally uses `AttributeSet.toSeq`.
    +      // Since we change an output order of `AttributeSet.toSeq` in SPARK-18394,
    +      // we need to sort column names for a test below.
    --- End diff --
    
    How about?
    
    >  Scanned columns in `HiveTableScanExec` are generated by the `pruneFilterProject` method in `SparkPlanner`. This method internally uses `AttributeSet.toSeq`, in which the returned output columns are sorted by the names and expression ids.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq out...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18959


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    **[Test build #80763 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80763/testReport)** for PR 18959 at commit [`b33fde8`](https://github.com/apache/spark/commit/b33fde86ecd6a0be5f4a55c408ab10e0ac44101a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    LGTM except two minor comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    **[Test build #80736 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80736/testReport)** for PR 18959 at commit [`eba844e`](https://github.com/apache/spark/commit/eba844e5fa5ee1f4faa59770112a61e8f5493830).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq out...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18959#discussion_r133532316
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruningSuite.scala ---
    @@ -162,7 +162,8 @@ class PruningSuite extends HiveComparisonTest with BeforeAndAfter {
           }.head
     
           assert(actualOutputColumns === expectedOutputColumns, "Output columns mismatch")
    -      assert(actualScannedColumns === expectedScannedColumns, "Scanned columns mismatch")
    +      assert(actualScannedColumns.sorted === expectedScannedColumns.sorted,
    --- End diff --
    
    Could you add a comment to explain where we call `AttributeSet.toSeq`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18959: [SPARK-18394][SQL] Make an Attribute.toSeq output...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18959#discussion_r133446082
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala ---
    @@ -121,7 +121,12 @@ class AttributeSet private (val baseSet: Set[AttributeEquals])
     
       // We must force toSeq to not be strict otherwise we end up with a [[Stream]] that captures all
       // sorts of things in its closure.
    -  override def toSeq: Seq[Attribute] = baseSet.map(_.a).toArray.toSeq
    +  override def toSeq: Seq[Attribute] = {
    +    // We need to keep a deterministic output order for `baseSet` because this affects a variable
    +    // order in generated code (e.g., `GenerateColumnAccessor`).
    +    // See SPARK-18394 for details.
    +    baseSet.map(_.a).toArray.sortBy { a => (a.name, a.exprId.id) }
    --- End diff --
    
    ok, I fix in that way. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18959: [SPARK-18394][SQL] Make an Attribute.toSeq output...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18959#discussion_r133379380
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala ---
    @@ -121,7 +121,12 @@ class AttributeSet private (val baseSet: Set[AttributeEquals])
     
       // We must force toSeq to not be strict otherwise we end up with a [[Stream]] that captures all
       // sorts of things in its closure.
    -  override def toSeq: Seq[Attribute] = baseSet.map(_.a).toArray.toSeq
    +  override def toSeq: Seq[Attribute] = {
    +    // We need to keep a deterministic output order for `baseSet` because this affects a variable
    +    // order in generated code (e.g., `GenerateColumnAccessor`).
    +    // See SPARK-18394 for details.
    +    baseSet.map(_.a).toArray.sortBy { a => (a.name, a.exprId.id) }
    --- End diff --
    
    If it needs to be a Seq, then should the `toArray` be `toSeq`? maybe I missed way it has to be an array first


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18959: [SPARK-18394][SQL] Make an Attribute.toSeq output...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18959#discussion_r133445750
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala ---
    @@ -121,7 +121,12 @@ class AttributeSet private (val baseSet: Set[AttributeEquals])
     
       // We must force toSeq to not be strict otherwise we end up with a [[Stream]] that captures all
       // sorts of things in its closure.
    -  override def toSeq: Seq[Attribute] = baseSet.map(_.a).toArray.toSeq
    +  override def toSeq: Seq[Attribute] = {
    +    // We need to keep a deterministic output order for `baseSet` because this affects a variable
    +    // order in generated code (e.g., `GenerateColumnAccessor`).
    +    // See SPARK-18394 for details.
    +    baseSet.map(_.a).toArray.sortBy { a => (a.name, a.exprId.id) }
    --- End diff --
    
    I though `map` should always return a strict collection. I think it is safe to sort immediately after that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an Attribute.toSeq output order ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    **[Test build #80725 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80725/testReport)** for PR 18959 at commit [`3201f0a`](https://github.com/apache/spark/commit/3201f0a4a6ee9421dbe50e6e3195a73d35d3ef20).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an Attribute.toSeq output order ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    oh..ya, my bad.... thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an Attribute.toSeq output order ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    **[Test build #80725 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80725/testReport)** for PR 18959 at commit [`3201f0a`](https://github.com/apache/spark/commit/3201f0a4a6ee9421dbe50e6e3195a73d35d3ef20).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq out...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18959#discussion_r133629453
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruningSuite.scala ---
    @@ -162,7 +162,13 @@ class PruningSuite extends HiveComparisonTest with BeforeAndAfter {
           }.head
     
           assert(actualOutputColumns === expectedOutputColumns, "Output columns mismatch")
    -      assert(actualScannedColumns === expectedScannedColumns, "Scanned columns mismatch")
    +
    +      // Scanned columns in `HiveTableScanExec` are generated by the `pruneFilterProject` method
    +      // in `SparkPlanner` that internally uses `AttributeSet.toSeq`.
    +      // Since we change an output order of `AttributeSet.toSeq` in SPARK-18394,
    +      // we need to sort column names for a test below.
    --- End diff --
    
    look good, I'll update soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq out...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18959#discussion_r133531947
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/AttributeSetSuite.scala ---
    @@ -78,4 +78,34 @@ class AttributeSetSuite extends SparkFunSuite {
         assert(aSet == aSet)
         assert(aSet == AttributeSet(aUpper :: Nil))
       }
    +
    +  test("SPARK-18394 keep a deterministic output order along with attribute names") {
    --- End diff --
    
    Modify this test case. Add a scenario in which the attribute set has two columns have the same name but different ids? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an Attribute.toSeq output order ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80725/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq out...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18959#discussion_r133601515
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/AttributeSetSuite.scala ---
    @@ -78,4 +78,34 @@ class AttributeSetSuite extends SparkFunSuite {
         assert(aSet == aSet)
         assert(aSet == AttributeSet(aUpper :: Nil))
       }
    +
    +  test("SPARK-18394 keep a deterministic output order along with attribute names") {
    --- End diff --
    
    ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    **[Test build #80770 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80770/testReport)** for PR 18959 at commit [`973402b`](https://github.com/apache/spark/commit/973402bd822c05f8895405fbcaf918edbaad9d23).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/18959
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org