Posted to reviews@spark.apache.org by stanzhai <gi...@git.apache.org> on 2017/02/28 13:15:14 UTC

[GitHub] spark pull request #17099: Constant alias columns in INNER JOIN should not b...

GitHub user stanzhai opened a pull request:

    https://github.com/apache/spark/pull/17099

    Constant alias columns in INNER JOIN should not be folded by FoldablePropagation rule

    ## What changes were proposed in this pull request?
    This PR fixes a bug in the optimizer phase where constant alias columns referenced in the condition of an `INNER JOIN` are incorrectly folded by the `FoldablePropagation` rule.
    
    For the following query:
    
    ```
    val sqlA =
      """
        |create temporary view ta as
        |select a, 'a' as tag from t1 union all
        |select a, 'b' as tag from t2
      """.stripMargin
    
    val sqlB =
      """
        |create temporary view tb as
        |select a, 'a' as tag from t3 union all
        |select a, 'b' as tag from t4
      """.stripMargin
    
    val sql =
      """
        |select tb.* from ta inner join tb on
        |ta.a = tb.a and
        |ta.tag = tb.tag
      """.stripMargin
    ```
    
    The `tag` column is a constant alias column, and it is folded by `FoldablePropagation` like this:
    
    ```
    TRACE SparkOptimizer: 
    === Applying Rule org.apache.spark.sql.catalyst.optimizer.FoldablePropagation ===
     Project [a#4, tag#14]                              Project [a#4, tag#14]
    !+- Join Inner, ((a#0 = a#4) && (tag#8 = tag#14))   +- Join Inner, ((a#0 = a#4) && (a = a))
        :- Union                                           :- Union
        :  :- Project [a#0, a AS tag#8]                    :  :- Project [a#0, a AS tag#8]
        :  :  +- LocalRelation [a#0]                       :  :  +- LocalRelation [a#0]
        :  +- Project [a#2, b AS tag#9]                    :  +- Project [a#2, b AS tag#9]
        :     +- LocalRelation [a#2]                       :     +- LocalRelation [a#2]
        +- Union                                           +- Union
           :- Project [a#4, a AS tag#14]                      :- Project [a#4, a AS tag#14]
           :  +- LocalRelation [a#4]                          :  +- LocalRelation [a#4]
           +- Project [a#6, b AS tag#15]                      +- Project [a#6, b AS tag#15]
              +- LocalRelation [a#6]                             +- LocalRelation [a#6]
    ```
    
    Finally, the result of batch `Operator Optimizations` is:
    
    ```
    Project [a#4, tag#14]                              Project [a#4, tag#14]
    !+- Join Inner, ((a#0 = a#4) && (tag#8 = tag#14))   +- Join Inner, (a#0 = a#4)
    !   :- SubqueryAlias ta, `ta`                          :- Union
    !   :  +- Union                                        :  :- LocalRelation [a#0]
    !   :     :- Project [a#0, a AS tag#8]                 :  +- LocalRelation [a#2]
    !   :     :  +- SubqueryAlias t1, `t1`                 +- Union
    !   :     :     +- Project [a#0]                          :- LocalRelation [a#4, tag#14]
    !   :     :        +- SubqueryAlias grouping              +- LocalRelation [a#6, tag#15]
    !   :     :           +- LocalRelation [a#0]        
    !   :     +- Project [a#2, b AS tag#9]              
    !   :        +- SubqueryAlias t2, `t2`              
    !   :           +- Project [a#2]                    
    !   :              +- SubqueryAlias grouping        
    !   :                 +- LocalRelation [a#2]        
    !   +- SubqueryAlias tb, `tb`                       
    !      +- Union                                     
    !         :- Project [a#4, a AS tag#14]             
    !         :  +- SubqueryAlias t3, `t3`              
    !         :     +- Project [a#4]                    
    !         :        +- SubqueryAlias grouping        
    !         :           +- LocalRelation [a#4]        
    !         +- Project [a#6, b AS tag#15]             
    !            +- SubqueryAlias t4, `t4`              
    !               +- Project [a#6]                    
    !                  +- SubqueryAlias grouping        
    !                     +- LocalRelation [a#6]    
    ```
    
    The condition `tag#8 = tag#14` of the INNER JOIN has been removed, so the join returns wrong results.
    
    After the fix:
    
    ```
    === Result of Batch LocalRelation ===
     GlobalLimit 21                                           GlobalLimit 21
     +- LocalLimit 21                                         +- LocalLimit 21
        +- Project [a#4, tag#11]                                 +- Project [a#4, tag#11]
           +- Join Inner, ((a#0 = a#4) && (tag#8 = tag#11))         +- Join Inner, ((a#0 = a#4) && (tag#8 = tag#11))
    !         :- SubqueryAlias ta                                      :- Union
    !         :  +- Union                                              :  :- LocalRelation [a#0, tag#8]
    !         :     :- Project [a#0, a AS tag#8]                       :  +- LocalRelation [a#2, tag#9]
    !         :     :  +- SubqueryAlias t1                             +- Union
    !         :     :     +- Project [a#0]                                :- LocalRelation [a#4, tag#11]
    !         :     :        +- SubqueryAlias grouping                    +- LocalRelation [a#6, tag#12]
    !         :     :           +- LocalRelation [a#0]        
    !         :     +- Project [a#2, b AS tag#9]              
    !         :        +- SubqueryAlias t2                    
    !         :           +- Project [a#2]                    
    !         :              +- SubqueryAlias grouping        
    !         :                 +- LocalRelation [a#2]        
    !         +- SubqueryAlias tb                             
    !            +- Union                                     
    !               :- Project [a#4, a AS tag#11]             
    !               :  +- SubqueryAlias t3                    
    !               :     +- Project [a#4]                    
    !               :        +- SubqueryAlias grouping        
    !               :           +- LocalRelation [a#4]        
    !               +- Project [a#6, b AS tag#12]             
    !                  +- SubqueryAlias t4                    
    !                     +- Project [a#6]                    
    !                        +- SubqueryAlias grouping        
    !                           +- LocalRelation [a#6]  
    ```
    
    ## How was this patch tested?
    
    Added `sql-tests/inputs/inner-join.sql`.
    All tests passed.
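
As a rough illustration of why the dropped `tag` condition matters, the sketch below replays the join on plain Scala collections instead of Spark. The row values are taken from the `inner-join.sql` test in this PR; the names `JoinSketch` and `innerJoin` are made up for this example and are not part of Spark.

```scala
object JoinSketch {
  // Single-column inputs, mirroring the VALUES clauses in inner-join.sql.
  val t1 = List(1)
  val t2 = List(1)
  val t3 = List(1, 1)
  val t4 = List(1, 1)

  // ta and tb are UNION ALLs that attach a constant tag to every row.
  val ta: List[(Int, String)] = t1.map(a => (a, "a")) ++ t2.map(a => (a, "b"))
  val tb: List[(Int, String)] = t3.map(a => (a, "a")) ++ t4.map(a => (a, "b"))

  // Nested-loop inner join that returns tb.* for every matching pair.
  def innerJoin(cond: ((Int, String), (Int, String)) => Boolean): List[(Int, String)] =
    for { l <- ta; r <- tb; if cond(l, r) } yield r

  // The query as written: ta.a = tb.a AND ta.tag = tb.tag.
  val expected = innerJoin((l, r) => l._1 == r._1 && l._2 == r._2)

  // The query after the bad rewrite: the tag predicate is gone.
  val afterBadRewrite = innerJoin((l, r) => l._1 == r._1)
}
```

With both predicates the join yields 4 rows; with only `ta.a = tb.a` it yields 8, because every row of `ta` now matches every row of `tb`.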

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/stanzhai/spark fix-inner-join

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17099.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17099
    
----
commit 02b9dd6b8c3eb49bb8e6e537d0432c1eff35baad
Author: Stan Zhai <zh...@haizhi.com>
Date:   2017-02-28T12:44:14Z

    fix inner join

commit 112dd2379bf9febdc9ed81925326b61d2a34efdd
Author: Stan Zhai <zh...@haizhi.com>
Date:   2017-02-28T12:55:33Z

    fix inner-join.sql.out

commit 44636483bb1b87d7e4746ae98df47b3f9dc7e8ce
Author: Stan Zhai <zh...@haizhi.com>
Date:   2017-02-28T13:13:23Z

    update test

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    **[Test build #73659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73659/testReport)** for PR 17099 at commit [`15fae50`](https://github.com/apache/spark/commit/15fae5029e7c45d3b2a3108e24ca235a6a6fccfc).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    **[Test build #73594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73594/testReport)** for PR 17099 at commit [`c197b20`](https://github.com/apache/spark/commit/c197b20cc503c8ed6d4a1ac13f015f7c598e3cf5).



[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73588/
    Test FAILed.



[GitHub] spark pull request #17099: [SPARK-19766][SQL] Constant alias columns in INNE...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17099#discussion_r103611097
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FoldablePropagationSuite.scala ---
    @@ -130,6 +130,20 @@ class FoldablePropagationSuite extends PlanTest {
         comparePlans(optimized, correctAnswer)
       }
     
    +  test("Propagate in inner join") {
    +    val ta = testRelation.select('a, Literal("a").as('tag))
    +      .union(testRelation.select('a, Literal("b").as('tag)))
    +      .subquery('ta)
    +    val tb = testRelation.select('a, Literal("a").as('tag))
    +      .union(testRelation.select('a, Literal("b").as('tag)))
    +      .subquery('tb)
    +    val query = ta.join(tb, Inner,
    +      Some("ta.a".attr === "tb.a".attr && "ta.tag".attr === "tb.tag"))
    --- End diff --
    
    -> `Some("ta.a".attr === "tb.a".attr && "ta.tag".attr === "tb.tag".attr))`
    
    Then add the rule [`ConstantFolding` into the test suite.](https://github.com/stanzhai/spark/blob/15fae5029e7c45d3b2a3108e24ca235a6a6fccfc/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FoldablePropagationSuite.scala#L31)



[GitHub] spark pull request #17099: [SPARK-19766][SQL] Constant alias columns in INNE...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17099#discussion_r103501851
  
    --- Diff: sql/core/src/test/resources/sql-tests/inputs/inner-join.sql ---
    @@ -0,0 +1,25 @@
    +CREATE TEMPORARY VIEW t1 AS SELECT * FROM VALUES (1) AS GROUPING(a);
    +CREATE TEMPORARY VIEW t2 AS SELECT * FROM VALUES (1) AS GROUPING(a);
    +CREATE TEMPORARY VIEW t3 AS SELECT * FROM VALUES (1), (1) AS GROUPING(a);
    +CREATE TEMPORARY VIEW t4 AS SELECT * FROM VALUES (1), (1) AS GROUPING(a);
    +
    +CREATE TEMPORARY VIEW ta AS
    +SELECT a, 'a' AS tag FROM t1
    +UNION ALL
    +SELECT a, 'b' AS tag FROM t2;
    +
    +CREATE TEMPORARY VIEW tb AS
    +SELECT a, 'a' AS tag FROM t3
    +UNION ALL
    +SELECT a, 'b' AS tag FROM t4;
    +
    +-- SPARK-19766 Constant alias columns in INNER JOIN should not be folded by FoldablePropagation rule
    +SELECT tb.* FROM ta INNER JOIN tb ON ta.a = tb.a AND ta.tag = tb.tag;
    +
    +-- Clean up
    +DROP VIEW IF EXISTS t1;
    --- End diff --
    
    I don't think you need these drop views, since temporary views are destroyed immediately after this file.




[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73594/
    Test PASSed.



[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    **[Test build #73592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73592/testReport)** for PR 17099 at commit [`fc819f8`](https://github.com/apache/spark/commit/fc819f8e4c670d7c31d96be6e628b07b2ebc3509).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    @stanzhai Could you submit another PR to backport it to Spark 2.0?



[GitHub] spark pull request #17099: [SPARK-19766][SQL] Constant alias columns in INNE...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17099



[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    Could you add a test case to `FoldablePropagationSuite`? Thanks!



[GitHub] spark pull request #17099: [SPARK-19766][SQL] Constant alias columns in INNE...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17099#discussion_r103457544
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -452,14 +452,6 @@ object FoldablePropagation extends Rule[LogicalPlan] {
             case u: UnaryNode if !stop && canPropagateFoldables(u) =>
               u.transformExpressions(replaceFoldable)
     
    -        // Allow inner joins. We do not allow outer join, although its output attributes are
    -        // derived from its children, they are actually different attributes: the output of outer
    -        // join is not always picked from its children, but can also be null.
    -        // TODO(cloud-fan): It seems more reasonable to use new attributes as the output attributes
    -        // of outer join.
    -        case j @ Join(_, _, Inner, _) =>
    --- End diff --
    
    We forgot to check `stop` here. Can you just change this line to: `case j @ Join(_, _, Inner, _) if !stop =>`
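
The effect of the missing guard can be sketched with a toy plan tree in plain Scala. This is not Catalyst's code: `Plan`, `Expr`, `propagate`, and the `joinChecksStop` switch are hypothetical names. The model also assumes, as the optimizer trace in the PR description suggests, that a `Union` is a node past which constants must no longer be propagated.

```scala
object StopFlagSketch {
  sealed trait Expr
  final case class Col(name: String) extends Expr
  final case class Lit(value: String) extends Expr

  sealed trait Plan
  final case class Relation(name: String) extends Plan
  // `constants` maps an output column to the literal it aliases, e.g. tag#8 -> "a".
  final case class Project(constants: Map[String, String], child: Plan) extends Plan
  final case class Union(children: List[Plan]) extends Plan
  final case class Join(left: Plan, right: Plan, condition: List[(Expr, Expr)]) extends Plan

  // Collect every constant alias found anywhere in the plan.
  def constantAliases(plan: Plan): Map[String, String] = plan match {
    case Relation(_)          => Map.empty
    case Project(cs, child)   => constantAliases(child) ++ cs
    case Union(children)      => children.flatMap(c => constantAliases(c)).toMap
    case Join(left, right, _) => constantAliases(left) ++ constantAliases(right)
  }

  // Bottom-up rewrite. Once a Union has been visited, `stop` is set and no
  // node above it should substitute constants any more.
  def propagate(plan: Plan, joinChecksStop: Boolean): Plan = {
    val foldable = constantAliases(plan)
    var stop = false

    def replace(e: Expr): Expr = e match {
      case Col(name) if foldable.contains(name) => Lit(foldable(name))
      case other                                => other
    }

    def go(p: Plan): Plan = p match {
      case r: Relation        => r
      case Project(cs, child) => Project(cs, go(child))
      case Union(children) =>
        val rewritten = Union(children.map(c => go(c)))
        stop = true
        rewritten
      case Join(left, right, cond) =>
        val newLeft = go(left)
        val newRight = go(right)
        // The guarded variant leaves the condition alone once `stop` is set.
        if (joinChecksStop && stop) Join(newLeft, newRight, cond)
        else Join(newLeft, newRight, cond.map { case (l, r) => (replace(l), replace(r)) })
    }

    go(plan)
  }

  val ta = Union(List(Project(Map("tag#8" -> "a"), Relation("t1")),
                      Project(Map("tag#9" -> "b"), Relation("t2"))))
  val tb = Union(List(Project(Map("tag#14" -> "a"), Relation("t3")),
                      Project(Map("tag#15" -> "b"), Relation("t4"))))
  val query = Join(ta, tb, List((Col("a#0"), Col("a#4")), (Col("tag#8"), Col("tag#14"))))

  val withoutGuard = propagate(query, joinChecksStop = false)
  val withGuard = propagate(query, joinChecksStop = true)
}
```

In this model `withoutGuard` rewrites the second predicate to a comparison of two `"a"` literals, which matches the `(a = a)` shown in the trace, while `withGuard` leaves the join condition untouched.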



[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    **[Test build #73659 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73659/testReport)** for PR 17099 at commit [`15fae50`](https://github.com/apache/spark/commit/15fae5029e7c45d3b2a3108e24ca235a6a6fccfc).



[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    **[Test build #73669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73669/testReport)** for PR 17099 at commit [`df11cc4`](https://github.com/apache/spark/commit/df11cc4d43a587f83fd5650c9129dbd67fb5d3d0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #17099: [SPARK-19766][SQL] Constant alias columns in INNE...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17099#discussion_r103844650
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/inner-join.sql.out ---
    @@ -0,0 +1,68 @@
    +-- Automatically generated by SQLQueryTestSuite
    +-- Number of queries: 13
    --- End diff --
    
    Actually, this number is wrong. Next time, please do not manually change this file. You should run the command to generate the file. @stanzhai 



[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    Thanks! Merging to master/2.1



[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    **[Test build #73669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73669/testReport)** for PR 17099 at commit [`df11cc4`](https://github.com/apache/spark/commit/df11cc4d43a587f83fd5650c9129dbd67fb5d3d0).



[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73659/
    Test FAILed.



[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request #17099: [SPARK-19766][SQL] Constant alias columns in INNE...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17099#discussion_r103610959
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FoldablePropagationSuite.scala ---
    @@ -130,6 +130,20 @@ class FoldablePropagationSuite extends PlanTest {
         comparePlans(optimized, correctAnswer)
       }
     
    +  test("Propagate in inner join") {
    +    val ta = testRelation.select('a, Literal("a").as('tag))
    +      .union(testRelation.select('a, Literal("b").as('tag)))
    +      .subquery('ta)
    +    val tb = testRelation.select('a, Literal("a").as('tag))
    +      .union(testRelation.select('a, Literal("b").as('tag)))
    +      .subquery('tb)
    +    val query = ta.join(tb, Inner,
    +      Some("ta.a".attr === "tb.a".attr && "ta.tag".attr === "tb.tag"))
    --- End diff --
    
    This is wrong. It compares the column `ta.tag` with the string constant `"tb.tag"`.
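
A small stand-alone DSL can show how such a slip compiles without complaint. The sketch assumes the test DSL converts a bare `String` into a literal implicitly, which is what the comment above implies. `DslSketch`, `Attr`, `Lit`, and `stringToLit` are hypothetical names for this illustration, not Catalyst's classes.

```scala
import scala.language.implicitConversions

object DslSketch {
  sealed trait Expr {
    // Builds an equality node instead of comparing the two objects.
    def ===(other: Expr): Expr = EqualTo(this, other)
  }
  final case class Attr(name: String) extends Expr
  final case class Lit(value: String) extends Expr
  final case class EqualTo(left: Expr, right: Expr) extends Expr

  // A bare String used where an Expr is expected silently becomes a literal.
  implicit def stringToLit(s: String): Expr = Lit(s)

  // `.attr` is the explicit way to refer to a column by name.
  implicit class DslString(val s: String) {
    def attr: Expr = Attr(s)
  }

  // Forgetting `.attr` on the right-hand side compares a column with a string.
  val wrong: Expr = "ta.tag".attr === "tb.tag"
  // With `.attr` on both sides the predicate compares two columns.
  val right: Expr = "ta.tag".attr === "tb.tag".attr
}
```

Here `wrong` is `EqualTo(Attr("ta.tag"), Lit("tb.tag"))`, while `right` is `EqualTo(Attr("ta.tag"), Attr("tb.tag"))`. Both type-check, so only a review or a failing plan comparison catches the difference.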



[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by stanzhai <gi...@git.apache.org>.
Github user stanzhai commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    Thanks for @gatorsmile 's help.
    
    `ConstantFolding` will affect other test cases in `FoldablePropagationSuite`.
    
    It's fine without adding `ConstantFolding`.
    
    Before fix:
    ```
    [info]   !'Join Inner, ((a#0 = a#0) && (1 = 1))        'Join Inner, (('tb.a = 'ta.a) && ('tb.tag = 'ta.tag))
    [info]   !:- Union                                     :- 'SubqueryAlias ta
    [info]   !:  :- Project [a#0, 1 AS tag#0]              :  +- 'Union
    [info]   !:  :  +- LocalRelation <empty>, [a#0, b#0]   :     :- 'Project ['a, 1 AS tag#0]
    [info]   !:  +- Project [a#0, 2 AS tag#0]              :     :  +- LocalRelation <empty>, [a#0, b#0]
    [info]   !:     +- LocalRelation <empty>, [a#0, b#0]   :     +- 'Project ['a, 2 AS tag#0]
    [info]   !+- Union                                     :        +- LocalRelation <empty>, [a#0, b#0]
    [info]   !   :- Project [a#0, 1 AS tag#0]              +- 'SubqueryAlias tb
    [info]   !   :  +- LocalRelation <empty>, [a#0, b#0]      +- 'Union
    [info]   !   +- Project [a#0, 2 AS tag#0]                    :- 'Project ['a, 1 AS tag#0]
    [info]   !      +- LocalRelation <empty>, [a#0, b#0]         :  +- LocalRelation <empty>, [a#0, b#0]
    [info]   !                                                   +- 'Project ['a, 2 AS tag#0]
    [info]   !                                                      +- LocalRelation <empty>, [a#0, b#0] (PlanTest.scala:99)
    ```
    
    After fix:
    ```
    [info]   !'Join Inner, ((a#0 = a#0) && (tag#0 = tag#0))   'Join Inner, (('tb.a = 'ta.a) && ('tb.tag = 'ta.tag))
    [info]   !:- Union                                        :- 'SubqueryAlias ta
    [info]   !:  :- Project [a#0, 1 AS tag#0]                 :  +- 'Union
    [info]   !:  :  +- LocalRelation <empty>, [a#0, b#0]      :     :- 'Project ['a, 1 AS tag#0]
    [info]   !:  +- Project [a#0, 2 AS tag#0]                 :     :  +- LocalRelation <empty>, [a#0, b#0]
    [info]   !:     +- LocalRelation <empty>, [a#0, b#0]      :     +- 'Project ['a, 2 AS tag#0]
    [info]   !+- Union                                        :        +- LocalRelation <empty>, [a#0, b#0]
    [info]   !   :- Project [a#0, 1 AS tag#0]                 +- 'SubqueryAlias tb
    [info]   !   :  +- LocalRelation <empty>, [a#0, b#0]         +- 'Union
    [info]   !   +- Project [a#0, 2 AS tag#0]                       :- 'Project ['a, 1 AS tag#0]
    [info]   !      +- LocalRelation <empty>, [a#0, b#0]            :  +- LocalRelation <empty>, [a#0, b#0]
    [info]   !                                                      +- 'Project ['a, 2 AS tag#0]
    [info]   !                                                         +- LocalRelation <empty>, [a#0, b#0] (PlanTest.scala:99)
    ```
    
    I have just fixed the test case (`"tb.tag" -> "tb.tag".attr`).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    Can one of the admins verify this patch?




[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    **[Test build #73588 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73588/testReport)** for PR 17099 at commit [`4463648`](https://github.com/apache/spark/commit/44636483bb1b87d7e4746ae98df47b3f9dc7e8ce).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    **[Test build #73592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73592/testReport)** for PR 17099 at commit [`fc819f8`](https://github.com/apache/spark/commit/fc819f8e4c670d7c31d96be6e628b07b2ebc3509).




[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by stanzhai <gi...@git.apache.org>.
Github user stanzhai commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    ok




[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73669/




[GitHub] spark pull request #17099: [SPARK-19766][SQL] Constant alias columns in INNE...

Posted by stanzhai <gi...@git.apache.org>.
Github user stanzhai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17099#discussion_r103848391
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/inner-join.sql.out ---
    @@ -0,0 +1,68 @@
    +-- Automatically generated by SQLQueryTestSuite
    +-- Number of queries: 13
    --- End diff --
    
    Thanks!
    I will pay attention to this next time.




[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    **[Test build #73588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73588/testReport)** for PR 17099 at commit [`4463648`](https://github.com/apache/spark/commit/44636483bb1b87d7e4746ae98df47b3f9dc7e8ce).




[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    **[Test build #73594 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73594/testReport)** for PR 17099 at commit [`c197b20`](https://github.com/apache/spark/commit/c197b20cc503c8ed6d4a1ac13f015f7c598e3cf5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73592/




[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    LGTM




[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    ok to test




[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...

Posted by stanzhai <gi...@git.apache.org>.
Github user stanzhai commented on the issue:

    https://github.com/apache/spark/pull/17099
  
    @hvanhovell 

