You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/11/14 15:20:24 UTC

[GitHub] spark pull request #23035: [SPARK-26054][SQL] Transform also analyzed plans ...

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/23035

    [SPARK-26054][SQL] Transform also analyzed plans when dedup references

    ## What changes were proposed in this pull request?
    
    In SPARK-24865 `AnalysisBarrier` was removed and in order to improve resolution speed, the `analyzed` flag was (re-)introduced in order to process only plans which are not yet analyzed. This should not be the case when performing attribute deduplication as in that case we need to transform also the plans which were already analyzed, otherwise we can miss to rewrite some attributes leading to invalid plans.
    
    ## How was this patch tested?
    
    added UT
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-26057

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23035.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23035
    
----
commit 62a895febb8a91ce578133528a31060fe44981f9
Author: Marco Gaido <ma...@...>
Date:   2018-11-14T15:16:37Z

    [SPARK-26054][SQL] Trasform also analyzed plans when dedup references

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23035: [SPARK-26057][SQL] Transform also analyzed plans when de...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23035
  
    thanks, merging to master/2.4!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23035: [SPARK-26057][SQL] Transform also analyzed plans ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23035#discussion_r233696401
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2554,4 +2554,34 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     
         checkAnswer(swappedDf.filter($"key"($"map") > "a"), Row(2, Map(2 -> "b")))
       }
    +
    +  test("SPARK-26057: attribute deduplication on already analyzed plans") {
    +    withTempView("cc", "p", "c") {
    +      val df1 = Seq(("1-1", "sp", 6)).toDF("id", "layout", "n")
    +      df1.createOrReplaceTempView("cc")
    +      val df2 = Seq(("sp", 1)).toDF("layout", "ts")
    +      df2.createOrReplaceTempView("p")
    +      val df3 = Seq(("1-1", "sp", 3)).toDF("id", "layout", "ts")
    +      df3.createOrReplaceTempView("c")
    +      spark.sql(
    +        """
    +          |SELECT cc.id, cc.layout, count(*) as m
    +          |FROM cc
    +          |JOIN p USING(layout)
    +          |WHERE EXISTS(
    +          |  SELECT 1
    +          |  FROM c
    +          |  WHERE c.id = cc.id AND c.layout = cc.layout AND c.ts > p.ts)
    +          |GROUP BY cc.id, cc.layout
    +        """.stripMargin).createOrReplaceTempView("pcc")
    +      val res = spark.sql(
    --- End diff --
    
    good catch on the problem! Do you think it's possible to simplify the test? I think we just need a temp view with subquery, and use it in a join.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23035: [SPARK-26057][SQL] Transform also analyzed plans when de...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23035
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23035: [SPARK-26057][SQL] Transform also analyzed plans when de...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23035
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23035: [SPARK-26054][SQL] Transform also analyzed plans when de...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/23035
  
    cc @cloud-fan @gatorsmile @rxin


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23035: [SPARK-26054][SQL] Transform also analyzed plans when de...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23035
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5027/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23035: [SPARK-26057][SQL] Transform also analyzed plans when de...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23035
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98861/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23035: [SPARK-26057][SQL] Transform also analyzed plans when de...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23035
  
    **[Test build #98829 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98829/testReport)** for PR 23035 at commit [`62a895f`](https://github.com/apache/spark/commit/62a895febb8a91ce578133528a31060fe44981f9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23035: [SPARK-26054][SQL] Transform also analyzed plans when de...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23035
  
    **[Test build #98829 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98829/testReport)** for PR 23035 at commit [`62a895f`](https://github.com/apache/spark/commit/62a895febb8a91ce578133528a31060fe44981f9).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23035: [SPARK-26057][SQL] Transform also analyzed plans ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/23035


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23035: [SPARK-26057][SQL] Transform also analyzed plans when de...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23035
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5046/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23035: [SPARK-26057][SQL] Transform also analyzed plans ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23035#discussion_r233695765
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2554,4 +2554,34 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     
         checkAnswer(swappedDf.filter($"key"($"map") > "a"), Row(2, Map(2 -> "b")))
       }
    +
    +  test("SPARK-26057: attribute deduplication on already analyzed plans") {
    +    withTempView("cc", "p", "c") {
    --- End diff --
    
    if we don't care about naming, how about `a, b, c` instead of `cc, p, c`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23035: [SPARK-26054][SQL] Transform also analyzed plans when de...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23035
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23035: [SPARK-26057][SQL] Transform also analyzed plans when de...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23035
  
    **[Test build #98861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98861/testReport)** for PR 23035 at commit [`98d91a3`](https://github.com/apache/spark/commit/98d91a323d891b8995791fc991d80cee9c4f7f97).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23035: [SPARK-26057][SQL] Transform also analyzed plans ...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23035#discussion_r233744065
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2554,4 +2554,34 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     
         checkAnswer(swappedDf.filter($"key"($"map") > "a"), Row(2, Map(2 -> "b")))
       }
    +
    +  test("SPARK-26057: attribute deduplication on already analyzed plans") {
    +    withTempView("cc", "p", "c") {
    +      val df1 = Seq(("1-1", "sp", 6)).toDF("id", "layout", "n")
    +      df1.createOrReplaceTempView("cc")
    +      val df2 = Seq(("sp", 1)).toDF("layout", "ts")
    +      df2.createOrReplaceTempView("p")
    +      val df3 = Seq(("1-1", "sp", 3)).toDF("id", "layout", "ts")
    +      df3.createOrReplaceTempView("c")
    +      spark.sql(
    +        """
    +          |SELECT cc.id, cc.layout, count(*) as m
    +          |FROM cc
    +          |JOIN p USING(layout)
    +          |WHERE EXISTS(
    +          |  SELECT 1
    +          |  FROM c
    +          |  WHERE c.id = cc.id AND c.layout = cc.layout AND c.ts > p.ts)
    +          |GROUP BY cc.id, cc.layout
    +        """.stripMargin).createOrReplaceTempView("pcc")
    +      val res = spark.sql(
    --- End diff --
    
    yes, I simplified as much as I was able to. I hope now it is fine. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23035: [SPARK-26057][SQL] Transform also analyzed plans when de...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23035
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23035: [SPARK-26057][SQL] Transform also analyzed plans when de...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23035
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98829/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org