Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/01/16 13:52:25 UTC

[GitHub] spark pull request #20276: [SPARK-14948][SQL] disambiguate attributes in joi...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/20276

    [SPARK-14948][SQL] disambiguate attributes in join condition

    ## What changes were proposed in this pull request?
    
    `Dataset.col` and `Dataset.apply` return a column reference, which is useful for handling duplicated column names in a join, e.g.
    ```
    val df1 = ... // [a: int, b: int]
    val df2 = ... // [b: int, c: int]
    
    df1.join(df2, df1("b") === df2("b"))
    df1.join(df2).drop(df2("b"))
    ...
    ```
    
    However, this is problematic for self-joins, or for joining DataFrames derived from the same DataFrame. The reason is that the column reference returned by `Dataset.col` is actually an `AttributeReference`, which means different DataFrames may return the same column reference. After the join, the right side is de-duplicated if it has attributes that conflict with the left side, so a column reference obtained from the right side either goes missing after the join or wrongly refers to a column from the left side.
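
    The ambiguity can be illustrated with a toy model in plain Scala. `Attr`, `Frame`, and `filtered` below are hypothetical stand-ins, not Spark classes; the sketch only assumes that a derived DataFrame keeps the attributes of its parent.

    ```scala
    // Toy model, not Spark's real classes: an attribute is identified by an expression id.
    final case class Attr(name: String, exprId: Long)

    // A "DataFrame" here is nothing more than a list of output attributes.
    final case class Frame(output: Seq[Attr]) {
      // Mimics `Dataset.col`: look the column up by name in the output.
      def col(name: String): Attr = output.find(_.name == name).get
      // A derived frame (for example after a filter) keeps the same attributes.
      def filtered: Frame = copy()
    }

    object AmbiguityDemo {
      def main(args: Array[String]): Unit = {
        val df = Frame(Seq(Attr("a", 1L), Attr("b", 2L)))
        val left = df.filtered
        val right = df.filtered
        // Both sides hand back the very same reference, so a join condition built
        // from left.col("b") and right.col("b") cannot tell the two sides apart.
        println(left.col("b") == right.col("b")) // prints: true
      }
    }
    ```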
    
    To fix this issue entirely, we may need to define a real column reference that is globally unique, and design a DataFrame lineage mechanism so that a column reference from one DataFrame can be used in an operation on another DataFrame, e.g.
    ```
    val df3 = df1.join(df2)
    df3.drop(df2("b"))
    ```
    
    This is a lot of work and is too late for 2.3. Here I propose a simple and safe solution that disambiguates attributes in the join condition only, which is the most common problematic case.
    
    The idea is simple: we assign a globally unique id to each DataFrame via `AnalysisBarrier`. `Dataset.col` returns a special attribute that carries the id of the DataFrame it comes from. This special attribute is mostly a no-op and is removed during resolution. It is only used when we de-duplicate the right side plan of a join: the special attributes inside the join condition are replaced by the new attributes generated by the right side plan.
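
    A minimal sketch of this idea in plain Scala, under simplifying assumptions: `Attr`, `frameId`, and `rewrite` are hypothetical names for illustration, the join condition is reduced to a flat list of attributes, and the id is stored in a field rather than in attribute metadata.

    ```scala
    // Toy model of the proposal; the names are illustrative, not Spark's classes.
    final case class Attr(name: String, exprId: Long, frameId: Option[Long] = None)

    object JoinConditionDedup {
      // Rewrite the attributes of a join condition after the right side has been
      // de-duplicated. `rewrites` maps old right-side exprIds to the fresh ones;
      // `rightFrameId` is the id of the DataFrame on the right side.
      def rewrite(cond: Seq[Attr], rewrites: Map[Long, Long], rightFrameId: Long): Seq[Attr] =
        cond.map { a =>
          if (a.frameId.contains(rightFrameId) && rewrites.contains(a.exprId)) {
            // Tagged as coming from the right side: point it at the new attribute
            // and drop the tag.
            Attr(a.name, rewrites(a.exprId))
          } else {
            // Anything else keeps its exprId; the tag is simply removed.
            a.copy(frameId = None)
          }
        }

      def main(args: Array[String]): Unit = {
        val leftB = Attr("b", 2L, Some(10L))  // obtained from the left DataFrame (id 10)
        val rightB = Attr("b", 2L, Some(11L)) // obtained from the right DataFrame (id 11)
        println(rewrite(Seq(leftB, rightB), Map(2L -> 7L), rightFrameId = 11L))
      }
    }
    ```

    In this sketch the left operand keeps exprId 2 while the right operand is redirected to exprId 7, so the two sides of the condition no longer collapse into the same reference.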
    
    ## How was this patch tested?
    
    new regression test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark join-bug

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20276.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20276
    
----
commit dd36ffb520b79c54e3efad9e79a88b3baf4fc985
Author: Wenchen Fan <we...@...>
Date:   2018-01-16T12:26:41Z

    disambiguate attributes in join condition

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86327 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86327/testReport)** for PR 20276 at commit [`d0bdddf`](https://github.com/apache/spark/commit/d0bdddfffc18258ba1536c9cff4ea0856026094c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86180/
    Test FAILed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86324 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86324/testReport)** for PR 20276 at commit [`d0bdddf`](https://github.com/apache/spark/commit/d0bdddfffc18258ba1536c9cff4ea0856026094c).


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86477/
    Test PASSed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    I'm closing it since `AnalysisBarrier` is no longer there. We should revisit the whole self-join problem and fix it in 3.0, with breaking changes if necessary.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    cc @rxin @gatorsmile @viirya @sameeragarwal 


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86252 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86252/testReport)** for PR 20276 at commit [`57b7c02`](https://github.com/apache/spark/commit/57b7c022d561b12d7cf3ded2605f4f73181c09c3).


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86481/
    Test PASSed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Is this only going to fix the case of joining DataFrames derived from the same DataFrame, but not self-joins?


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86252 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86252/testReport)** for PR 20276 at commit [`57b7c02`](https://github.com/apache/spark/commit/57b7c022d561b12d7cf3ded2605f4f73181c09c3).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class AnalysisBarrier(child: LogicalPlan, id: Long) extends LeafNode `


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86275 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86275/testReport)** for PR 20276 at commit [`465829d`](https://github.com/apache/spark/commit/465829d8d81bdbac9c01284a32e2c3554ecbda70).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class AnalysisBarrier(child: LogicalPlan, id: Long) extends LeafNode `


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86172/
    Test FAILed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86477/testReport)** for PR 20276 at commit [`83c5fda`](https://github.com/apache/spark/commit/83c5fda86a55f60ea5844116a5239590140414e5).


---



[GitHub] spark pull request #20276: [SPARK-14948][SQL] disambiguate attributes in joi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20276#discussion_r162262776
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -318,7 +318,10 @@ class Analyzer(
             gid: Expression): Expression = {
           expr transform {
             case e: GroupingID =>
    -          if (e.groupByExprs.isEmpty || e.groupByExprs == groupByExprs) {
    +          def sameExpressions(e1: Seq[Expression], e2: Seq[Expression]): Boolean = {
    --- End diff --
    
    Anyway, my PR exposed this bug, since `Dataset.col` now returns a slightly different attribute that carries metadata.
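
    The body of `sameExpressions` is not shown in the diff above, so the following is only a toy illustration of why a plain `==` comparison can start failing once metadata is attached. `Attr`, `sameAttrs`, and the `"barrierId"` key are hypothetical names, and comparing by `exprId` is an assumption rather than the PR's actual logic.

    ```scala
    // Toy attribute: metadata is part of structural equality, as with any case class field.
    final case class Attr(name: String, exprId: Long, metadata: Map[String, Long] = Map.empty)

    object MetadataEquality {
      // Hypothetical helper: two attribute lists are "the same" when they refer
      // to the same exprIds in the same order, whatever metadata they carry.
      def sameAttrs(e1: Seq[Attr], e2: Seq[Attr]): Boolean =
        e1.length == e2.length && e1.zip(e2).forall { case (x, y) => x.exprId == y.exprId }

      def main(args: Array[String]): Unit = {
        val plain = Seq(Attr("a", 1L))
        val tagged = Seq(Attr("a", 1L, Map("barrierId" -> 10L)))
        println(plain == tagged)          // prints: false (metadata differs)
        println(sameAttrs(plain, tagged)) // prints: true (same exprId)
      }
    }
    ```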


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86220/
    Test FAILed.


---



[GitHub] spark pull request #20276: [SPARK-14948][SQL] disambiguate attributes in joi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan closed the pull request at:

    https://github.com/apache/spark/pull/20276


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86278/
    Test PASSed.


---



[GitHub] spark pull request #20276: [SPARK-14948][SQL] disambiguate attributes in joi...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20276#discussion_r162259637
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -730,12 +733,28 @@ class Analyzer(
               right
             case Some((oldRelation, newRelation)) =>
               val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output))
    +          // If we de-duplicated an `AnalysisBarrier`, then we should only replace
    +          // `AttributeReference` that refers to this `AnalysisBarrier`.
    +          val barrierId = oldRelation match {
    +            case b: AnalysisBarrier => Some(b.id)
    +            case _ => None
    +          }
               right transformUp {
                 case r if r == oldRelation => newRelation
               } transformUp {
                 case other => other transformExpressions {
    -              case a: Attribute =>
    -                dedupAttr(a, attributeRewrites)
    +              case a: AttributeReference =>
    +                // Only replace `AttributeReference` when the de-duplicated relation is not a
    +                // `AnalysisBarrier`, or this `AttributeReference` is not associated with any
    +                // `AnalysisBarrier`, or this `AttributeReference` refers to the de-duplicated
    +                // `AnalysisBarrier`, i.e. barrierId matches.
    +                if (barrierId.isEmpty || !a.metadata.contains(AnalysisBarrier.metadataKey) ||
    +                  barrierId.get == a.metadata.getLong(AnalysisBarrier.metadataKey)) {
    +                  dedupAttr(a, attributeRewrites)
    --- End diff --
    
    Looks like it is the same as:
    ```scala
    // When we de-duplicate an `AnalysisBarrier` and this `AttributeReference` is associated with a
    // different `AnalysisBarrier`, we don't replace it.
    
    val notToReplace = barrierId.map { id =>
      a.metadata.contains(AnalysisBarrier.metadataKey) &&
        id != a.metadata.getLong(AnalysisBarrier.metadataKey)
    }.getOrElse(false)
    
    if (notToReplace) {
      a
    } else {
      dedupAttr(a, attributeRewrites)
    }
    ```
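
    The claimed equivalence can be checked exhaustively on a small model. In this sketch the metadata lookup (`contains` followed by `getLong`) is modelled as an `Option[Long]`; `replaceOriginal` and `replaceSuggested` are hypothetical names for the two forms of the condition.

    ```scala
    object ConditionEquivalence {
      // Condition from the diff: the attribute is replaced when this holds.
      def replaceOriginal(barrierId: Option[Long], attrBarrierId: Option[Long]): Boolean =
        barrierId.isEmpty || attrBarrierId.isEmpty || barrierId.get == attrBarrierId.get

      // Suggested form: compute when NOT to replace, then negate.
      def replaceSuggested(barrierId: Option[Long], attrBarrierId: Option[Long]): Boolean = {
        val notToReplace = barrierId.map { id =>
          attrBarrierId.isDefined && id != attrBarrierId.get
        }.getOrElse(false)
        !notToReplace
      }

      def main(args: Array[String]): Unit = {
        // Covers: no barrier, no tag on the attribute, matching ids, differing ids.
        val ids: Seq[Option[Long]] = Seq(None, Some(1L), Some(2L))
        val agree = for (b <- ids; a <- ids)
          yield replaceOriginal(b, a) == replaceSuggested(b, a)
        println(agree.forall(identity)) // prints: true
      }
    }
    ```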


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #20276: [SPARK-14948][SQL] disambiguate attributes in joi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20276#discussion_r162431289
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -902,9 +902,20 @@ case class Deduplicate(
      *
      * This analysis barrier will be removed at the end of analysis stage.
      */
    -case class AnalysisBarrier(child: LogicalPlan) extends LeafNode {
    +case class AnalysisBarrier(child: LogicalPlan, id: Long) extends LeafNode {
       override protected def innerChildren: Seq[LogicalPlan] = Seq(child)
       override def output: Seq[Attribute] = child.output
       override def isStreaming: Boolean = child.isStreaming
       override def doCanonicalize(): LogicalPlan = child.canonicalized
    +  override protected def stringArgs: Iterator[Any] = Iterator(child)
    +}
    +
    +object AnalysisBarrier {
    +  private val curId = new java.util.concurrent.atomic.AtomicLong()
    --- End diff --
    
    We need to update the doc of `AnalysisBarrier`.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86180/testReport)** for PR 20276 at commit [`ca31ec5`](https://github.com/apache/spark/commit/ca31ec5c9e5a8cb19827cf1b37bbdc4121296faf).


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    retest this please


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86275 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86275/testReport)** for PR 20276 at commit [`465829d`](https://github.com/apache/spark/commit/465829d8d81bdbac9c01284a32e2c3554ecbda70).


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86275/
    Test FAILed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20276: [SPARK-14948][SQL] disambiguate attributes in joi...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20276#discussion_r162257509
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -318,7 +318,10 @@ class Analyzer(
             gid: Expression): Expression = {
           expr transform {
             case e: GroupingID =>
    -          if (e.groupByExprs.isEmpty || e.groupByExprs == groupByExprs) {
    +          def sameExpressions(e1: Seq[Expression], e2: Seq[Expression]): Boolean = {
    --- End diff --
    
    Is this a bug not related to this PR?


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86172 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86172/testReport)** for PR 20276 at commit [`3438131`](https://github.com/apache/spark/commit/34381314b7fafc4a1c8ab56bf5f6e6b0a7bc9851).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class AttributeWithinAnalysisBarrier(attr: Attribute, id: Long)`
      * `case class AnalysisBarrier(child: LogicalPlan, id: Long) extends LeafNode `


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    retest this please


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    retest this please.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by ssimeonov <gi...@git.apache.org>.
Github user ssimeonov commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    @cloud-fan do you expect to resolve conflict + merge at some point?


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86278 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86278/testReport)** for PR 20276 at commit [`b13bae1`](https://github.com/apache/spark/commit/b13bae1c17ce35f1b227221381aa8edf40b21e70).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86220/testReport)** for PR 20276 at commit [`ca31ec5`](https://github.com/apache/spark/commit/ca31ec5c9e5a8cb19827cf1b37bbdc4121296faf).


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Hi, @cloud-fan . If this PR is still valid, could you resolve the conflicts?


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86172 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86172/testReport)** for PR 20276 at commit [`3438131`](https://github.com/apache/spark/commit/34381314b7fafc4a1c8ab56bf5f6e6b0a7bc9851).


---



[GitHub] spark pull request #20276: [SPARK-14948][SQL] disambiguate attributes in joi...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20276#discussion_r162018508
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -1234,11 +1234,24 @@ class Dataset[T] private[sql](
           if (sqlContext.conf.supportQuotedRegexColumnName) {
             colRegex(colName)
           } else {
    -        val expr = resolve(colName)
    -        Column(expr)
    +        createCol(colName)
           }
       }
     
    +  private def createCol(name: String): Column = {
    +    val expr = resolve(name) transform {
    +      case a: AttributeReference =>
    +        // Associate the returned `AttributeReference` with the `AnalysisBarrier` of this Dataset,
    +        // by putting the barrier id into `AttributeReference.metadata`. This information is only
    +        // used to disambiguate the attributes in join condition when resolving self-join and
    +        // de-duplicating the right side plan.
    --- End diff --
    
    Shall we clarify that this metadata will be removed after analysis?


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86278 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86278/testReport)** for PR 20276 at commit [`b13bae1`](https://github.com/apache/spark/commit/b13bae1c17ce35f1b227221381aa8edf40b21e70).


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86387/testReport)** for PR 20276 at commit [`83c5fda`](https://github.com/apache/spark/commit/83c5fda86a55f60ea5844116a5239590140414e5).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86387/
    Test FAILed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86324/
    Test FAILed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86220 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86220/testReport)** for PR 20276 at commit [`ca31ec5`](https://github.com/apache/spark/commit/ca31ec5c9e5a8cb19827cf1b37bbdc4121296faf).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class AttributeWithinAnalysisBarrier(attr: Attribute, id: Long)`
      * `case class AnalysisBarrier(child: LogicalPlan, id: Long) extends LeafNode `


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/101/
    Test PASSed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86477/testReport)** for PR 20276 at commit [`83c5fda`](https://github.com/apache/spark/commit/83c5fda86a55f60ea5844116a5239590140414e5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86327/
    Test PASSed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86171/
    Test FAILed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/37/
    Test PASSed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86481 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86481/testReport)** for PR 20276 at commit [`83c5fda`](https://github.com/apache/spark/commit/83c5fda86a55f60ea5844116a5239590140414e5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86481 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86481/testReport)** for PR 20276 at commit [`83c5fda`](https://github.com/apache/spark/commit/83c5fda86a55f60ea5844116a5239590140414e5).


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86324 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86324/testReport)** for PR 20276 at commit [`d0bdddf`](https://github.com/apache/spark/commit/d0bdddfffc18258ba1536c9cff4ea0856026094c).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86327 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86327/testReport)** for PR 20276 at commit [`d0bdddf`](https://github.com/apache/spark/commit/d0bdddfffc18258ba1536c9cff4ea0856026094c).


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/98/
    Test PASSed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86252/
    Test FAILed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    retest this please


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20276: [SPARK-14948][SQL] disambiguate attributes in joi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20276#discussion_r161980194
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/InnerJoinSuite.scala ---
    @@ -76,11 +76,11 @@ class InnerJoinSuite extends SparkPlanTest with SharedSQLContext {
           testName: String,
           leftRows: => DataFrame,
           rightRows: => DataFrame,
    -      condition: () => Expression,
    +      condition: => Expression,
    --- End diff --
    
    Not related to this fix, but it makes the code style consistent with the other two join test suites.
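
For context on the `() => Expression` to `=> Expression` change: the first form is an explicit zero-argument function, the second is a by-name parameter. Both defer evaluation until the callee uses the value, and they differ mainly in what the caller has to write. The sketch below is plain Scala with hypothetical names (`ByNameSketch`, `withThunk`, `withByName`, `nextValue`); it is not code from the test suites.

```scala
object ByNameSketch {
  // Counts how often a deferred expression is actually evaluated.
  var evaluations = 0

  def nextValue(): Int = { evaluations += 1; evaluations }

  // Explicit thunk: the caller writes `() => ...` and the callee invokes it with `()`.
  def withThunk(condition: () => Int): Int = condition()

  // By-name parameter: the caller passes a plain expression, which is
  // evaluated each time the callee refers to `condition`.
  def withByName(condition: => Int): Int = condition

  def main(args: Array[String]): Unit = {
    val fromThunk = withThunk(() => nextValue())
    val fromByName = withByName(nextValue())
    println(s"thunk=$fromThunk byName=$fromByName evaluations=$evaluations")
  }
}
```

In both styles nothing is evaluated until the callee touches `condition`, which is presumably why a test helper would take the join condition this way: it can be built lazily, after the test DataFrames exist.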


---



[GitHub] spark pull request #20276: [SPARK-14948][SQL] disambiguate attributes in joi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20276#discussion_r162262684
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -318,7 +318,10 @@ class Analyzer(
             gid: Expression): Expression = {
           expr transform {
             case e: GroupingID =>
    -          if (e.groupByExprs.isEmpty || e.groupByExprs == groupByExprs) {
    +          def sameExpressions(e1: Seq[Expression], e2: Seq[Expression]): Boolean = {
    --- End diff --
    
    I think so. Basically, we should always use `semanticEquals` when matching expressions.
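
A minimal sketch of why plain `==` can be too strict here, assuming `spark-catalyst` is on the classpath. The body of `sameExpressions` is a guess based only on the signature visible in the diff, not the PR's actual implementation, and the `barrierId` metadata key is hypothetical. `semanticEquals` compares canonicalized expressions, and canonicalizing an `AttributeReference` drops its name and metadata, so an attribute tagged through metadata should still match its untagged original.

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}
import org.apache.spark.sql.types.{IntegerType, MetadataBuilder}

object SemanticEqualsSketch {
  // Assumed shape of the helper: pairwise semantic comparison of two expression lists.
  def sameExpressions(e1: Seq[Expression], e2: Seq[Expression]): Boolean =
    e1.length == e2.length && e1.zip(e2).forall { case (a, b) => a.semanticEquals(b) }

  def main(args: Array[String]): Unit = {
    val a = AttributeReference("a", IntegerType)()

    // Same attribute (same exprId) carrying extra metadata under a hypothetical key.
    val meta = new MetadataBuilder().putLong("barrierId", 1L).build()
    val tagged = AttributeReference(a.name, a.dataType, a.nullable, meta)(exprId = a.exprId)

    // Structural equality takes metadata into account; semantic equality does not.
    println(s"==              : ${Seq[Expression](a) == Seq[Expression](tagged)}")
    println(s"sameExpressions : ${sameExpressions(Seq(a), Seq(tagged))}")
  }
}
```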


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    **[Test build #86387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86387/testReport)** for PR 20276 at commit [`83c5fda`](https://github.com/apache/spark/commit/83c5fda86a55f60ea5844116a5239590140414e5).


---



[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20276
  
    > Is this only going to fix the case of joining DataFrames derived from the same DataFrame but not for self-joining?
    
    Yes. I think self-join is not fixable: in `df.join(df, df("id") === (df("id") + 1))`, both sides of the condition refer to the same DataFrame, so we have no idea what the join condition means. Maybe we should throw an exception for this case.
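
For a true self-join, a commonly used way to make the condition unambiguous is to alias each side and refer to columns through the alias. This is outside the scope of the PR and is only a sketch, assuming `spark-sql` is on the classpath; `SelfJoinAliasSketch`, `joinOnShiftedId`, and the alias names `l` and `r` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SelfJoinAliasSketch {
  // Joins a DataFrame with itself, pairing each row with the row whose id is one smaller.
  // The aliases say explicitly which side each `id` belongs to.
  def joinOnShiftedId(df: DataFrame): DataFrame =
    df.as("l").join(df.as("r"), col("l.id") === col("r.id") + 1)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("self-join-alias-sketch")
      .getOrCreate()
    try {
      val df = spark.range(3).toDF("id")
      joinOnShiftedId(df).show()
    } finally {
      spark.stop()
    }
  }
}
```

With `df("id")` on both sides there is nothing to say which copy of `df` a reference belongs to; the aliases carry exactly that missing information.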


---
