Posted to reviews@spark.apache.org by ankurdave <gi...@git.apache.org> on 2015/10/13 06:42:38 UTC

[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

GitHub user ankurdave opened a pull request:

    https://github.com/apache/spark/pull/9089

    [SPARK-11077] [SQL] Join elimination in Catalyst

    Join elimination is a query optimization that removes a join when the projection that follows it keeps columns from only one side of the join, and when the join columns are known to be unique keys or foreign keys. It is particularly useful for queries over views and for machine-generated queries.
    
    This PR adds join elimination by (1) supporting unique and foreign key hints in logical plans, (2) adding methods in the DataFrame API to let users provide these hints, and (3) adding an optimizer rule that eliminates unique key outer joins and referential integrity joins when followed by an appropriate projection.
    
    This change is described in detail here: https://docs.google.com/document/d/1-YgQSQywHfAo4PhAT-zOOkFZtVcju99h3dYQq-i9GWQ/edit?usp=sharing
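The unique key outer join case described above can be illustrated with a toy model. This is a minimal sketch, not the code in this PR: `Plan`, `Relation`, `LeftOuterJoin`, `Project`, and `JoinEliminationSketch` are hypothetical stand-ins for Catalyst's logical plan classes, columns are plain strings, and column names are assumed to be distinct across the two sides of the join.

```scala
// Toy logical-plan model. These types are illustrative only; they are not Catalyst's.
sealed trait Plan { def output: Set[String] }

// A base relation; uniqueKeys plays the role of a unique key hint.
case class Relation(name: String, columns: Set[String], uniqueKeys: Set[String] = Set.empty)
    extends Plan {
  def output: Set[String] = columns
}

// Equi-join on left.leftKey = right.rightKey that keeps every left row.
case class LeftOuterJoin(left: Plan, right: Plan, leftKey: String, rightKey: String)
    extends Plan {
  def output: Set[String] = left.output ++ right.output
}

case class Project(columns: Set[String], child: Plan) extends Plan {
  def output: Set[String] = columns
}

object JoinEliminationSketch {
  // In this toy model only base relations carry unique key information.
  def uniqueKeys(plan: Plan): Set[String] = plan match {
    case r: Relation => r.uniqueKeys
    case _ => Set.empty
  }

  // A left outer join keeps every left row, and a unique right key means each
  // left row matches at most one right row. If the projection reads only left
  // columns, the join therefore cannot change the result and can be dropped.
  def eliminate(plan: Plan): Plan = plan match {
    case Project(cols, LeftOuterJoin(left, right, _, rightKey))
        if cols.subsetOf(left.output) && uniqueKeys(right).contains(rightKey) =>
      Project(cols, eliminate(left))
    case Project(cols, child) => Project(cols, eliminate(child))
    case other => other
  }
}
```

For example, with `orders` left-outer-joined to `customers` on a unique `c_id`, projecting only `o_id` reduces to a projection over `orders` alone, while projecting `c_name` leaves the join in place.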

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ankurdave/spark SPARK-11077-JoinElimination

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9089.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9089
    
----
commit 4f528770ecf4a2ae780d6514fdc8c5e7cf899288
Author: Ankur Dave <an...@gmail.com>
Date:   2015-08-04T05:33:59Z

    Eliminate outer join before project

commit ae46ab0891e974f6491d4b266f08d95d7a1c1382
Author: Ankur Dave <an...@gmail.com>
Date:   2015-08-12T20:15:50Z

    Use KeyHint to do join elimination

commit df9ef1421cee2f8f94dac24a8116ad504a009a20
Author: Ankur Dave <an...@gmail.com>
Date:   2015-08-12T23:25:30Z

    Add foreign keys

commit b22f7025860fed1b3f7bd5147691f5ef887bca01
Author: Ankur Dave <an...@gmail.com>
Date:   2015-08-13T02:49:26Z

    Alias-aware join elimination + bugfixes

commit 9072cb70872b156027cb2e673a397cc01f326128
Author: Ankur Dave <an...@gmail.com>
Date:   2015-08-13T03:22:55Z

    Propagate foreign keys through Join operator

commit f430ea2c6413879403973fc4fdd4217dde9d27ec
Author: Ankur Dave <an...@gmail.com>
Date:   2015-08-13T03:43:06Z

    Remove key hints after join elimination

commit 130253101f2db627c42ea4f8759dfeef6c62e574
Author: Ankur Dave <an...@gmail.com>
Date:   2015-08-17T01:55:36Z

    Support inner joins based on referential integrity

commit 35949f54c53357a86e0a2e2aeb0e5524a8285ce5
Author: Ankur Dave <an...@gmail.com>
Date:   2015-08-18T06:38:30Z

    Correctness fixes for join elimination
    
    Do not eliminate referential integrity full outer joins, or inner joins where foreign key is
    nullable. Require foreign keys to reference unique columns.

commit 945e5231e900621c4a2bbf103816385d68abd5e0
Author: Ankur Dave <an...@gmail.com>
Date:   2015-08-19T06:15:31Z

    Do key hint resolution during analysis
    
    This is necessary to support aliased self joins and multiple foreign keys with the same referent.

commit 504c9d858b8b35ed788e31bf99fc5f6506be792d
Author: Ankur Dave <an...@gmail.com>
Date:   2015-08-19T06:18:02Z

    Don't crash when foreign key refers to unresolved relation
    
    Instead just leave the KeyHint unresolved.

commit 83c8ff913dc06f79ce059906e62b0e744967c1e4
Author: Ankur Dave <an...@gmail.com>
Date:   2015-08-19T07:42:04Z

    Fix JoinEliminationSuite

commit 0b0b8401f97bf52dabacfa818fa62a4477ca4c72
Author: Ankur Dave <an...@gmail.com>
Date:   2015-08-19T11:01:43Z

    Merge remote-tracking branch 'apache-spark/master' into GraphFrames

commit 9150ddaf2d598314ff3ea1fe4a434de37325d213
Author: Ankur Dave <an...@gmail.com>
Date:   2015-08-19T12:14:53Z

    Fix KeyHintSuite after merge

commit 873b3224b043875718959c645146743ed78084da
Author: Ankur Dave <an...@gmail.com>
Date:   2015-10-13T01:47:47Z

    In ForeignKey, store referencedRelation as logical plan
    
    Previously we stored its name as part of referencedAttr, requiring a
    catalog lookup.

commit 98e0b5e316b1692a188dedc6b49daaa5854a064b
Author: Ankur Dave <an...@gmail.com>
Date:   2015-10-13T02:45:21Z

    Use semanticEquals for Attributes

commit d43a2c005b091e571a9d5dc3cc7d22e22a29ffd0
Author: Ankur Dave <an...@gmail.com>
Date:   2015-10-13T03:37:35Z

    Remove TODOs

commit f4e7e0140865df27f3c0b000f22d69117316070e
Author: Ankur Dave <an...@gmail.com>
Date:   2015-10-13T04:02:02Z

    Add more comments

commit 49b196e041c80c83eef0b069c984e608cc6433b5
Author: Ankur Dave <an...@gmail.com>
Date:   2015-10-13T04:13:46Z

    Merge remote-tracking branch 'apache-spark/master' into GraphFrames

commit 578797c456e20d0fb07bf10cb3e64f09065948f9
Author: Ankur Dave <an...@gmail.com>
Date:   2015-10-13T04:38:46Z

    Use SharedSQLContext in KeyHintSuite

----
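The conditions listed in the "Correctness fixes for join elimination" commit above can be summarized as a predicate. This is a rough sketch under stated assumptions, not the rule implemented in this PR: `FkJoinInfo` and `ReferentialIntegritySketch` are hypothetical names, the foreign key is assumed to sit on the left side of the join, and the projection above the join is assumed to keep only left-side columns.

```scala
sealed trait JoinType
case object Inner extends JoinType
case object LeftOuter extends JoinType
case object FullOuter extends JoinType

// Hypothetical summary of what the key hints say about one foreign-key join.
case class FkJoinInfo(
    joinType: JoinType,
    foreignKeyNullable: Boolean,
    referencedColumnUnique: Boolean)

object ReferentialIntegritySketch {
  def canEliminate(info: FkJoinInfo): Boolean = info match {
    // A non-unique referenced column could duplicate left rows.
    case FkJoinInfo(_, _, false) => false
    // A full outer join also emits unmatched rows from the referenced side.
    case FkJoinInfo(FullOuter, _, _) => false
    // An inner join drops rows whose foreign key is null, so the key must be non-nullable.
    case FkJoinInfo(Inner, nullable, _) => !nullable
    // A left outer join keeps every left row whether or not it matches.
    case FkJoinInfo(LeftOuter, _, _) => true
  }
}
```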


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147637636
  
      [Test build #43633 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43633/consoleFull) for   PR 9089 at commit [`55bb135`](https://github.com/apache/spark/commit/55bb1354efcef98944caf96f8d59dc2f4a6459c0).


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147601433
  
    Merged build started.


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147601491
  
      [Test build #43620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43620/consoleFull) for   PR 9089 at commit [`7c7357b`](https://github.com/apache/spark/commit/7c7357bf9c1e8bab3f2d828dd8bc3d6f7d851196).


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147600353
  
      [Test build #43619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43619/consoleFull) for   PR 9089 at commit [`578797c`](https://github.com/apache/spark/commit/578797c456e20d0fb07bf10cb3e64f09065948f9).


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-148238916
  
    @jkbradley Oops, thanks for catching that. I introduced it in 50717599f1eb5bf2184a6b1df2e0aebabdebddec because I misunderstood the function of `transformExpressionsDown`. Should be fixed now.


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-148257737
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147600237
  
     Merged build triggered.


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147678970
  
      [Test build #43638 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43638/console) for   PR 9089 at commit [`55bb135`](https://github.com/apache/spark/commit/55bb1354efcef98944caf96f8d59dc2f4a6459c0).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class KeyHint(newKeys: Seq[Key], child: LogicalPlan) extends UnaryNode `
      * `sealed abstract class Key `
      * `case class UniqueKey(attr: Attribute) extends Key `
      * `case class ForeignKey(`



[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-148240194
  
    Merged build started.


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147646844
  
      [Test build #43638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43638/consoleFull) for   PR 9089 at commit [`55bb135`](https://github.com/apache/spark/commit/55bb1354efcef98944caf96f8d59dc2f4a6459c0).


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-148239326
  
      [Test build #43757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43757/consoleFull) for   PR 9089 at commit [`e1ec23d`](https://github.com/apache/spark/commit/e1ec23da83d02adafbe1fdc7852e258f9289d293).


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-148267193
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147643952
  
    Merged build started.


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147642886
  
    Jenkins, retest this please.


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-154528444
  
    Build finished. 5912 tests run, 0 skipped, 0 failed.



[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147642538
  
      [Test build #43633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43633/console) for   PR 9089 at commit [`55bb135`](https://github.com/apache/spark/commit/55bb1354efcef98944caf96f8d59dc2f4a6459c0).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class KeyHint(newKeys: Seq[Key], child: LogicalPlan) extends UnaryNode `
      * `sealed abstract class Key `
      * `case class UniqueKey(attr: Attribute) extends Key `
      * `case class ForeignKey(`



[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147636061
  
    @rxin Thanks, I added the Experimental tags.


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9089#discussion_r41836429
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -362,14 +362,35 @@ class Analyzer(
                 j
               case Some((oldRelation, newRelation)) =>
                 val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output))
    -            val newRight = right transformUp {
    -              case r if r == oldRelation => newRelation
    -            } transformUp {
    -              case other => other transformExpressions {
    -                case a: Attribute => attributeRewrites.get(a).getOrElse(a)
    +            def applyRewrites(plan: LogicalPlan): LogicalPlan =
    +              plan transformUp {
    +                case r if r == oldRelation => newRelation
    +              } transformUp {
    +                case other => other transformExpressions {
    +                  case a: Attribute => attributeRewrites.get(a).getOrElse(a)
    +                }
                   }
    -            }
    -            j.copy(right = newRight)
    +            val newRight = applyRewrites(right)
    +            // Also apply the rewrites to foreign keys on the left side, because these are meant to
    +            // reference the right side.
    +            val newLeft =
    +              if (left.keys.nonEmpty) {
    +                left.transform {
    +                  case KeyHint(keys, child) =>
    +                    val newKeys = keys.collect {
    +                      case ForeignKey(attr, referencedRelation, referencedAttr) =>
    +                        ForeignKey(
    +                          attr,
    +                          applyRewrites(referencedRelation),
    +                          attributeRewrites.get(referencedAttr).getOrElse(referencedAttr))
    +                      case other => other
    +                    }
    +                    KeyHint((keys ++ newKeys).distinct, child)
    --- End diff --
    
    Good eye! This is to accommodate future self-joins. If we got rid of the old foreign keys, a future self-join would not recognize that the new keys applied to it, because the attributes would have been rewritten.  I just added a comment noting this.
    
    There's [a unit test](https://github.com/apache/spark/pull/9089/files#diff-09ca3beb9c48d89b5fcf248e48d888ddR261) that covers this (fails if you remove the old keys).
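The reason for keeping both the old and the rewritten keys can be seen in a simplified model. This is a sketch only: `Key`, `UniqueKey`, and `ForeignKey` here are string-based stand-ins for the classes in this PR, and `KeyRewriteSketch` with its two rewrite maps is a hypothetical replacement for `applyRewrites` and `attributeRewrites` in the diff above.

```scala
// Simplified stand-ins: attributes and relations are plain strings here.
sealed trait Key
case class UniqueKey(attr: String) extends Key
case class ForeignKey(attr: String, referencedRelation: String, referencedAttr: String)
    extends Key

object KeyRewriteSketch {
  def rewriteKeys(
      keys: Seq[Key],
      relationRewrites: Map[String, String],
      attributeRewrites: Map[String, String]): Seq[Key] = {
    // Rewrite only what each foreign key references; other keys pass through unchanged.
    val newKeys = keys.map {
      case ForeignKey(attr, rel, refAttr) =>
        ForeignKey(
          attr,
          relationRewrites.getOrElse(rel, rel),
          attributeRewrites.getOrElse(refAttr, refAttr))
      case other => other
    }
    // Keep the original keys as well, so a later self-join that still uses the
    // original attributes can find a matching foreign key. Keys that were not
    // rewritten appear in both sequences, which is why distinct is needed.
    (keys ++ newKeys).distinct
  }
}
```

With keys `UniqueKey("id#1")` and `ForeignKey("mgr#2", "emp", "id#1")`, rewriting `emp` to `emp2` and `id#1` to `id#7` yields three keys: the unique key once, the original foreign key, and the rewritten foreign key.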


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147600249
  
    Merged build started.


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147634467
  
     Merged build triggered.


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9089#discussion_r41833942
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -362,14 +362,35 @@ class Analyzer(
                 j
               case Some((oldRelation, newRelation)) =>
                 val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output))
    -            val newRight = right transformUp {
    -              case r if r == oldRelation => newRelation
    -            } transformUp {
    -              case other => other transformExpressions {
    -                case a: Attribute => attributeRewrites.get(a).getOrElse(a)
    +            def applyRewrites(plan: LogicalPlan): LogicalPlan =
    +              plan transformUp {
    +                case r if r == oldRelation => newRelation
    +              } transformUp {
    +                case other => other transformExpressions {
    +                  case a: Attribute => attributeRewrites.get(a).getOrElse(a)
    +                }
                   }
    -            }
    -            j.copy(right = newRight)
    +            val newRight = applyRewrites(right)
    +            // Also apply the rewrites to foreign keys on the left side, because these are meant to
    +            // reference the right side.
    +            val newLeft =
    +              if (left.keys.nonEmpty) {
    +                left.transform {
    +                  case KeyHint(keys, child) =>
    +                    val newKeys = keys.collect {
    +                      case ForeignKey(attr, referencedRelation, referencedAttr) =>
    +                        ForeignKey(
    +                          attr,
    +                          applyRewrites(referencedRelation),
    +                          attributeRewrites.get(referencedAttr).getOrElse(referencedAttr))
    +                      case other => other
    +                    }
    +                    KeyHint((keys ++ newKeys).distinct, child)
    --- End diff --
    
    Can't we just use `newKeys` here? Why do we need to keep old keys?


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-154499514
  
    **[Test build #45232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45232/consoleFull)** for PR 9089 at commit [`5abceae`](https://github.com/apache/spark/commit/5abceaebadedc130feeab7aec8b97f4fac3bdfda).


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147642585
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147679090
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147622588
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43620/
    Test PASSed.




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-148238867
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147679091
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43638/
    Test PASSed.




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-148267196
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43758/
    Test PASSed.




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-154528236
  
    **[Test build #45232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45232/consoleFull)** for PR 9089 at commit [`5abceae`](https://github.com/apache/spark/commit/5abceaebadedc130feeab7aec8b97f4fac3bdfda).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class KeyHint(newKeys: Seq[Key], child: LogicalPlan) extends UnaryNode `
      * `sealed abstract class Key `
      * `case class UniqueKey(attr: Attribute) extends Key `
      * `case class ForeignKey(`




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-148181317
  
    Calling uniqueKey on a DataFrame throws out the column names.  Is that intended?




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-154498839
  
    Build triggered. sha1 is merged.




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147643826
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147634492
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-148786929
  
    @ankurdave  Np, thanks for the fix.  Btw, should the fix be accompanied by a unit test to catch that issue?




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147642587
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43633/
    Test FAILed.




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-148257738
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43757/
    Test PASSed.




[GitHub] spark pull request #9089: [SPARK-11077] [SQL] Join elimination in Catalyst

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9089




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-154528445
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45232/





[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147600552
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43619/
    Test FAILed.




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147600546
  
      [Test build #43619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43619/console) for   PR 9089 at commit [`578797c`](https://github.com/apache/spark/commit/578797c456e20d0fb07bf10cb3e64f09065948f9).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class KeyHint(newKeys: Seq[Key], child: LogicalPlan) extends UnaryNode `
      * `sealed abstract class Key `
      * `case class UniqueKey(attr: Attribute) extends Key `
      * `case class ForeignKey(`





[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-148238883
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-148241590
  
      [Test build #43758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43758/consoleFull) for   PR 9089 at commit [`0cd8a91`](https://github.com/apache/spark/commit/0cd8a9185e96d0f21a8bd9a437c124566b9f2ce1).




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147622583
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147601410
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-148266851
  
      [Test build #43758 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43758/console) for   PR 9089 at commit [`0cd8a91`](https://github.com/apache/spark/commit/0cd8a9185e96d0f21a8bd9a437c124566b9f2ce1).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class KeyHint(newKeys: Seq[Key], child: LogicalPlan) extends UnaryNode `
      * `sealed abstract class Key `
      * `case class UniqueKey(attr: Attribute) extends Key `
      * `case class ForeignKey(`





[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147621974
  
      [Test build #43620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43620/console) for   PR 9089 at commit [`7c7357b`](https://github.com/apache/spark/commit/7c7357bf9c1e8bab3f2d828dd8bc3d6f7d851196).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class KeyHint(newKeys: Seq[Key], child: LogicalPlan) extends UnaryNode `
      * `sealed abstract class Key `
      * `case class UniqueKey(attr: Attribute) extends Key `
      * `case class ForeignKey(`





[GitHub] spark issue #9089: [SPARK-11077] [SQL] Join elimination in Catalyst

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/9089
  
    Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one. 




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-148257662
  
      [Test build #43757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43757/console) for   PR 9089 at commit [`e1ec23d`](https://github.com/apache/spark/commit/e1ec23da83d02adafbe1fdc7852e258f9289d293).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class KeyHint(newKeys: Seq[Key], child: LogicalPlan) extends UnaryNode `
      * `sealed abstract class Key `
      * `case class UniqueKey(attr: Attribute) extends Key `
      * `case class ForeignKey(`





[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147627183
  
    We can tag them as Experimental (even though the entire DataFrame API is experimental!)




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-148240178
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147600550
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-147625512
  
    @marmbrus I addressed your comments from the review about a month ago:
    
    1. Foreign key references now store the referenced relation directly as a logical plan rather than requiring a catalog lookup.
    2. We now use `semanticEquals` and `AttributeSet` for attributes instead of normal equality.
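
Editorial illustration of point 2, as a rough sketch only. It uses a toy model, not Catalyst's real classes: `Attr`, its `semanticEquals` method, and `AttrSet` are hypothetical stand-ins, and the assumption is that an attribute carries a unique id alongside cosmetic fields such as its name. Under that assumption, ordinary case-class equality can treat two references to the same column as different, while an id-based comparison does not.

```scala
// Toy model (NOT Catalyst's classes): an attribute identified by a unique id.
final case class Attr(name: String, exprId: Long) {
  // Hypothetical analogue of a semantic comparison: same underlying column,
  // regardless of the display name.
  def semanticEquals(other: Attr): Boolean = exprId == other.exprId
}

// Hypothetical analogue of an attribute set that is keyed on the id only.
final class AttrSet private (private val ids: Set[Long]) {
  def contains(a: Attr): Boolean = ids.contains(a.exprId)
}

object AttrSet {
  def apply(attrs: Attr*): AttrSet = new AttrSet(attrs.map(_.exprId).toSet)
}

object SemanticEqualityDemo {
  val id = Attr("id", 1L)
  val renamed = Attr("customer_id", 1L) // same column, different display name
  val other = Attr("id", 2L)            // same name, different column

  def main(args: Array[String]): Unit = {
    println(id == renamed)                 // false: names differ
    println(id.semanticEquals(renamed))    // true: ids match
    println(id.semanticEquals(other))      // false: ids differ
    println(AttrSet(id).contains(renamed)) // true: membership is by id
  }
}
```
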
    
    There were a few comments that didn't make sense on second thought:
    
    1. Move the attribute equivalence check in `ForeignKeyFinder` to a method on `LogicalPlan`. We thought this would simplify the logic, but it turned out not to (still need to maintain the disjoint-set data structure, and the logic gets split between `LogicalPlan` and `Project`).
    2. Move foreign key attribute resolution to its own rule that runs at the end of analysis. This would work fine, but it seems to fit well within `ResolveReferences`.
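
Editorial illustration of the disjoint-set data structure mentioned in point 1, as a minimal sketch. The thread does not show how `ForeignKeyFinder` uses it, so this assumes its job is to track which attributes are known to be equivalent to one another. The `DisjointSet` class and the attribute names below are hypothetical and are not taken from the patch.

```scala
import scala.collection.mutable

// Minimal union-find; elements stand in for attributes (hypothetical).
final class DisjointSet[A] {
  private val parent = mutable.Map.empty[A, A]

  // Return the representative of x's class, registering x if unseen.
  def find(x: A): A = {
    val p = parent.getOrElseUpdate(x, x)
    if (p == x) x
    else {
      val root = find(p)
      parent(x) = root // path compression
      root
    }
  }

  // Merge the classes containing a and b.
  def union(a: A, b: A): Unit = {
    val ra = find(a)
    val rb = find(b)
    if (ra != rb) parent(ra) = rb
  }

  def equivalent(a: A, b: A): Boolean = find(a) == find(b)
}

object DisjointSetDemo {
  def main(args: Array[String]): Unit = {
    val ds = new DisjointSet[String]
    // Record two equivalences, e.g. discovered while walking projections.
    ds.union("orders.customer_id", "fk_alias")
    ds.union("fk_alias", "cid")
    println(ds.equivalent("orders.customer_id", "cid"))           // true
    println(ds.equivalent("orders.customer_id", "orders.amount")) // false
  }
}
```

Equivalence here is transitive by construction, which is the property that makes a disjoint set a natural fit for following an attribute through a chain of renames.
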
    
    Finally, the new DataFrame methods should probably be marked as alpha somehow, but I'm not sure of the best way. Maybe a new ScalaDoc group?
    
    cc @rxin, @jkbradley





[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9089#issuecomment-154498875
  
    Build started sha1 is merged.


