Posted to reviews@spark.apache.org by ankurdave <gi...@git.apache.org> on 2015/10/13 06:42:38 UTC
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
GitHub user ankurdave opened a pull request:
https://github.com/apache/spark/pull/9089
[SPARK-11077] [SQL] Join elimination in Catalyst
Join elimination is a query optimization where certain joins can be eliminated when followed by projections that only keep columns from one side of the join, and when certain columns are known to be unique or foreign keys. This can be very useful for queries involving views and machine-generated queries.
This PR adds join elimination by (1) supporting unique and foreign key hints in logical plans, (2) adding methods in the DataFrame API to let users provide these hints, and (3) adding an optimizer rule that eliminates unique key outer joins and referential integrity joins when followed by an appropriate projection.
This change is described in detail here: https://docs.google.com/document/d/1-YgQSQywHfAo4PhAT-zOOkFZtVcju99h3dYQq-i9GWQ/edit?usp=sharing
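The unique key outer join case described above can be sketched on a toy plan representation. This is a minimal illustration only: `Plan`, `Relation`, `LeftOuterJoin`, `Project`, and `eliminate` are hypothetical names invented for this sketch, not Catalyst's classes or the rule added by this PR, and it assumes column names are distinct across the two sides of the join.

```scala
object JoinEliminationSketch {
  // Toy logical plan. These types are illustrative and are NOT Catalyst's classes.
  sealed trait Plan { def output: Set[String] }

  case class Relation(
      name: String,
      columns: Set[String],
      uniqueKeys: Set[String] = Set.empty) extends Plan {
    def output: Set[String] = columns
  }

  case class LeftOuterJoin(left: Plan, right: Plan, leftKey: String, rightKey: String)
      extends Plan {
    def output: Set[String] = left.output ++ right.output
  }

  case class Project(columns: Set[String], child: Plan) extends Plan {
    def output: Set[String] = columns
  }

  // Unique-key hints known for a plan. Only base relations carry them in this sketch.
  def uniqueKeys(plan: Plan): Set[String] = plan match {
    case r: Relation => r.uniqueKeys
    case _           => Set.empty
  }

  // Drop a left outer join when the projection keeps only left-side columns and the
  // right-side join key is unique: every left row then appears exactly once in the
  // join output, so the join cannot change the projected result.
  def eliminate(plan: Plan): Plan = plan match {
    case Project(cols, LeftOuterJoin(left, right, _, rightKey))
        if cols.subsetOf(left.output) && uniqueKeys(right).contains(rightKey) =>
      Project(cols, left)
    case other => other
  }
}
```

In this sketch a projection of only `orders` columns over `orders LEFT OUTER JOIN customers` collapses to a projection over `orders` when `customers.id` is hinted as unique, and is left untouched otherwise.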
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ankurdave/spark SPARK-11077-JoinElimination
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9089.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9089
----
commit 4f528770ecf4a2ae780d6514fdc8c5e7cf899288
Author: Ankur Dave <an...@gmail.com>
Date: 2015-08-04T05:33:59Z
Eliminate outer join before project
commit ae46ab0891e974f6491d4b266f08d95d7a1c1382
Author: Ankur Dave <an...@gmail.com>
Date: 2015-08-12T20:15:50Z
Use KeyHint to do join elimination
commit df9ef1421cee2f8f94dac24a8116ad504a009a20
Author: Ankur Dave <an...@gmail.com>
Date: 2015-08-12T23:25:30Z
Add foreign keys
commit b22f7025860fed1b3f7bd5147691f5ef887bca01
Author: Ankur Dave <an...@gmail.com>
Date: 2015-08-13T02:49:26Z
Alias-aware join elimination + bugfixes
commit 9072cb70872b156027cb2e673a397cc01f326128
Author: Ankur Dave <an...@gmail.com>
Date: 2015-08-13T03:22:55Z
Propagate foreign keys through Join operator
commit f430ea2c6413879403973fc4fdd4217dde9d27ec
Author: Ankur Dave <an...@gmail.com>
Date: 2015-08-13T03:43:06Z
Remove key hints after join elimination
commit 130253101f2db627c42ea4f8759dfeef6c62e574
Author: Ankur Dave <an...@gmail.com>
Date: 2015-08-17T01:55:36Z
Support inner joins based on referential integrity
commit 35949f54c53357a86e0a2e2aeb0e5524a8285ce5
Author: Ankur Dave <an...@gmail.com>
Date: 2015-08-18T06:38:30Z
Correctness fixes for join elimination
Do not eliminate referential integrity full outer joins, or inner joins where foreign key is
nullable. Require foreign keys to reference unique columns.
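The nullable foreign key restriction in the commit message above can be illustrated with plain Scala collections. This is a sketch under assumptions: `Order`, `Customer`, and both helper functions are hypothetical, and a null key is modelled as `None`. An inner join drops rows whose foreign key is null, so removing the join would change the result.

```scala
object NullableForeignKeySketch {
  // Hypothetical rows: an order whose customerId may be null (modelled as None).
  case class Order(orderId: Int, customerId: Option[Int])
  case class Customer(id: Int, name: String)

  // Inner join on orders.customerId = customers.id, projecting only an order column.
  def innerJoinThenProject(orders: Seq[Order], customers: Seq[Customer]): Seq[Int] =
    for {
      o   <- orders
      cid <- o.customerId.toSeq          // a null key never matches, so the row is dropped
      c   <- customers if c.id == cid
    } yield o.orderId

  // What the query would return if the join were (incorrectly) eliminated.
  def projectOnly(orders: Seq[Order]): Seq[Int] = orders.map(_.orderId)
}
```

With one order carrying a null `customerId`, `innerJoinThenProject` omits that order while `projectOnly` keeps it, which is why the join may only be removed when the foreign key is known to be non-nullable.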
commit 945e5231e900621c4a2bbf103816385d68abd5e0
Author: Ankur Dave <an...@gmail.com>
Date: 2015-08-19T06:15:31Z
Do key hint resolution during analysis
This is necessary to support aliased self joins and multiple foreign keys with the same referent.
commit 504c9d858b8b35ed788e31bf99fc5f6506be792d
Author: Ankur Dave <an...@gmail.com>
Date: 2015-08-19T06:18:02Z
Don't crash when foreign key refers to unresolved relation
Instead just leave the KeyHint unresolved.
commit 83c8ff913dc06f79ce059906e62b0e744967c1e4
Author: Ankur Dave <an...@gmail.com>
Date: 2015-08-19T07:42:04Z
Fix JoinEliminationSuite
commit 0b0b8401f97bf52dabacfa818fa62a4477ca4c72
Author: Ankur Dave <an...@gmail.com>
Date: 2015-08-19T11:01:43Z
Merge remote-tracking branch 'apache-spark/master' into GraphFrames
commit 9150ddaf2d598314ff3ea1fe4a434de37325d213
Author: Ankur Dave <an...@gmail.com>
Date: 2015-08-19T12:14:53Z
Fix KeyHintSuite after merge
commit 873b3224b043875718959c645146743ed78084da
Author: Ankur Dave <an...@gmail.com>
Date: 2015-10-13T01:47:47Z
In ForeignKey, store referencedRelation as logical plan
Previously we stored its name as part of referencedAttr, requiring a
catalog lookup.
commit 98e0b5e316b1692a188dedc6b49daaa5854a064b
Author: Ankur Dave <an...@gmail.com>
Date: 2015-10-13T02:45:21Z
Use semanticEquals for Attributes
commit d43a2c005b091e571a9d5dc3cc7d22e22a29ffd0
Author: Ankur Dave <an...@gmail.com>
Date: 2015-10-13T03:37:35Z
Remove TODOs
commit f4e7e0140865df27f3c0b000f22d69117316070e
Author: Ankur Dave <an...@gmail.com>
Date: 2015-10-13T04:02:02Z
Add more comments
commit 49b196e041c80c83eef0b069c984e608cc6433b5
Author: Ankur Dave <an...@gmail.com>
Date: 2015-10-13T04:13:46Z
Merge remote-tracking branch 'apache-spark/master' into GraphFrames
commit 578797c456e20d0fb07bf10cb3e64f09065948f9
Author: Ankur Dave <an...@gmail.com>
Date: 2015-10-13T04:38:46Z
Use SharedSQLContext in KeyHintSuite
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147637636
[Test build #43633 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43633/consoleFull) for PR 9089 at commit [`55bb135`](https://github.com/apache/spark/commit/55bb1354efcef98944caf96f8d59dc2f4a6459c0).
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147601433
Merged build started.
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147601491
[Test build #43620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43620/consoleFull) for PR 9089 at commit [`7c7357b`](https://github.com/apache/spark/commit/7c7357bf9c1e8bab3f2d828dd8bc3d6f7d851196).
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147600353
[Test build #43619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43619/consoleFull) for PR 9089 at commit [`578797c`](https://github.com/apache/spark/commit/578797c456e20d0fb07bf10cb3e64f09065948f9).
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-148238916
@jkbradley Oops, thanks for catching that. I introduced it in 50717599f1eb5bf2184a6b1df2e0aebabdebddec because I misunderstood the function of `transformExpressionsDown`. Should be fixed now.
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-148257737
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147600237
Merged build triggered.
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147678970
[Test build #43638 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43638/console) for PR 9089 at commit [`55bb135`](https://github.com/apache/spark/commit/55bb1354efcef98944caf96f8d59dc2f4a6459c0).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class KeyHint(newKeys: Seq[Key], child: LogicalPlan) extends UnaryNode `
* `sealed abstract class Key `
* `case class UniqueKey(attr: Attribute) extends Key `
* `case class ForeignKey(`
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-148240194
Merged build started.
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147646844
[Test build #43638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43638/consoleFull) for PR 9089 at commit [`55bb135`](https://github.com/apache/spark/commit/55bb1354efcef98944caf96f8d59dc2f4a6459c0).
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-148239326
[Test build #43757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43757/consoleFull) for PR 9089 at commit [`e1ec23d`](https://github.com/apache/spark/commit/e1ec23da83d02adafbe1fdc7852e258f9289d293).
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-148267193
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147643952
Merged build started.
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147642886
Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-154528444
Build finished. 5912 tests run, 0 skipped, 0 failed.
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147642538
[Test build #43633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43633/console) for PR 9089 at commit [`55bb135`](https://github.com/apache/spark/commit/55bb1354efcef98944caf96f8d59dc2f4a6459c0).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class KeyHint(newKeys: Seq[Key], child: LogicalPlan) extends UnaryNode `
* `sealed abstract class Key `
* `case class UniqueKey(attr: Attribute) extends Key `
* `case class ForeignKey(`
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147636061
@rxin Thanks, I added the Experimental tags.
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on a diff in the pull request:
https://github.com/apache/spark/pull/9089#discussion_r41836429
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -362,14 +362,35 @@ class Analyzer(
   j
 case Some((oldRelation, newRelation)) =>
   val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output))
-  val newRight = right transformUp {
-    case r if r == oldRelation => newRelation
-  } transformUp {
-    case other => other transformExpressions {
-      case a: Attribute => attributeRewrites.get(a).getOrElse(a)
+  def applyRewrites(plan: LogicalPlan): LogicalPlan =
+    plan transformUp {
+      case r if r == oldRelation => newRelation
+    } transformUp {
+      case other => other transformExpressions {
+        case a: Attribute => attributeRewrites.get(a).getOrElse(a)
+      }
     }
-  }
-  j.copy(right = newRight)
+  val newRight = applyRewrites(right)
+  // Also apply the rewrites to foreign keys on the left side, because these are meant to
+  // reference the right side.
+  val newLeft =
+    if (left.keys.nonEmpty) {
+      left.transform {
+        case KeyHint(keys, child) =>
+          val newKeys = keys.collect {
+            case ForeignKey(attr, referencedRelation, referencedAttr) =>
+              ForeignKey(
+                attr,
+                applyRewrites(referencedRelation),
+                attributeRewrites.get(referencedAttr).getOrElse(referencedAttr))
+            case other => other
+          }
+          KeyHint((keys ++ newKeys).distinct, child)
--- End diff --
Good eye! This is to accommodate future self-joins. If we got rid of the old foreign keys, a future self-join would not recognize that the new keys applied to it, because the attributes would have been rewritten. I just added a comment noting this.
There's [a unit test](https://github.com/apache/spark/pull/9089/files#diff-09ca3beb9c48d89b5fcf248e48d888ddR261) that covers this (fails if you remove the old keys).
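One reading of the reasoning above, sketched on a toy model. Everything here is an assumption made for illustration: `Attr`, the two-field `ForeignKey`, and the helper functions are hypothetical and much simpler than the PR's classes, and the scenario assumes each self-join rewrites the relation's original attributes to a fresh set.

```scala
object KeepOldKeysSketch {
  // Attributes carry an id, loosely mirroring expression ids. Illustrative only.
  case class Attr(name: String, id: Int)
  case class ForeignKey(attr: Attr, referencedAttr: Attr)

  private def rewrite(keys: Seq[ForeignKey], rewrites: Map[Attr, Attr]): Seq[ForeignKey] =
    keys.map(k => k.copy(referencedAttr = rewrites.getOrElse(k.referencedAttr, k.referencedAttr)))

  // Keep the original keys alongside the rewritten ones, as in the diff above.
  def rewriteKeepingOld(keys: Seq[ForeignKey], rewrites: Map[Attr, Attr]): Seq[ForeignKey] =
    (keys ++ rewrite(keys, rewrites)).distinct

  // Alternative raised in review: keep only the rewritten keys.
  def rewriteDroppingOld(keys: Seq[ForeignKey], rewrites: Map[Attr, Attr]): Seq[ForeignKey] =
    rewrite(keys, rewrites).distinct

  // A join on (leftAttr = rightAttr) is recognized if some key matches it exactly.
  def covers(keys: Seq[ForeignKey], leftAttr: Attr, rightAttr: Attr): Boolean =
    keys.contains(ForeignKey(leftAttr, rightAttr))
}
```

In this model, a first self-join rewrites `id#1` to `id#5` and a later one rewrites `id#1` to `id#9`. If the key referencing `id#1` was dropped after the first rewrite, the second rewrite has nothing left to apply to and the later join on `id#9` is not covered; keeping the old key lets it be rewritten again.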
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147600249
Merged build started.
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147634467
Merged build triggered.
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/9089#discussion_r41833942
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -362,14 +362,35 @@ class Analyzer(
   j
 case Some((oldRelation, newRelation)) =>
   val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output))
-  val newRight = right transformUp {
-    case r if r == oldRelation => newRelation
-  } transformUp {
-    case other => other transformExpressions {
-      case a: Attribute => attributeRewrites.get(a).getOrElse(a)
+  def applyRewrites(plan: LogicalPlan): LogicalPlan =
+    plan transformUp {
+      case r if r == oldRelation => newRelation
+    } transformUp {
+      case other => other transformExpressions {
+        case a: Attribute => attributeRewrites.get(a).getOrElse(a)
+      }
     }
-  }
-  j.copy(right = newRight)
+  val newRight = applyRewrites(right)
+  // Also apply the rewrites to foreign keys on the left side, because these are meant to
+  // reference the right side.
+  val newLeft =
+    if (left.keys.nonEmpty) {
+      left.transform {
+        case KeyHint(keys, child) =>
+          val newKeys = keys.collect {
+            case ForeignKey(attr, referencedRelation, referencedAttr) =>
+              ForeignKey(
+                attr,
+                applyRewrites(referencedRelation),
+                attributeRewrites.get(referencedAttr).getOrElse(referencedAttr))
+            case other => other
+          }
+          KeyHint((keys ++ newKeys).distinct, child)
--- End diff --
Can't we just use `newKeys` here? Why do we need to keep old keys?
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-154499514
**[Test build #45232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45232/consoleFull)** for PR 9089 at commit [`5abceae`](https://github.com/apache/spark/commit/5abceaebadedc130feeab7aec8b97f4fac3bdfda).
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147642585
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147679090
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147622588
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43620/
Test PASSed.
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-148238867
Merged build triggered.
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147679091
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43638/
Test PASSed.
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-148267196
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43758/
Test PASSed.
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-154528236
**[Test build #45232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45232/consoleFull)** for PR 9089 at commit [`5abceae`](https://github.com/apache/spark/commit/5abceaebadedc130feeab7aec8b97f4fac3bdfda).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class KeyHint(newKeys: Seq[Key], child: LogicalPlan) extends UnaryNode `
* `sealed abstract class Key `
* `case class UniqueKey(attr: Attribute) extends Key `
* `case class ForeignKey(`
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-148181317
Calling uniqueKey on a DataFrame throws out the column names. Is that intended?
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-154498839
Build triggered. sha1 is merged.
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147643826
Merged build triggered.
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147634492
Merged build started.
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-148786929
@ankurdave Np, thanks for the fix. Btw, should the fix be accompanied by a unit test to catch that issue?
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147642587
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43633/
Test FAILed.
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-148257738
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43757/
Test PASSed.
---
[GitHub] spark pull request #9089: [SPARK-11077] [SQL] Join elimination in Catalyst
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/9089
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-154528445
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45232/
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147600552
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43619/
Test FAILed.
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147600546
[Test build #43619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43619/console) for PR 9089 at commit [`578797c`](https://github.com/apache/spark/commit/578797c456e20d0fb07bf10cb3e64f09065948f9).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class KeyHint(newKeys: Seq[Key], child: LogicalPlan) extends UnaryNode `
* `sealed abstract class Key `
* `case class UniqueKey(attr: Attribute) extends Key `
* `case class ForeignKey(`
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-148238883
Merged build started.
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-148241590
[Test build #43758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43758/consoleFull) for PR 9089 at commit [`0cd8a91`](https://github.com/apache/spark/commit/0cd8a9185e96d0f21a8bd9a437c124566b9f2ce1).
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147622583
Merged build finished. Test PASSed.
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147601410
Merged build triggered.
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-148266851
[Test build #43758 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43758/console) for PR 9089 at commit [`0cd8a91`](https://github.com/apache/spark/commit/0cd8a9185e96d0f21a8bd9a437c124566b9f2ce1).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class KeyHint(newKeys: Seq[Key], child: LogicalPlan) extends UnaryNode `
* `sealed abstract class Key `
* `case class UniqueKey(attr: Attribute) extends Key `
* `case class ForeignKey(`
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147621974
[Test build #43620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43620/console) for PR 9089 at commit [`7c7357b`](https://github.com/apache/spark/commit/7c7357bf9c1e8bab3f2d828dd8bc3d6f7d851196).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class KeyHint(newKeys: Seq[Key], child: LogicalPlan) extends UnaryNode `
* `sealed abstract class Key `
* `case class UniqueKey(attr: Attribute) extends Key `
* `case class ForeignKey(`
---
[GitHub] spark issue #9089: [SPARK-11077] [SQL] Join elimination in Catalyst
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/9089
Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one.
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-148257662
[Test build #43757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43757/console) for PR 9089 at commit [`e1ec23d`](https://github.com/apache/spark/commit/e1ec23da83d02adafbe1fdc7852e258f9289d293).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class KeyHint(newKeys: Seq[Key], child: LogicalPlan) extends UnaryNode `
* `sealed abstract class Key `
* `case class UniqueKey(attr: Attribute) extends Key `
* `case class ForeignKey(`
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147627183
We can tag them as Experimental (even though the entire DataFrame API is experimental!)
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-148240178
Merged build triggered.
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147600550
Merged build finished. Test FAILed.
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-147625512
@marmbrus I addressed your comments from the review about a month ago:
1. Foreign key references now store the referenced relation directly as a logical plan rather than requiring a catalog lookup.
2. We now use `semanticEquals` and `AttributeSet` for attributes instead of normal equality.
There were a few comments that didn't make sense on second thought:
1. Move the attribute equivalence check in `ForeignKeyFinder` to a method on `LogicalPlan`. We thought this would simplify the logic, but it turned out not to (still need to maintain the disjoint-set data structure, and the logic gets split between `LogicalPlan` and `Project`).
2. Move foreign key attribute resolution to its own rule that runs at the end of analysis. This would work fine, but it seems to fit well within `ResolveReferences`.
Finally, the new DataFrame methods should probably be marked as alpha somehow, but I'm not sure of the best way. Maybe a new ScalaDoc group?
cc @rxin, @jkbradley
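For readers following the thread, here is a minimal, self-contained sketch of the unique-key outer join case that the PR description mentions. It is a toy model and not the code in this patch: `Plan`, `Relation`, `UniqueKeyHint`, `LeftOuterJoin`, `Project`, and `eliminate` are hypothetical names invented for illustration, columns are plain strings assumed to be distinct across relations, and none of it uses Catalyst's `LogicalPlan`, `AttributeSet`, or `semanticEquals`.

```scala
// Toy model only: these types loosely mimic the shape of the classes named in
// this thread (e.g. KeyHint, UniqueKey) but are NOT Spark's Catalyst classes.
object JoinEliminationSketch {
  sealed trait Plan {
    def output: Set[String]      // column names produced by this node
    def uniqueKeys: Set[String]  // columns known (via hints) to be unique
  }

  case class Relation(name: String, output: Set[String]) extends Plan {
    val uniqueKeys: Set[String] = Set.empty
  }

  // Hint node asserting that `key` is unique in `child` (a simplification of KeyHint).
  case class UniqueKeyHint(key: String, child: Plan) extends Plan {
    def output: Set[String] = child.output
    def uniqueKeys: Set[String] = child.uniqueKeys + key
  }

  // Left outer equi-join on leftKey = rightKey.
  case class LeftOuterJoin(left: Plan, right: Plan, leftKey: String, rightKey: String)
      extends Plan {
    def output: Set[String] = left.output ++ right.output
    def uniqueKeys: Set[String] = Set.empty // conservative: claim nothing after a join
  }

  case class Project(columns: Set[String], child: Plan) extends Plan {
    def output: Set[String] = columns
    def uniqueKeys: Set[String] = child.uniqueKeys.intersect(columns)
  }

  // If a projection keeps only left-side columns and the right join key is
  // unique, every left row appears exactly once in the join output, so the
  // left outer join can be dropped.
  def eliminate(plan: Plan): Plan = plan match {
    case Project(cols, LeftOuterJoin(l, r, _, rk))
        if cols.subsetOf(l.output) && r.uniqueKeys.contains(rk) =>
      Project(cols, eliminate(l))
    case Project(cols, child)        => Project(cols, eliminate(child))
    case UniqueKeyHint(k, child)     => UniqueKeyHint(k, eliminate(child))
    case LeftOuterJoin(l, r, lk, rk) => LeftOuterJoin(eliminate(l), eliminate(r), lk, rk)
    case r: Relation                 => r
  }
}
```

With a hypothetical `orders` relation left-outer-joined to a `customers` relation hinted as unique on `customer_id`, `eliminate` rewrites a projection of `order_id` to read from `orders` alone. Without the hint, or if the projection keeps a right-side column such as `name`, the plan is returned unchanged.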
---
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9089#issuecomment-154498875
Build started. sha1 is merged.
---