You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by maropu <gi...@git.apache.org> on 2018/08/23 09:53:17 UTC
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
GitHub user maropu opened a pull request:
https://github.com/apache/spark/pull/22198
[SPARK-25121][SQL] Supports multi-part table names for broadcast hint resolution
## What changes were proposed in this pull request?
This pr fixed code to respect a database name for broadcast table hint resolution.
Currently, spark ignores a database name in multi-part names;
```
scala> sql("CREATE DATABASE testDb")
scala> spark.range(10).write.saveAsTable("testDb.t")
// without this patch
scala> spark.range(10).join(spark.table("testDb.t"), "id").hint("broadcast", "testDb.t").explain
== Physical Plan ==
*(2) Project [id#24L]
+- *(2) BroadcastHashJoin [id#24L], [id#26L], Inner, BuildLeft
:- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
: +- *(1) Range (0, 10, step=1, splits=4)
+- *(2) Project [id#26L]
+- *(2) Filter isnotnull(id#26L)
+- *(2) FileScan parquet testdb.t[id#26L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-2.3.1-bin-hadoop2.7/spark-warehouse..., PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct<id:bigint>
// with this patch
scala> spark.range(10).join(spark.table("testDb.t"), "id").hint("broadcast", "testDb.t").explain
== Physical Plan ==
*(2) Project [id#3L]
+- *(2) BroadcastHashJoin [id#3L], [id#5L], Inner, BuildRight
:- *(2) Range (0, 10, step=1, splits=4)
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]))
+- *(1) Project [id#5L]
+- *(1) Filter isnotnull(id#5L)
+- *(1) FileScan parquet testdb.t[id#5L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-master/spark-warehouse/testdb.db/t], PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct<id:bigint>
```
## How was this patch tested?
Added tests in `DataFrameJoinSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/maropu/spark SPARK-25121
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22198.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22198
----
commit d2be6920ba1cc052e9d5d8364cf48375cea8ba44
Author: Takeshi Yamamuro <ya...@...>
Date: 2018-08-23T07:20:51Z
Fix
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22198
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #95281 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95281/testReport)** for PR 22198 at commit [`bd3f502`](https://github.com/apache/spark/commit/bd3f502b573d0e414cece623d7f4bcce52385043).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22198
@maropu . Did we build the consensus? For me, I'm at the same position until now.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #96505 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96505/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3422/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #95279 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95279/testReport)** for PR 22198 at commit [`bd3f502`](https://github.com/apache/spark/commit/bd3f502b573d0e414cece623d7f4bcce52385043).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r213061786
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala ---
@@ -47,20 +49,39 @@ object ResolveHints {
*
* This rule must happen before common table expressions.
*/
- class ResolveBroadcastHints(conf: SQLConf) extends Rule[LogicalPlan] {
+ class ResolveBroadcastHints(conf: SQLConf, catalog: SessionCatalog) extends Rule[LogicalPlan] {
--- End diff --
Ya. Right, please ignore this. We need `catalog` to lookup `global_temp`, too.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r212850203
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala ---
@@ -47,20 +49,39 @@ object ResolveHints {
*
* This rule must happen before common table expressions.
*/
- class ResolveBroadcastHints(conf: SQLConf) extends Rule[LogicalPlan] {
+ class ResolveBroadcastHints(conf: SQLConf, catalog: SessionCatalog) extends Rule[LogicalPlan] {
private val BROADCAST_HINT_NAMES = Set("BROADCAST", "BROADCASTJOIN", "MAPJOIN")
def resolver: Resolver = conf.resolver
- private def applyBroadcastHint(plan: LogicalPlan, toBroadcast: Set[String]): LogicalPlan = {
+ private def namePartsWithDatabase(nameParts: Seq[String]): Seq[String] = {
+ if (nameParts.size == 1) {
+ catalog.getCurrentDatabase +: nameParts
+ } else {
+ nameParts
+ }
+ }
+
+ private def matchedTableIdentifier(
+ nameParts: Seq[String],
+ tableIdent: IdentifierWithDatabase): Boolean = {
+ val identifierList =
+ tableIdent.database.getOrElse(catalog.getCurrentDatabase) :: tableIdent.identifier :: Nil
+ namePartsWithDatabase(nameParts).corresponds(identifierList)(resolver)
--- End diff --
@dongjoon-hyun a little confused about the name resolution here;
```
"SELECT /*+ MAPJOIN(v1) */ * FROM global_temp.v1, v2 WHERE v1.id = v2.id",
```
`MAPJOIN(v1)` implicitly means `global_temp.v1`?
For example;
```
"SELECT /*+ MAPJOIN(v1) */ * FROM default.v1, global_temp.v1 WHERE default.v1.id = global_temp.v1.id",
```
In this case, what's the `MAPJOIN(v1)` behaviour?
- 1. Apply no hint
- 2. Apply a hint into `default.v1` only
- 3. Apply a hint into both
WDYT?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #95252 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95252/testReport)** for PR 22198 at commit [`b5f9584`](https://github.com/apache/spark/commit/b5f9584de2a1a8c4f943ed573a7e24775c918b4e).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95148/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #96540 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96540/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5653/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97386/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #95284 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95284/testReport)** for PR 22198 at commit [`bd3f502`](https://github.com/apache/spark/commit/bd3f502b573d0e414cece623d7f4bcce52385043).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96540/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3977/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r212861566
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala ---
@@ -47,20 +49,39 @@ object ResolveHints {
*
* This rule must happen before common table expressions.
*/
- class ResolveBroadcastHints(conf: SQLConf) extends Rule[LogicalPlan] {
+ class ResolveBroadcastHints(conf: SQLConf, catalog: SessionCatalog) extends Rule[LogicalPlan] {
private val BROADCAST_HINT_NAMES = Set("BROADCAST", "BROADCASTJOIN", "MAPJOIN")
def resolver: Resolver = conf.resolver
- private def applyBroadcastHint(plan: LogicalPlan, toBroadcast: Set[String]): LogicalPlan = {
+ private def namePartsWithDatabase(nameParts: Seq[String]): Seq[String] = {
+ if (nameParts.size == 1) {
+ catalog.getCurrentDatabase +: nameParts
+ } else {
+ nameParts
+ }
+ }
+
+ private def matchedTableIdentifier(
+ nameParts: Seq[String],
+ tableIdent: IdentifierWithDatabase): Boolean = {
+ val identifierList =
+ tableIdent.database.getOrElse(catalog.getCurrentDatabase) :: tableIdent.identifier :: Nil
+ namePartsWithDatabase(nameParts).corresponds(identifierList)(resolver)
--- End diff --
oh, yes. I'll refine the pr. thanks.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96431/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #96431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96431/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:
https://github.com/apache/spark/pull/22198
@dongjoon-hyun Thanks for nicely summarizing. Actually i was not clear on the semantics when i asked the question :-) and was wondering if we should resolve it like a table identifier or just match it like a string. Do we know how other databases that support hints handle this ? I am actually fine if we go with option 1.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22198
Thanks, @dongjoon-hyun! I'll check and merge that.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r212851693
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala ---
@@ -47,20 +49,39 @@ object ResolveHints {
*
* This rule must happen before common table expressions.
*/
- class ResolveBroadcastHints(conf: SQLConf) extends Rule[LogicalPlan] {
+ class ResolveBroadcastHints(conf: SQLConf, catalog: SessionCatalog) extends Rule[LogicalPlan] {
--- End diff --
We can instead use `getCurrentDatabase: () => String`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #96562 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96562/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r213197519
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala ---
@@ -47,20 +49,47 @@ object ResolveHints {
*
* This rule must happen before common table expressions.
*/
- class ResolveBroadcastHints(conf: SQLConf) extends Rule[LogicalPlan] {
+ class ResolveBroadcastHints(conf: SQLConf, catalog: SessionCatalog) extends Rule[LogicalPlan] {
private val BROADCAST_HINT_NAMES = Set("BROADCAST", "BROADCASTJOIN", "MAPJOIN")
def resolver: Resolver = conf.resolver
- private def applyBroadcastHint(plan: LogicalPlan, toBroadcast: Set[String]): LogicalPlan = {
+ private def namePartsWithDatabase(nameParts: Seq[String], database: String): Seq[String] = {
+ if (nameParts.size == 1) {
+ database +: nameParts
+ } else {
+ nameParts
+ }
+ }
+
+ private def matchedTableIdentifier(
+ nameParts: Seq[String],
+ tableIdent: IdentifierWithDatabase): Boolean = {
+ tableIdent.database match {
+ case Some(db) if resolver(catalog.globalTempViewManager.database, db) =>
+ val identifierList = db :: tableIdent.identifier :: Nil
+ namePartsWithDatabase(nameParts, catalog.globalTempViewManager.database)
+ .corresponds(identifierList)(resolver)
+ case _ =>
+ val db = tableIdent.database.getOrElse(catalog.getCurrentDatabase)
+ val identifierList = db :: tableIdent.identifier :: Nil
+ namePartsWithDatabase(nameParts, catalog.getCurrentDatabase)
--- End diff --
ok, I'll try.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22198
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r212843948
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
@@ -191,6 +195,48 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
assert(plan2.collect { case p: BroadcastHashJoinExec => p }.size == 1)
}
+ test("SPARK-25121 Supports multi-part names for broadcast hint resolution") {
+ val (table1Name, table2Name) = ("t1", "t2")
+ withTempDatabase { dbName =>
+ withTable(table1Name, table2Name) {
+ withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "0") {
+ spark.range(50).write.saveAsTable(s"$dbName.$table1Name")
+ spark.range(100).write.saveAsTable(s"$dbName.$table2Name")
+ // First, makes sure a join is not broadcastable
+ val plan = sql(s"SELECT * FROM $dbName.$table1Name, $dbName.$table2Name " +
+ s"WHERE $table1Name.id = $table2Name.id")
+ .queryExecution.executedPlan
+ assert(plan.collect { case p: BroadcastHashJoinExec => p }.size == 0)
+
+ // Uses multi-part table names for broadcast hints
+ def checkIfHintApplied(tableName: String, hintTableName: String): Unit = {
--- End diff --
yea, I'll fix.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:
https://github.com/apache/spark/pull/22198
@maropu @gatorsmile I do have a question on the semantics.
```SQL
use hint;
explain extended SELECT /*+ BROADCASTJOIN(hint.s2) */ * FROM s1, s2 where s1.c1 = s2.c1;
```
In this case, aren't we supposed to apply the hint ? even though s2 is not explicitly qualified with the database ? Here is the optimized plan i see ..
```
*(5) SortMergeJoin [c1#30], [c1#32], Inner
:- *(2) Sort [c1#30 ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(c1#30, 200)
: +- *(1) Filter isnotnull(c1#30)
: +- HiveTableScan [c1#30, c2#31], HiveTableRelation `hint`.`s1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#30, c2#31]
+- *(4) Sort [c1#32 ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(c1#32, 200)
+- *(3) Filter isnotnull(c1#32)
+- HiveTableScan [c1#32, c2#33], HiveTableRelation `hint`.`s2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#32, c2#33]
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #95322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95322/testReport)** for PR 22198 at commit [`83387f6`](https://github.com/apache/spark/commit/83387f6f3b86532a79e83e8483c5e4683ff8beac).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r213193232
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala ---
@@ -47,20 +49,47 @@ object ResolveHints {
*
* This rule must happen before common table expressions.
*/
- class ResolveBroadcastHints(conf: SQLConf) extends Rule[LogicalPlan] {
+ class ResolveBroadcastHints(conf: SQLConf, catalog: SessionCatalog) extends Rule[LogicalPlan] {
private val BROADCAST_HINT_NAMES = Set("BROADCAST", "BROADCASTJOIN", "MAPJOIN")
def resolver: Resolver = conf.resolver
- private def applyBroadcastHint(plan: LogicalPlan, toBroadcast: Set[String]): LogicalPlan = {
+ private def namePartsWithDatabase(nameParts: Seq[String], database: String): Seq[String] = {
+ if (nameParts.size == 1) {
+ database +: nameParts
+ } else {
+ nameParts
+ }
+ }
+
+ private def matchedTableIdentifier(
+ nameParts: Seq[String],
+ tableIdent: IdentifierWithDatabase): Boolean = {
+ tableIdent.database match {
+ case Some(db) if resolver(catalog.globalTempViewManager.database, db) =>
+ val identifierList = db :: tableIdent.identifier :: Nil
+ namePartsWithDatabase(nameParts, catalog.globalTempViewManager.database)
+ .corresponds(identifierList)(resolver)
+ case _ =>
+ val db = tableIdent.database.getOrElse(catalog.getCurrentDatabase)
+ val identifierList = db :: tableIdent.identifier :: Nil
+ namePartsWithDatabase(nameParts, catalog.getCurrentDatabase)
--- End diff --
This part will break `temporary view` case. In the following case, no table should be broadcasted. Also, could you add more test cases? We need to test `table`, `global temporary view`, `temporary view`, and `view`. We may miss some cases until now like the following.
```scala
scala> :paste
// Entering paste mode (ctrl-D to finish)
sql("set spark.sql.autoBroadcastJoinThreshold=-1")
spark.range(10).write.mode("overwrite").saveAsTable("t")
sql("create temporary view tv as select * from t")
sql("select /*+ mapjoin(default.tv) */ * from t, tv where t.id = tv.id").explain
sql("select * from default.tv")
// Exiting paste mode, now interpreting.
== Physical Plan ==
*(2) BroadcastHashJoin [id#7L], [id#12L], Inner, BuildRight
:- *(2) Project [id#7L]
: +- *(2) Filter isnotnull(id#7L)
: +- *(2) FileScan parquet default.t[id#7L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/dongjoon/PR-22198/spark-warehouse/t], PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct<id:bigint>
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]))
+- *(1) Project [id#12L]
+- *(1) Filter isnotnull(id#12L)
+- *(1) FileScan parquet default.t[id#12L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/dongjoon/PR-22198/spark-warehouse/t], PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct<id:bigint>
org.apache.spark.sql.AnalysisException: Table or view not found: `default`.`tv`; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation `default`.`tv`
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #95322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95322/testReport)** for PR 22198 at commit [`83387f6`](https://github.com/apache/spark/commit/83387f6f3b86532a79e83e8483c5e4683ff8beac).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #96505 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96505/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2565/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95281/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #95148 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95148/testReport)** for PR 22198 at commit [`d2be692`](https://github.com/apache/spark/commit/d2be6920ba1cc052e9d5d8364cf48375cea8ba44).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2597/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22198
aha, I see. IMO we need to apply the hint in the case, too. I'll fix.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r212844390
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -144,7 +144,7 @@ class Analyzer(
lazy val batches: Seq[Batch] = Seq(
Batch("Hints", fixedPoint,
- new ResolveHints.ResolveBroadcastHints(conf),
+ new ResolveHints.ResolveBroadcastHints(conf, catalog),
--- End diff --
aha, ok.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #96540 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96540/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3441/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2653/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #95252 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95252/testReport)** for PR 22198 at commit [`b5f9584`](https://github.com/apache/spark/commit/b5f9584de2a1a8c4f943ed573a7e24775c918b4e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* ` class ResolveBroadcastHints(conf: SQLConf, catalog: SessionCatalog) extends Rule[LogicalPlan] `
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2691/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22198
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22198
@maropu . I made a PR (https://github.com/maropu/spark/pull/2) for the new test cases. You can review and merge that if you want. `GlobalTempViewSuite.scala` will start to fail after merging.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r213093040
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala ---
@@ -47,20 +49,51 @@ object ResolveHints {
*
* This rule must happen before common table expressions.
*/
- class ResolveBroadcastHints(conf: SQLConf) extends Rule[LogicalPlan] {
+ class ResolveBroadcastHints(conf: SQLConf, catalog: SessionCatalog) extends Rule[LogicalPlan] {
private val BROADCAST_HINT_NAMES = Set("BROADCAST", "BROADCASTJOIN", "MAPJOIN")
def resolver: Resolver = conf.resolver
- private def applyBroadcastHint(plan: LogicalPlan, toBroadcast: Set[String]): LogicalPlan = {
+ private def namePartsWithDatabase(nameParts: Seq[String], database: String): Seq[String] = {
+ if (nameParts.size == 1) {
+ database +: nameParts
+ } else {
+ nameParts
+ }
+ }
+
+ private def formatDatabaseName(name: String): String = {
+ if (conf.caseSensitiveAnalysis) name else name.toLowerCase(Locale.ROOT)
+ }
+
+ private def matchedTableIdentifier(
+ nameParts: Seq[String],
+ tableIdent: IdentifierWithDatabase): Boolean = {
+ tableIdent.database match {
+ case Some(db) if catalog.globalTempViewManager.database == formatDatabaseName(db) =>
--- End diff --
Also, we need a case-sensitive test. I made another PR to you for that, https://github.com/maropu/spark/pull/3 .
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r212811422
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala ---
@@ -47,20 +49,39 @@ object ResolveHints {
*
* This rule must happen before common table expressions.
*/
- class ResolveBroadcastHints(conf: SQLConf) extends Rule[LogicalPlan] {
+ class ResolveBroadcastHints(conf: SQLConf, catalog: SessionCatalog) extends Rule[LogicalPlan] {
private val BROADCAST_HINT_NAMES = Set("BROADCAST", "BROADCASTJOIN", "MAPJOIN")
def resolver: Resolver = conf.resolver
- private def applyBroadcastHint(plan: LogicalPlan, toBroadcast: Set[String]): LogicalPlan = {
+ private def namePartsWithDatabase(nameParts: Seq[String]): Seq[String] = {
+ if (nameParts.size == 1) {
+ catalog.getCurrentDatabase +: nameParts
+ } else {
+ nameParts
+ }
+ }
+
+ private def matchedTableIdentifier(
+ nameParts: Seq[String],
+ tableIdent: IdentifierWithDatabase): Boolean = {
+ val identifierList =
+ tableIdent.database.getOrElse(catalog.getCurrentDatabase) :: tableIdent.identifier :: Nil
+ namePartsWithDatabase(nameParts).corresponds(identifierList)(resolver)
--- End diff --
This logic will make a regression (`plan1` in the below) in case of global temporary view. Please add the following test case into `GlobalTempViewSuite` and revise the logic to handle both cases correctly.
```scala
test("broadcast hint on global temp view") {
import org.apache.spark.sql.catalyst.plans.logical.{ResolvedHint, Join}
spark.range(10).createGlobalTempView("v1")
spark.range(10).createOrReplaceTempView("v2")
val plan1 = sql(s"SELECT /*+ BROADCAST(v1) */ * FROM global_temp.v1, v2 WHERE v1.id = v2.id")
.queryExecution.optimizedPlan
assert(plan1.asInstanceOf[Join].left.isInstanceOf[ResolvedHint])
assert(!plan1.asInstanceOf[Join].right.isInstanceOf[ResolvedHint])
val plan2 = sql(
s"SELECT /*+ BROADCAST(global_temp.v1) */ * FROM global_temp.v1, v2 WHERE v1.id = v2.id")
.queryExecution.optimizedPlan
assert(plan2.asInstanceOf[Join].left.isInstanceOf[ResolvedHint])
assert(!plan2.asInstanceOf[Join].right.isInstanceOf[ResolvedHint])
}
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95275/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22198
ping
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #99594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99594/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22198
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r212822215
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
@@ -191,6 +195,48 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
assert(plan2.collect { case p: BroadcastHashJoinExec => p }.size == 1)
}
+ test("SPARK-25121 Supports multi-part names for broadcast hint resolution") {
+ val (table1Name, table2Name) = ("t1", "t2")
+ withTempDatabase { dbName =>
+ withTable(table1Name, table2Name) {
+ withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "0") {
+ spark.range(50).write.saveAsTable(s"$dbName.$table1Name")
+ spark.range(100).write.saveAsTable(s"$dbName.$table2Name")
+ // First, makes sure a join is not broadcastable
+ val plan = sql(s"SELECT * FROM $dbName.$table1Name, $dbName.$table2Name " +
+ s"WHERE $table1Name.id = $table2Name.id")
+ .queryExecution.executedPlan
+ assert(plan.collect { case p: BroadcastHashJoinExec => p }.size == 0)
+
+ // Uses multi-part table names for broadcast hints
+ def checkIfHintApplied(tableName: String, hintTableName: String): Unit = {
--- End diff --
`hintTableName` is never used in this func?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #95448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95448/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95448/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95279/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99594/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22198
Thanks for the sum-up. I like simpler one, too. Le me just describe more to make me more understood; IIUC we have the two case: (1) fully-qualified case `/*+ MAPJOIN(testDb.t) */` and (2) non-qualified case `/*+ MAPJOIN(t) */`.
(1) no ambiguity, as @dongjoon-hyun said, we just exactly map `testDb.t` to `testDb.t`
(2) the case @dongjoon-hyun worry about ,right?
Since I think most users meet this case (2) (they don't add database names there in most case I probably think...), IMHO it is important to support the syntax for usability. Based on the thought, my proposal is that we handle `/*+ MAPJOIN(t) */` as `/*+ MAPJOIN(*.t) */`to pick up all the matched tables in views and tables. Then, we print warning message for users like this: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Column.scala#L265
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #95284 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95284/testReport)** for PR 22198 at commit [`bd3f502`](https://github.com/apache/spark/commit/bd3f502b573d0e414cece623d7f4bcce52385043).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22198
Aha, I see. It is simple to match identifiers literally. So, let me wait for other developers comments. cc: @gatorsmile
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r212852643
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala ---
@@ -47,20 +49,39 @@ object ResolveHints {
*
* This rule must happen before common table expressions.
*/
- class ResolveBroadcastHints(conf: SQLConf) extends Rule[LogicalPlan] {
+ class ResolveBroadcastHints(conf: SQLConf, catalog: SessionCatalog) extends Rule[LogicalPlan] {
private val BROADCAST_HINT_NAMES = Set("BROADCAST", "BROADCASTJOIN", "MAPJOIN")
def resolver: Resolver = conf.resolver
- private def applyBroadcastHint(plan: LogicalPlan, toBroadcast: Set[String]): LogicalPlan = {
+ private def namePartsWithDatabase(nameParts: Seq[String]): Seq[String] = {
+ if (nameParts.size == 1) {
+ catalog.getCurrentDatabase +: nameParts
+ } else {
+ nameParts
+ }
+ }
+
+ private def matchedTableIdentifier(
+ nameParts: Seq[String],
+ tableIdent: IdentifierWithDatabase): Boolean = {
+ val identifierList =
+ tableIdent.database.getOrElse(catalog.getCurrentDatabase) :: tableIdent.identifier :: Nil
+ namePartsWithDatabase(nameParts).corresponds(identifierList)(resolver)
--- End diff --
1) First of all, the above two test cases should work as before. `global_temp.v1` should be used with the prefix `global_temp.`. Before this PR, we cannot put `database` name on Hint. So, we allowed exceptional cases; hints on global temporary view (without `global_temp.` prefix).
2) For the case you mentioned, I'd like to interpret `MAPJOIN(v1)` to `default.v1 only` because it's the Spark's behavior outside this Hint syntax. And, please add a test case for this, too.
@cloud-fan and @gatorsmile . Could you give us some advice, too? Is it okay to you?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r212844578
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala ---
@@ -47,20 +49,39 @@ object ResolveHints {
*
* This rule must happen before common table expressions.
*/
- class ResolveBroadcastHints(conf: SQLConf) extends Rule[LogicalPlan] {
+ class ResolveBroadcastHints(conf: SQLConf, catalog: SessionCatalog) extends Rule[LogicalPlan] {
--- End diff --
I think we can't use `String` there because `currentDatabase` can be updatable by others?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22198
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #97386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97386/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2561/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22198
cc @dongjoon-hyun Try to review this PR?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #95275 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95275/testReport)** for PR 22198 at commit [`5277563`](https://github.com/apache/spark/commit/52775631eb233dc8ec1e1d768d8c1f1709e1193a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4230/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r213086884
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala ---
@@ -47,20 +49,51 @@ object ResolveHints {
*
* This rule must happen before common table expressions.
*/
- class ResolveBroadcastHints(conf: SQLConf) extends Rule[LogicalPlan] {
+ class ResolveBroadcastHints(conf: SQLConf, catalog: SessionCatalog) extends Rule[LogicalPlan] {
private val BROADCAST_HINT_NAMES = Set("BROADCAST", "BROADCASTJOIN", "MAPJOIN")
def resolver: Resolver = conf.resolver
- private def applyBroadcastHint(plan: LogicalPlan, toBroadcast: Set[String]): LogicalPlan = {
+ private def namePartsWithDatabase(nameParts: Seq[String], database: String): Seq[String] = {
+ if (nameParts.size == 1) {
+ database +: nameParts
+ } else {
+ nameParts
+ }
+ }
+
+ private def formatDatabaseName(name: String): String = {
+ if (conf.caseSensitiveAnalysis) name else name.toLowerCase(Locale.ROOT)
+ }
+
+ private def matchedTableIdentifier(
+ nameParts: Seq[String],
+ tableIdent: IdentifierWithDatabase): Boolean = {
+ tableIdent.database match {
+ case Some(db) if catalog.globalTempViewManager.database == formatDatabaseName(db) =>
--- End diff --
Could you use `resolver` here like the following and remove `formatDatabaseName` in line 65~67? Since it's a `SessionCatalog` function, let's avoid duplication.
```scala
- case Some(db) if catalog.globalTempViewManager.database == formatDatabaseName(db) =>
+ case Some(db) if resolver(catalog.globalTempViewManager.database, db) =>
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #95275 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95275/testReport)** for PR 22198 at commit [`5277563`](https://github.com/apache/spark/commit/52775631eb233dc8ec1e1d768d8c1f1709e1193a).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95322/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r212807800
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
@@ -191,6 +195,39 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
assert(plan2.collect { case p: BroadcastHashJoinExec => p }.size == 1)
}
+ test("SPARK-25121 Supports multi-part names for broadcast hint resolution") {
--- End diff --
`ResolveHintsSuite` is the smallest one for this.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #95281 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95281/testReport)** for PR 22198 at commit [`bd3f502`](https://github.com/apache/spark/commit/bd3f502b573d0e414cece623d7f4bcce52385043).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r212249623
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
@@ -191,6 +195,39 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
assert(plan2.collect { case p: BroadcastHashJoinExec => p }.size == 1)
}
+ test("SPARK-25121 Supports multi-part names for broadcast hint resolution") {
--- End diff --
Would it be better to move the three tests below into `DataFrameHintSuite`?
- test("broadcast join hint using broadcast function")
- test("broadcast join hint using Dataset.hint")
- test("SPARK-25121 Supports multi-part names for broadcast hint resolution")
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22198
ping @maryannxue
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2569/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2553/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #96562 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96562/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #95448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95448/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #99594 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99594/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #97386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97386/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3406/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:
https://github.com/apache/spark/pull/22198
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #96431 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96431/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
* This patch **fails from timeout after a configured wait of `400m`**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22198
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r212807892
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala ---
@@ -47,20 +49,39 @@ object ResolveHints {
*
* This rule must happen before common table expressions.
*/
- class ResolveBroadcastHints(conf: SQLConf) extends Rule[LogicalPlan] {
+ class ResolveBroadcastHints(conf: SQLConf, catalog: SessionCatalog) extends Rule[LogicalPlan] {
--- End diff --
Accordingly, we can use String instead of SessionCatalog.
```scala
- class ResolveBroadcastHints(conf: SQLConf, catalog: SessionCatalog) extends Rule[LogicalPlan] {
+ class ResolveBroadcastHints(conf: SQLConf, currentDatabase: String) extends Rule[LogicalPlan] {
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #95395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95395/testReport)** for PR 22198 at commit [`cc0cd4f`](https://github.com/apache/spark/commit/cc0cd4f4bf90624b796f33dbc6c997032b98b0a0).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22198
@dilipbiswal @gatorsmile ping
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99603/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r212807880
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -144,7 +144,7 @@ class Analyzer(
lazy val batches: Seq[Batch] = Seq(
Batch("Hints", fixedPoint,
- new ResolveHints.ResolveBroadcastHints(conf),
+ new ResolveHints.ResolveBroadcastHints(conf, catalog),
--- End diff --
`catalog` is too big for this since we only use the current database name.
```scala
- new ResolveHints.ResolveBroadcastHints(conf, catalog),
+ new ResolveHints.ResolveBroadcastHints(conf, catalog.getCurrentDatabase),
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2567/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #96536 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96536/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #96536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96536/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95252/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #95148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95148/testReport)** for PR 22198 at commit [`d2be692`](https://github.com/apache/spark/commit/d2be6920ba1cc052e9d5d8364cf48375cea8ba44).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5660/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #99603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99603/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22198
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #99603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99603/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96505/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:
https://github.com/apache/spark/pull/22198
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96562/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #95279 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95279/testReport)** for PR 22198 at commit [`bd3f502`](https://github.com/apache/spark/commit/bd3f502b573d0e414cece623d7f4bcce52385043).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95284/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22198
@maropu and @dilipbiswal and @gatorsmile .
The complexity comes because this PR duplicates the existing name resolution logic. Although we may move `matchedTableIdentifier` to `SessionCatalog`, it seems that we had better clarify the purpose of this PR again.
From your PR description,
> Currently, spark ignores a database name in multi-part names;
Originally, `ResolveBroadcastHints` is designed to be executed at the first batch before `ResolveRelation`. So, it only compare table names. `/*+ MAPJOIN(t) */` will broadcast `testDb.t` and `testDb2.t`. We will keep this behavior, won't we?
For `/*+ MAPJOIN(testDb.t) */`,
1. What we want at the beginning it only supporting matching `testDb.t` to `testDb.t` .
2. After @dilipbiswal 's [comment](https://github.com/apache/spark/pull/22198#issuecomment-415950379), this PR aims to a real resolution by `/*+ MAPJOIN(testDb.t) */` to `t`. `t` depends on the session (`currentDatabase` and temporary views and global temporary views.)
So, until now, we are moving forward to (2), but is (2) really required for SPARK-25121? If we choose (1), it will become simpler and consistent with the original design choice (matching based on unresolved strings).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22198
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3426/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22198
cc @maryannxue
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22198
Thanks for understanding, @maropu . Yes. We need to build consensus.
@gatorsmile and @cloud-fan . Could you give us a directional advice for this PR? Basically, we are wondering if we need to provide the same name resolution at this Hint layers. Please see for [the summary comment](https://github.com/apache/spark/pull/22198#issuecomment-416667343).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2482/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22198
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22198#discussion_r212861282
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala ---
@@ -47,20 +49,39 @@ object ResolveHints {
*
* This rule must happen before common table expressions.
*/
- class ResolveBroadcastHints(conf: SQLConf) extends Rule[LogicalPlan] {
+ class ResolveBroadcastHints(conf: SQLConf, catalog: SessionCatalog) extends Rule[LogicalPlan] {
private val BROADCAST_HINT_NAMES = Set("BROADCAST", "BROADCASTJOIN", "MAPJOIN")
def resolver: Resolver = conf.resolver
- private def applyBroadcastHint(plan: LogicalPlan, toBroadcast: Set[String]): LogicalPlan = {
+ private def namePartsWithDatabase(nameParts: Seq[String]): Seq[String] = {
+ if (nameParts.size == 1) {
+ catalog.getCurrentDatabase +: nameParts
+ } else {
+ nameParts
+ }
+ }
+
+ private def matchedTableIdentifier(
+ nameParts: Seq[String],
+ tableIdent: IdentifierWithDatabase): Boolean = {
+ val identifierList =
+ tableIdent.database.getOrElse(catalog.getCurrentDatabase) :: tableIdent.identifier :: Nil
+ namePartsWithDatabase(nameParts).corresponds(identifierList)(resolver)
--- End diff --
BTW, @maropu . In addition,
- The current behavior of `master` branch (Spark 2.4) is `Apply a hint into both`.
- The legacy behavior of Spark 2.3.1 is raising an exception for that query.
So, I think it's a good change to become consistent in Spark 2.4.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95395/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22198
Ur, @maropu . What I worried was case (1). For `testDb.t`, there is no problem if this PR matches `testDb.t` literally. However, this PR tries to resolve every relation `t`; this could be temporary view, view, table. That's the reason this PR makes logic duplication. Previously, this layer (Hint resolution / handle / cleanup) doesn't aim that.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96536/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3348/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22198
**[Test build #95395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95395/testReport)** for PR 22198 at commit [`cc0cd4f`](https://github.com/apache/spark/commit/cc0cd4f4bf90624b796f33dbc6c997032b98b0a0).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org