You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2016/07/11 08:34:56 UTC
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
GitHub user dongjoon-hyun opened a pull request:
https://github.com/apache/spark/pull/14132
[SPARK-16475][SQL] Broadcast Hint for SQL Queries
## What changes were proposed in this pull request?
Broadcast hint is a way for users to manually annotate a query and suggest to the query optimizer the join method. It is very useful when the query optimizer cannot make optimal decision with respect to join methods due to conservativeness or the lack of proper statistics. The DataFrame API has broadcast hint since Spark 1.5. However, we do not have an equivalent functionality in SQL queries. We propose adding Hive-style broadcast hint to Spark SQL. For more information, please see the [design document](https://issues.apache.org/jira/secure/attachment/12817061/BroadcastHintinSparkSQL.pdf).
- [x] SUPPORT `MAPJOIN` SYNTAX
- [ ] SUPPORT `BROADCAST JOIN` SYNTAX
## How was this patch tested?
Pass the Jenkins tests with new testcases.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dongjoon-hyun/spark SPARK-16475
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14132.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14132
----
commit 6bd704e5d281bd8c7a1bddaee300688d016bc597
Author: Dongjoon Hyun <do...@apache.org>
Date: 2016-07-11T08:04:51Z
[SPARK-16475][SQL] Broadcast Hint for SQL Queries
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71625598
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
}
}
+ val broadcastHint = project match {
+ case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+ if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" else ""
+ case _ => ""
+ }
build(
"SELECT",
+ broadcastHint,
--- End diff --
Okay. I'll try that.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62526/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62435/consoleFull)** for PR 14132 at commit [`959dab1`](https://github.com/apache/spark/commit/959dab11d792207768216b0b92cb38979579d2d5).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/14132
Thanks for your work! Will try to review it again when the comments are addressed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71433270
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala ---
@@ -447,4 +447,42 @@ class PlanParserSuite extends PlanTest {
assertEqual("select a, b from db.c where x !> 1",
table("db", "c").where('x <= 1).select('a, 'b))
}
+
+ test("select hint syntax") {
+ // Hive compatibility: Missing parameter raises ParseException.
+ val m = intercept[ParseException] {
+ parsePlan("SELECT /*+ HINT() */ * FROM t")
+ }.getMessage
+ assert(m.contains("no viable alternative at input"))
+
+ // Hive compatibility: No database.
+ val m2 = intercept[ParseException] {
+ parsePlan("SELECT /*+ MAPJOIN(default.t) */ * from default.t;")
+ }.getMessage
+ assert(m2.contains("no viable alternative at input"))
+
+ comparePlans(
+ parsePlan("SELECT /*+ HINT */ * FROM t"),
+ Hint("HINT", Seq.empty, table("t")).select(star()))
+
+ comparePlans(
+ parsePlan("SELECT /*+ HINT */ * FROM t"),
+ Hint("HINT", Seq.empty, table("t")).select(star()))
--- End diff --
A duplicate case, right?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
The difference of JOIN[5] and JOIN[5] is just due to the following.
* JOIN[5]
```
TableScan
alias: t1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: a (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: int)
```
* JOIN[5]
```
TableScan
alias: t1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
value expressions: a (type: int)
```
For Query 3 and 4, `t1` -> `t2` and `a` -> `b`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Oh, thank you, @cloud-fan . For current Hive, it is ignored. The followings have the same plans.
- SELECT * FROM (SELECT * FROM t1) x, t1;
- SELECT /*+ MAPJOIN(t1) */ * FROM (SELECT * FROM t1) x, t1;
- SELECT /*+ MAPJOIN(t1) */ * FROM (SELECT * FROM t1) x, t2;
- SELECT /*+ MAPJOIN(t2) */ * FROM (SELECT * FROM t1) x, t2;
```sql
hive> set hive.auto.convert.join=false;
hive> EXPLAIN SELECT * FROM (SELECT * FROM t1) x, t1;
Warning: Shuffle Join JOIN[7][tables = [$hdt$_0, $hdt$_1]] in Stage 'Stage-1:MAPRED' is a cross product
OK
Explain
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
hive> EXPLAIN SELECT /*+ MAPJOIN(t1) */ * FROM (SELECT * FROM t1) x, t1;
Warning: Shuffle Join JOIN[5][tables = [x, t1]] in Stage 'Stage-1:MAPRED' is a cross product
OK
Explain
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
hive> EXPLAIN SELECT /*+ MAPJOIN(t1) */ * FROM (SELECT * FROM t1) x, t2;
Warning: Shuffle Join JOIN[5][tables = [x, t2]] in Stage 'Stage-1:MAPRED' is a cross product
OK
Explain
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
hive> EXPLAIN SELECT /*+ MAPJOIN(t2) */ * FROM (SELECT * FROM t1) x, t2;
Warning: Shuffle Join JOIN[5][tables = [x, t2]] in Stage 'Stage-1:MAPRED' is a cross product
OK
Explain
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
```
In fact, HIVE-3784 (https://issues.apache.org/jira/browse/HIVE-3784) was January 2013.
Here is the git commit https://github.com/apache/hive/commit/3dca21456a34b6db9cf03e755fae8ecdfaef1730 .
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
At the implementation side, blocking all subqueries are easiest.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71464346
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
}
}
+ val broadcastHint = project match {
+ case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+ if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" else ""
+ case _ => ""
+ }
build(
"SELECT",
+ broadcastHint,
--- End diff --
"SELECT" occurs in the followings. But, I didn't added meaning-logic based on the testcases.
- aggregateToSQL
- generateToSQL => This has only "SELECT 1"
- groupingSetToSQL
- projectToSQL
- windowToSQL
I think the test coverage enough due to it generates the above cases. If you suggests more testcases, I welcome. I like robustness both for this PR and for the future.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Sure, @gatorsmile . But, could you give more specific examples what you mean? Currently,
- PlanParserSuite has a exception testcase.
- BroadcastJoinSuite matches all nodes of the plan. It means it tests the unmatched table, too.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71044308
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,34 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = child.transformDown {
+ case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
--- End diff --
We need more test cases to verify the behaviors.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
When we turn on the option `true`, `t1` is broadcasted.
```
hive> set hive.auto.convert.join=true;
hive> EXPLAIN SELECT /*+ MAPJOIN(t1) */ * FROM (select * from t1) x, t2;
Warning: Map Join MAPJOIN[8][bigTable=t2] in task 'Stage-3:MAPRED' is a cross product
OK
hive> EXPLAIN SELECT /*+ MAPJOIN(t2) */ * FROM (select * from t1) x, t2;
Warning: Map Join MAPJOIN[8][bigTable=t2] in task 'Stage-3:MAPRED' is a cross product
OK
hive> EXPLAIN SELECT * FROM (select * from t1) x, t2;
Warning: Map Join MAPJOIN[10][bigTable=?] in task 'Stage-3:MAPRED' is a cross product
OK
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70227049
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala ---
@@ -153,4 +153,38 @@ class BroadcastJoinSuite extends QueryTest with SQLTestUtils {
cases.foreach(assertBroadcastJoin)
}
}
+
+ test("select hint") {
--- End diff --
Move this to package `org.apache.spark.sql.catalyst.parser.PlanParserSuite`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/14132
To help you find the JIRAs, you can see the umbrella JIRA: https://issues.apache.org/jira/browse/SPARK-11012
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71099153
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,51 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+
+ for (param <- parameters) {
+ val names = param.split("\\.")
+ val tid = if (names.length > 1) {
+ TableIdentifier(names(1), Some(names(0)))
+ } else {
+ TableIdentifier(param, None)
+ }
+ try {
+ catalog.lookupRelation(tid)
+
+ var stop = false
+ resolvedChild = resolvedChild.transformDown {
+ case r @ BroadcastHint(SubqueryAlias(t, _))
+ if !stop && resolver(t, tid.identifier) =>
+ stop = true
+ r
+ case r @ SubqueryAlias(t, _) if !stop && resolver(t, tid.identifier) =>
+ stop = true
+ BroadcastHint(r)
--- End diff --
When `BroadcastHint` is crossing `SubqueryAlias`, we need to remove the qualifier.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Sure, @gatorsmile !
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71099090
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +452,49 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
}
}
+ /**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
+ * information of BroadcastHint resides its current position inside the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
+ */
+ object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+ override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+ // Capture the broadcasted information and store it in Hint.
+ case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
+ Hint("BROADCAST", Seq(table), child)
+
+ // Nearest Project is found.
+ case p @ Project(_, Hint(_, _, _)) => p
+
+ // Merge BROADCAST hints up to the nearest Project.
+ case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
+ h.copy(parameters = params1 ++ params2)
+ case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
+ h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
+
+ // Bubble up BROADCAST hints to the nearest Project.
+ case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+ h.copy(child = j.copy(left = hintChild))
+ case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+ h.copy(child = j.copy(right = hintChild))
+ case s @ SubqueryAlias(_, h @ Hint("BROADCAST", _, hintChild)) =>
--- End diff --
When you moving `Hint` and crossing a `SubqueryAlias`, we should consider adding the qualifier back.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Oh, I see. No problem. Thank you for guide. Then, could you update the JIRA too? I added this for the following comments.
> For more information, please see the attached document. One note about the doc: in addition to supporting "MAPJOIN", we should also support "BROADCASTJOIN".
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62525 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62525/consoleFull)** for PR 14132 at commit [`3e1fcf6`](https://github.com/apache/spark/commit/3e1fcf6495e144141970ac207a44293ddd0657bf).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70889856
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,34 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ */
+ object SubstituteHint extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
--- End diff --
Sorry for the wrong query (I didn't test it). My point is what will happen when we encounter multiple query hints? How are they processed?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r72010521
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,49 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ *
+ * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations`
+ * rule is applied. Here are two reasons.
+ * - To support `MetastoreRelation` in Hive module.
+ * - To reduce the effect of `Hint` on the other rules.
+ *
+ * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan.
+ * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
--- End diff --
Hi, @cloud-fan . Thank you again.
Yes. That's true. So, at my first commit of this PR, I actually implemented to attach just the concrete `BroadcastHint` to the relations in `AstBuilder.scala`. But, there were some advices to
- Move out the logic from the parser module (for preventing future updates on parser module)
- Generalize the hint syntax like the current code.
Since a general `Hint` is attached on a logical plan not a table relation, we can substitute any other plans (filter/project/aggregator), too. The following is my previous comment about this situation. The child of hint is `JOIN`.
> Yep. I grasp your point. First, the given relation is just a logical plan, not a table. In the following case, the relation is Join.
> SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0, parquet_t1
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62832/consoleFull)** for PR 14132 at commit [`d3df3fb`](https://github.com/apache/spark/commit/d3df3fbb61cfe7e573d82d777819220f01b40456).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71457670
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
}
}
+ /**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
+ * information of BroadcastHint resides its current position inside the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
+ */
+ object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+ override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+ // Capture the broadcasted information and store it in Hint.
+ case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
+ Hint("BROADCAST", Seq(table), child)
+
+ // Nearest Project is found.
+ case p @ Project(_, Hint(_, _, _)) => p
+
+ // Merge BROADCAST hints up to the nearest Project.
+ case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
+ h.copy(parameters = params1 ++ params2)
+ case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
+ h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
+
+ // Bubble up BROADCAST hints to the nearest Project.
+ case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+ h.copy(child = j.copy(left = hintChild))
+ case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+ h.copy(child = j.copy(right = hintChild))
+
+ // Other UnaryNodes are bypassed.
+ case u: UnaryNode
+ if u.child.isInstanceOf[Hint] && u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
--- End diff --
uh, yeah! please add two more spaces before `if`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70934669
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHint extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = child.transformDown {
+ case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+ case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
+ stop = true
+ BroadcastHint(r)
+ }
--- End diff --
In Hive/Oracle, what happens if we are unable to find any `UnresolvedRelation` with the same name?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Thank you always, @hvanhovell . Annd, sorry for the delay. Since last Saturday, I need to do some important personal stuff offline, so the time is limited for me. :(
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62198/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62198 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62198/consoleFull)** for PR 14132 at commit [`61e4b1b`](https://github.com/apache/spark/commit/61e4b1b7af15193f6f44ccba848fadd8488650b6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62435/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70886975
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -945,8 +955,12 @@ SIMPLE_COMMENT
: '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN)
;
+BRACKETED_EMPTY_COMMENT
--- End diff --
Ok nvm....
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71424857
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/BroadcastHintSuite.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.catalyst.plans.logical.BroadcastHint
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.test.SQLTestUtils
+
+class BroadcastHintSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
+ test("broadcast hint on Hive table") {
+ withTable("hive_t", "hive_u") {
+ spark.sql("CREATE TABLE hive_t(a int)")
+ spark.sql("CREATE TABLE hive_u(b int)")
+
+ val hive_t = spark.table("hive_t").queryExecution.analyzed
+ val hive_u = spark.table("hive_u").queryExecution.analyzed
+
+ val plan = spark.sql("SELECT /*+ MAPJOIN(hive_t) */ * FROM hive_t, hive_u")
+ .queryExecution.analyzed
+ assert(plan.children(0).children(0).isInstanceOf[BroadcastHint])
--- End diff --
This is fragile. You can use `collectFirst` instead. Also check the content of `BroadcastHint`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71097752
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,51 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+
+ for (param <- parameters) {
+ val names = param.split("\\.")
--- End diff --
How about Oracle?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71098821
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +452,49 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
}
}
+ /**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
+ * information of BroadcastHint resides its current position inside the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
+ */
+ object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+ override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+ // Capture the broadcasted information and store it in Hint.
+ case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
--- End diff --
uh, your solution depends the rule `AddSubquery`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71043863
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,34 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = child.transformDown {
+ case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
--- End diff --
@gatorsmile . Some cases come to my mind. Let me check and fix this.
Thank you!!!!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62394/consoleFull)** for PR 14132 at commit [`f318185`](https://github.com/apache/spark/commit/f3181852bfacfff0dd95593e3c9b943a1f261336).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71038438
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala ---
@@ -377,4 +377,64 @@ class AnalysisSuite extends AnalysisTest {
assertExpressionType(sum(Divide(Decimal(1), 2.0)), DoubleType)
assertExpressionType(sum(Divide(1.0, Decimal(2.0))), DoubleType)
}
+
+ test("Substitute hint") {
+ checkAnalysis(
+ Hint("MAPJOIN", Seq("TaBlE"), UnresolvedRelation(TableIdentifier("TaBlE"), Some("TbL"))),
+ BroadcastHint(testRelation),
+ caseSensitive = false)
+
+ checkAnalysis(
+ Hint("MAPJOIN", Seq("table"), UnresolvedRelation(TableIdentifier("TaBlE"), Some("TbL"))),
+ BroadcastHint(testRelation),
+ caseSensitive = false)
+
+ checkAnalysis(
+ Hint("MAPJOIN", Seq("TaBlE"), UnresolvedRelation(TableIdentifier("TaBlE"), Some("TbL"))),
+ BroadcastHint(testRelation))
+
+ checkAnalysis(
+ Hint("MAPJOIN", Seq("table"), UnresolvedRelation(TableIdentifier("TaBlE"), Some("TbL"))),
+ testRelation)
+
+ checkAnalysis(
+ Hint("MAPJOIN", Seq("TaBlE"),
+ Project(Seq(UnresolvedAttribute("a")),
+ UnresolvedRelation(TableIdentifier("TaBlE"), Some("TbL")))),
+ Project(testRelation.output, BroadcastHint(testRelation)))
+
+ checkAnalysis(
+ Hint("MAPJOIN", Seq("TaBlE"),
+ Project(Seq(UnresolvedAttribute("t.a")),
+ Join(
+ UnresolvedRelation(TableIdentifier("TaBlE"), Some("TbL")).as("t"),
+ UnresolvedRelation(TableIdentifier("TaBlE2"), Some("TbL")).as("u"),
+ Inner,
+ None
+ ))),
+ Project(testRelation.output,
+ Join(
+ BroadcastHint(testRelation),
+ testRelation2,
+ Inner,
+ None
+ )))
+
+ checkAnalysis(
--- End diff --
add a test case with nested hints?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62762/consoleFull)** for PR 14132 at commit [`3fa276d`](https://github.com/apache/spark/commit/3fa276d20d9bef56f3bc25e3a6f9d333cfaefdaf).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71097984
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,51 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+
+ for (param <- parameters) {
+ val names = param.split("\\.")
--- End diff --
Yep. You can try to another PR. :)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71627448
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
}
}
+ /**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
+ * information of BroadcastHint resides its current position inside the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
+ */
+ object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+ override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+ // Capture the broadcasted information and store it in Hint.
+ case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
+ Hint("BROADCAST", Seq(table), child)
+
+ // Nearest Project is found.
+ case p @ Project(_, Hint(_, _, _)) => p
+
+ // Merge BROADCAST hints up to the nearest Project.
+ case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
+ h.copy(parameters = params1 ++ params2)
+ case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
+ h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
+
+ // Bubble up BROADCAST hints to the nearest Project.
+ case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+ h.copy(child = j.copy(left = hintChild))
+ case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+ h.copy(child = j.copy(right = hintChild))
--- End diff --
Can you post your code when you merge them in a single case?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71683008
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
}
}
+ val broadcastHint = project match {
+ case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+ if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" else ""
+ case _ => ""
+ }
build(
"SELECT",
+ broadcastHint,
--- End diff --
Hmm. For this one, I found more reasonable solution. Instead of adding new dummy `Project`, we can utilize the existing dummy `Project` on the `SQLTable`.
Basically, the root cause of this problem of `groupingSetToSQL` is that it removes the project by doing this `toSQL(project.child)`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70937097
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -338,7 +338,7 @@ querySpecification
(RECORDREADER recordReader=STRING)?
fromClause?
(WHERE where=booleanExpression)?)
- | ((kind=SELECT setQuantifier? namedExpressionSeq fromClause?
+ | ((kind=SELECT hint? setQuantifier? namedExpressionSeq fromClause?
| fromClause (kind=SELECT setQuantifier? namedExpressionSeq)?)
--- End diff --
How about this `SELECT`? Should we also allow users use `Hint` here?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71098009
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -87,6 +87,7 @@ class Analyzer(
EliminateUnions),
Batch("Resolution", fixedPoint,
ResolveRelations ::
+ SubstituteHints ::
--- End diff --
Have you tried to move it after the batch `Resolution`? Any error you hit?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Does this work in `DataFrame` API, too?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71227811
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
@@ -988,9 +988,88 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
)
}
+ test("broadcast hint with window") {
+ checkGeneratedSQL(
+ """
+ |SELECT /*+ MAPJOIN(parquet_t1) */
+ | x.key, MAX(y.key) OVER (PARTITION BY x.key % 5 ORDER BY x.key)
+ |FROM parquet_t1 x JOIN parquet_t1 y ON x.key = y.key
+ """.stripMargin,
+ """
+ |SELECT `gen_attr` AS `key`, `gen_attr` AS `max(key) OVER (PARTITION BY (key % CAST(5 AS BIGINT)) ORDER BY key ASC RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)`
+ |FROM (SELECT `gen_attr`, `gen_attr`
+ | FROM (SELECT gen_subquery_2.`gen_attr`, gen_subquery_2.`gen_attr`, gen_subquery_2.`gen_attr`,
+ | max(`gen_attr`) OVER (PARTITION BY `gen_attr` ORDER BY `gen_attr` ASC
+ | RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS `gen_attr`
+ | FROM (SELECT /*+ MAPJOIN(parquet_t1) */ `gen_attr`, `gen_attr`,
+ | (`gen_attr` % CAST(5 AS BIGINT)) AS `gen_attr`
+ | FROM ((SELECT `key` AS `gen_attr`, `value` AS `gen_attr`
+ | FROM `default`.`parquet_t1`)
+ | AS gen_subquery_0)
+ | AS x
+ | INNER JOIN (SELECT `key` AS `gen_attr`, `value` AS `gen_attr`
+ | FROM `default`.`parquet_t1`)
+ | AS gen_subquery_1
+ | ON (`gen_attr` = `gen_attr`))
+ | AS gen_subquery_2)
+ | AS gen_subquery_3)
+ | AS x
+ """.stripMargin
+ )
+ }
+
+ test("broadcast hint with rollup") {
+ checkGeneratedSQL(
+ """
+ |SELECT /*+ MAPJOIN(parquet_t1) */
+ | count(*) as cnt, key%5, grouping_id()
+ |FROM parquet_t1
+ |GROUP BY key % 5 WITH ROLLUP""".stripMargin,
+ """
+ |SELECT `gen_attr` AS `cnt`, `gen_attr` AS `(key % CAST(5 AS BIGINT))`, `gen_attr` AS `grouping_id()`
+ |FROM (SELECT /*+ MAPJOIN(parquet_t1) */ count(1) AS `gen_attr`,
+ | (`gen_attr` % CAST(5 AS BIGINT)) AS `gen_attr`, grouping_id() AS `gen_attr`
+ | FROM (SELECT `key` AS `gen_attr`, `value` AS `gen_attr`
+ | FROM `default`.`parquet_t1`)
+ | AS gen_subquery_0
+ | GROUP BY (`gen_attr` % CAST(5 AS BIGINT))
+ | GROUPING SETS(((`gen_attr` % CAST(5 AS BIGINT))), ()))
+ | AS gen_subquery_1
+ """.stripMargin)
+
+ }
+
+ test("broadcast hint with grouping sets") {
+ checkGeneratedSQL(
+ s"""
+ |SELECT /*+ MAPJOIN(parquet_t1) */
+ | count(*) AS cnt, key % 5 AS k1, key - 5 AS k2,grouping_id() AS k3
+ |FROM (SELECT key, key % 2, key - 5 FROM parquet_t1) t GROUP BY key % 5, key - 5
+ |GROUPING SETS (key % 5, key - 5)
+ """.stripMargin,
+ """
+ |SELECT `gen_attr` AS `cnt`, `gen_attr` AS `k1`, `gen_attr` AS `k2`, `gen_attr` AS `k3`
+ |FROM (SELECT count(1) AS `gen_attr`, (`gen_attr` % CAST(5 AS BIGINT)) AS `gen_attr`,
+ | (`gen_attr` - CAST(5 AS BIGINT)) AS `gen_attr`, grouping_id() AS `gen_attr`
+ | FROM (SELECT /*+ MAPJOIN(parquet_t1) */ `gen_attr`,
+ | (`gen_attr` % CAST(2 AS BIGINT)) AS `gen_attr`,
+ | (`gen_attr` - CAST(5 AS BIGINT)) AS `gen_attr`
+ | FROM (SELECT `key` AS `gen_attr`, `value` AS `gen_attr`
+ | FROM `default`.`parquet_t1`)
+ | AS gen_subquery_0)
+ | AS t
+ | GROUP BY (`gen_attr` % CAST(5 AS BIGINT)), (`gen_attr` - CAST(5 AS BIGINT))
+ | GROUPING SETS(((`gen_attr` % CAST(5 AS BIGINT))), ((`gen_attr` - CAST(5 AS BIGINT)))))
+ | AS gen_subquery_1
+ """.stripMargin)
+ }
+
+
// This will be removed in favor of SPARK-16590.
private def checkGeneratedSQL(sqlString: String, expectedString: String): Unit = {
- val generatedSQL = new SQLBuilder(sql(sqlString)).toSQL
+ val df = sql(sqlString)
+ logError("\n" + df.queryExecution.analyzed.treeString)
--- End diff --
After https://github.com/apache/spark/pull/14235, I need to change the testcase style, too.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62491 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62491/consoleFull)** for PR 14132 at commit [`e6a44ed`](https://github.com/apache/spark/commit/e6a44ed3d97692f033ea4b1832b08527e6157aa2).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun closed the pull request at:
https://github.com/apache/spark/pull/14132
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71098966
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +452,49 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
}
}
+ /**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
+ * information of BroadcastHint resides its current position inside the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
+ */
+ object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+ override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+ // Capture the broadcasted information and store it in Hint.
+ case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
+ Hint("BROADCAST", Seq(table), child)
+
+ // Nearest Project is found.
+ case p @ Project(_, Hint(_, _, _)) => p
+
+ // Merge BROADCAST hints up to the nearest Project.
+ case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
+ h.copy(parameters = params1 ++ params2)
+ case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
+ h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
+
+ // Bubble up BROADCAST hints to the nearest Project.
--- End diff --
You do not need to enumerate all the types. This depends on the tree structure. `UnaryNode`, `BinaryNode` and so on.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70938251
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHint extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = child.transformDown {
+ case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+ case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
+ stop = true
+ BroadcastHint(r)
+ }
+ }
+ resolvedChild
+
+ // Remove unrecognized hint
--- End diff --
`hint` -> `hints`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71820260
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -339,8 +339,11 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
case SqlBaseParser.SELECT =>
// Regular select
+ // Add hints.
+ val withHint = relation.optionalMap(ctx.hint)(withHints)
--- End diff --
according to the grammar: `SELECT /*+ hint... */ ...`, it seems that hint should be applied to the whole SELECT clause, not only its relations?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62577/consoleFull)** for PR 14132 at commit [`e8b7bf0`](https://github.com/apache/spark/commit/e8b7bf0f3d88986048cd586ccc13209ee1611cd7).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
I'm back. I'll resolve them.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71819403
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -347,6 +347,16 @@ querySpecification
windows?)
;
+hint
+ : '/*+' hintStatement '*/'
+ ;
+
+hintStatement
+ : hintName=identifier
+ | hintName=identifier '(' parameters+=identifier parameters+=identifier ')'
--- End diff --
do we need this rule for broadcast hint?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/14132
OK what I meant was "supporting /*+ BROADCASTJOIN */", not the Impala style ..
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/14132
@gatorsmile let me know when you think this is ready.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71227405
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
@@ -988,9 +988,88 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
)
}
+ test("broadcast hint with window") {
+ checkGeneratedSQL(
+ """
+ |SELECT /*+ MAPJOIN(parquet_t1) */
+ | x.key, MAX(y.key) OVER (PARTITION BY x.key % 5 ORDER BY x.key)
+ |FROM parquet_t1 x JOIN parquet_t1 y ON x.key = y.key
+ """.stripMargin,
+ """
+ |SELECT `gen_attr` AS `key`, `gen_attr` AS `max(key) OVER (PARTITION BY (key % CAST(5 AS BIGINT)) ORDER BY key ASC RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)`
+ |FROM (SELECT `gen_attr`, `gen_attr`
+ | FROM (SELECT gen_subquery_2.`gen_attr`, gen_subquery_2.`gen_attr`, gen_subquery_2.`gen_attr`,
+ | max(`gen_attr`) OVER (PARTITION BY `gen_attr` ORDER BY `gen_attr` ASC
+ | RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS `gen_attr`
+ | FROM (SELECT /*+ MAPJOIN(parquet_t1) */ `gen_attr`, `gen_attr`,
+ | (`gen_attr` % CAST(5 AS BIGINT)) AS `gen_attr`
+ | FROM ((SELECT `key` AS `gen_attr`, `value` AS `gen_attr`
+ | FROM `default`.`parquet_t1`)
+ | AS gen_subquery_0)
+ | AS x
+ | INNER JOIN (SELECT `key` AS `gen_attr`, `value` AS `gen_attr`
+ | FROM `default`.`parquet_t1`)
+ | AS gen_subquery_1
+ | ON (`gen_attr` = `gen_attr`))
+ | AS gen_subquery_2)
+ | AS gen_subquery_3)
+ | AS x
+ """.stripMargin
+ )
+ }
+
+ test("broadcast hint with rollup") {
+ checkGeneratedSQL(
+ """
+ |SELECT /*+ MAPJOIN(parquet_t1) */
+ | count(*) as cnt, key%5, grouping_id()
+ |FROM parquet_t1
+ |GROUP BY key % 5 WITH ROLLUP""".stripMargin,
+ """
+ |SELECT `gen_attr` AS `cnt`, `gen_attr` AS `(key % CAST(5 AS BIGINT))`, `gen_attr` AS `grouping_id()`
+ |FROM (SELECT /*+ MAPJOIN(parquet_t1) */ count(1) AS `gen_attr`,
+ | (`gen_attr` % CAST(5 AS BIGINT)) AS `gen_attr`, grouping_id() AS `gen_attr`
+ | FROM (SELECT `key` AS `gen_attr`, `value` AS `gen_attr`
+ | FROM `default`.`parquet_t1`)
+ | AS gen_subquery_0
+ | GROUP BY (`gen_attr` % CAST(5 AS BIGINT))
+ | GROUPING SETS(((`gen_attr` % CAST(5 AS BIGINT))), ()))
+ | AS gen_subquery_1
+ """.stripMargin)
+
+ }
+
+ test("broadcast hint with grouping sets") {
+ checkGeneratedSQL(
+ s"""
+ |SELECT /*+ MAPJOIN(parquet_t1) */
+ | count(*) AS cnt, key % 5 AS k1, key - 5 AS k2,grouping_id() AS k3
+ |FROM (SELECT key, key % 2, key - 5 FROM parquet_t1) t GROUP BY key % 5, key - 5
+ |GROUPING SETS (key % 5, key - 5)
+ """.stripMargin,
+ """
+ |SELECT `gen_attr` AS `cnt`, `gen_attr` AS `k1`, `gen_attr` AS `k2`, `gen_attr` AS `k3`
+ |FROM (SELECT count(1) AS `gen_attr`, (`gen_attr` % CAST(5 AS BIGINT)) AS `gen_attr`,
+ | (`gen_attr` - CAST(5 AS BIGINT)) AS `gen_attr`, grouping_id() AS `gen_attr`
+ | FROM (SELECT /*+ MAPJOIN(parquet_t1) */ `gen_attr`,
+ | (`gen_attr` % CAST(2 AS BIGINT)) AS `gen_attr`,
+ | (`gen_attr` - CAST(5 AS BIGINT)) AS `gen_attr`
+ | FROM (SELECT `key` AS `gen_attr`, `value` AS `gen_attr`
+ | FROM `default`.`parquet_t1`)
+ | AS gen_subquery_0)
+ | AS t
+ | GROUP BY (`gen_attr` % CAST(5 AS BIGINT)), (`gen_attr` - CAST(5 AS BIGINT))
+ | GROUPING SETS(((`gen_attr` % CAST(5 AS BIGINT))), ((`gen_attr` - CAST(5 AS BIGINT)))))
+ | AS gen_subquery_1
+ """.stripMargin)
+ }
+
+
// This will be removed in favor of SPARK-16590.
private def checkGeneratedSQL(sqlString: String, expectedString: String): Unit = {
- val generatedSQL = new SQLBuilder(sql(sqlString)).toSQL
+ val df = sql(sqlString)
+ logError("\n" + df.queryExecution.analyzed.treeString)
--- End diff --
Oops. Please ignore this since we will be under review for a while.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70227231
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -347,6 +347,14 @@ querySpecification
windows?)
;
+hint
+ : '/*+' mapJoinHint '*/'
+ ;
+
+mapJoinHint
+ : MAPJOIN '(' broadcastedTables+=tableIdentifier (',' broadcastedTables+=tableIdentifier)* ')'
--- End diff --
Ok - lets keep it this way then.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/14132
In the JIRA https://issues.apache.org/jira/browse/SPARK-11012, we documented the current limitation of SQL generation:
>Note that not all resolved logical query plan can be converted back to SQL query string. Either because it consists of some language structure that has not been supported yet, or it doesn't have a SQL representation inherently (e.g. query plans built on top of local Scala collections).
The purpose of SQL generation is for better native view support. The target logical plan must be parsed from a valid HiveQL query statement. Thus, the two examples you mentioned above are not supported.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62354/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62578/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r72037553
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,49 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ *
+ * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations`
+ * rule is applied. Here are two reasons.
+ * - To support `MetastoreRelation` in Hive module.
+ * - To reduce the effect of `Hint` on the other rules.
+ *
+ * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan.
+ * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = resolvedChild.transformDown {
--- End diff --
Does this really work? `transformDown` is DFS I think
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
For `Table-Level Optimizer Hints` of MySQL, it applied to all table with the same name.
> The hint applies to all tables that it names
http://dev.mysql.com/doc/refman/5.7/en/optimizer-hints.html
This is not what we want here.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71736274
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -162,6 +162,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
toSQL(p.right),
p.condition.map(" ON " + _.sql).getOrElse(""))
+ case h @ Hint(_, _, s @ SubqueryAlias(alias, p @ Project(_, _: SQLTable))) =>
--- End diff --
The new way looks cleaner, but this looks like a special handling. Can you write a code comment to explain when it is being used?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
What I mean is currently how to broadcast the Hive table `tab1`? I'm making the testcase.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Yep. The above example, `spark.range(10).createOrReplaceTempView("t")`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71823645
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -193,6 +196,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
case OneRowRelation =>
""
+ case Hint(_, _, child) =>
--- End diff --
so why do we have this case? `SQLBuilder` will be applied on analyzed plan right?
And I think it's better to push this check in `CheckAnalysis`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70938037
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -84,7 +84,8 @@ class Analyzer(
Batch("Substitution", fixedPoint,
CTESubstitution,
WindowsSubstitution,
- EliminateUnions),
+ EliminateUnions,
+ SubstituteHint),
--- End diff --
Like `EliminateUnions`, please rename it to `SubstituteHints`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71044871
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,34 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = child.transformDown {
+ case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+ case r @ UnresolvedRelation(t, _) if !stop && resolver(t.table, table) =>
+ stop = true
+ BroadcastHint(r)
+ }
+ }
+ resolvedChild
+
+ // Remove unrecognized hints
+ case Hint(name, _, child) =>
+ child
--- End diff --
If they fit one line, the general style rule is to put them in one line
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62435/consoleFull)** for PR 14132 at commit [`959dab1`](https://github.com/apache/spark/commit/959dab11d792207768216b0b92cb38979579d2d5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71038495
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala ---
@@ -377,4 +377,64 @@ class AnalysisSuite extends AnalysisTest {
assertExpressionType(sum(Divide(Decimal(1), 2.0)), DoubleType)
assertExpressionType(sum(Divide(1.0, Decimal(2.0))), DoubleType)
}
+
+ test("Substitute hint") {
--- End diff --
let's create a SubstituteHintsSuite and put these there as multiple test cases.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62502/consoleFull)** for PR 14132 at commit [`404a322`](https://github.com/apache/spark/commit/404a322686f0603c84e542c4ca8b5353dcc0f9d7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70889621
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,34 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ */
+ object SubstituteHint extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
--- End diff --
The second `LEFT SEMI JOIN` one should apply the downward nearest one.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62372/consoleFull)** for PR 14132 at commit [`210b636`](https://github.com/apache/spark/commit/210b6365789320f12d0757ad66e7b122614f606c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71040732
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,34 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = child.transformDown {
+ case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
--- End diff --
This is another one. You could cause multiple nested `BroadcastHint`, if the names are duplicate. : )
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Yep. I know why you said like that. I also feel like that, but Spark 2.0 uses that term in API.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62357 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62357/consoleFull)** for PR 14132 at commit [`f77a0fa`](https://github.com/apache/spark/commit/f77a0fafbce0195133a9680c2b636222ba491e2b).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71756794
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -162,6 +162,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
toSQL(p.right),
p.condition.map(" ON " + _.sql).getOrElse(""))
+ case h @ Hint(_, _, s @ SubqueryAlias(alias, p @ Project(_, _: SQLTable))) =>
--- End diff --
Sure!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Let me see the example PRs.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70222831
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -945,8 +955,12 @@ SIMPLE_COMMENT
: '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN)
;
+BRACKETED_EMPTY_COMMENT
--- End diff --
Yes. Could you give me some workaround advice?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70425722
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -347,6 +347,15 @@ querySpecification
windows?)
;
+hint
+ : '/*+' hintStatement '*/'
+ ;
+
+hintStatement
+ : hintName=identifier identifierList
+ | hintName=identifier '(' parameter1=identifier parameter2=identifier ')'
--- End diff --
What are we trying to support here?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70869241
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,39 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ */
+ object SubstituteHint extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ // MAPJOIN
+ case h @ Hint(name, parameters, child) if name.equalsIgnoreCase("MAPJOIN") =>
+ var resolvedChild = child
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = child.transformDown {
+ case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+ case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
+ stop = true
+ BroadcastHint(r)
+ }
+ }
+ resolvedChild
+ // BROADCAST (LEFT|RIGHT)
+ case h @ Hint(name, parameters, j @ Join(left, _, _, _))
+ if name.equalsIgnoreCase("BROADCAST_LEFT") =>
+ j.copy(left = BroadcastHint(left))
+ case h @ Hint(name, parameters, j @ Join(_, right, _, _))
+ if name.equalsIgnoreCase("BROADCAST_RIGHT") =>
+ j.copy(right = BroadcastHint(right))
+
+ // Remove unrecognized hint
+ case Hint(_, _, child) => child
--- End diff --
I think we can add `logDebug` here.
But, usually, you know, DBMS ignores `Hint` by default.
Any malformed or unknown hints become **comment**.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71203247
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
@@ -755,4 +755,243 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
checkHiveQl("select * from orc_t")
}
}
+
+ test("broadcast hint on single table") {
+ checkGeneratedSQL(
+ "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0",
+ """
+ |SELECT `gen_attr` AS `id`
+ |FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr`
+ | FROM (SELECT `id` AS `gen_attr`
+ | FROM `default`.`parquet_t0`)
+ | AS gen_subquery_0)
+ | AS parquet_t0
+ |""".stripMargin)
+
+ checkGeneratedSQL(
+ "SELECT /*+ MAPJOIN(parquet_t0, parquet_t0) */ * FROM parquet_t0",
+ """
+ |SELECT `gen_attr` AS `id`
+ |FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr`
+ | FROM (SELECT `id` AS `gen_attr`
+ | FROM `default`.`parquet_t0`)
+ | AS gen_subquery_0)
+ | AS parquet_t0
+ |""".stripMargin)
+
+ checkGeneratedSQL(
+ "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0 as a",
+ """
+ |SELECT `gen_attr` AS `id`
+ |FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr`
+ | FROM ((SELECT `id` AS `gen_attr`
+ | FROM `default`.`parquet_t0`)
+ | AS gen_subquery_0)
+ | AS a)
+ | AS a
+ """.stripMargin)
+
+ checkGeneratedSQL(
+ """
+ |SELECT /*+ MAPJOIN(parquet_t0) */ *
--- End diff --
It's beyond the scope of this PR.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70876188
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -399,7 +402,16 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
* separated) relations here, these get converted into a single plan by condition-less inner join.
*/
override def visitFromClause(ctx: FromClauseContext): LogicalPlan = withOrigin(ctx) {
- val from = ctx.relation.asScala.map(plan).reduceLeft(Join(_, _, Inner, None))
+ val relationPlan = ctx.relation.asScala.map {
+ case jrc: JoinRelationContext if jrc.broadcast != null =>
+ if (jrc.broadcast.left == null) { // default
+ Hint("BROADCAST_RIGHT", Seq.empty[String], plan(jrc))
--- End diff --
I'm happy with this kind of argument. I agree with you. I prefer to eliminate at the first line. :)
BTW, fortunately, this logic is removed now. We can proceed without disagreement on here.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
For old hives, I can get the binary after Hive 1.0.
http://www-us.apache.org/dist/hive/
The above result was Hive 2.0 and 1.2. I think Hive 1.0 and 1.1 will be the same as expected.
The legacy code was January 2013.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70930995
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHint extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = child.transformDown {
+ case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+ case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
+ stop = true
+ BroadcastHint(r)
+ }
+ }
+ resolvedChild
+
+ // Remove unrecognized hint
+ case Hint(name, _, child) =>
+ logDebug(s"Ignore Unknown Hint: $name")
--- End diff --
Sure. No problem. It was just added by @hvanhovell 's request.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71825810
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -339,8 +339,11 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
case SqlBaseParser.SELECT =>
// Regular select
+ // Add hints.
+ val withHint = relation.optionalMap(ctx.hint)(withHints)
--- End diff --
Yep. I grasp your point. First, the given `relation` is just a logical plan, not a table. In the following case, the `relation` is `Join`.
```
SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0, parquet_t1
```
This means just a context.
Second, I agree with your point. Generally, `hint` is about tables like join order or access path. We can provides some hints on filter conditions, I'm not sure about the cases so far. But, even for that cases, we can adapt that in `Analyzer` easily. In other words, we can identify `Hint("FILTERHINT")` in the same SPJ.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
So far, I couldn't do the following two advices. I just inform you that I'm still working these. :)
- `HINT_PREFIX`
- `| hintName=identifier '(' parameters+=identifier parameters+=identifier ')'`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70938777
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -84,7 +84,8 @@ class Analyzer(
Batch("Substitution", fixedPoint,
CTESubstitution,
WindowsSubstitution,
- EliminateUnions),
+ EliminateUnions,
+ SubstituteHint),
--- End diff --
Great!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70935879
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHint extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = child.transformDown {
+ case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+ case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
--- End diff --
Yeah, `Analyzer` always consider case sensitivity issues. Please check how to use `resolver`: https://github.com/apache/spark/blob/8b5a8b25b9d29b7d0949d5663c7394b26154a836/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L68-L74
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62403/consoleFull)** for PR 14132 at commit [`bb44615`](https://github.com/apache/spark/commit/bb44615c3480369fc7b9381a955da817975eeb01).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/14132
Really? Can you show me the files? I think we should correct them
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62966/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r72036822
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -372,6 +372,10 @@ trait CheckAnalysis extends PredicateHelper {
|in operator ${operator.simpleString}
""".stripMargin)
+ case Hint(_, _, _) =>
+ throw new IllegalStateException(
+ "logical hint operator should have been removed by the optimizer")
--- End diff --
Oh!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r72011582
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,49 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ *
+ * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations`
+ * rule is applied. Here are two reasons.
+ * - To support `MetastoreRelation` in Hive module.
+ * - To reduce the effect of `Hint` on the other rules.
+ *
+ * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan.
+ * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
--- End diff --
First of all, I'll take a look again. I need some boot-up time. :)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71446840
--- Diff: sql/hive/src/test/resources/sqlgen/broadcast_hint_nested_query_1.sql ---
@@ -0,0 +1,7 @@
+-- This file is automatically generated by LogicalPlanToSQLSuite.
+SELECT /*+ MAPJOIN(parquet_t0) */ *
+FROM (SELECT id tid FROM parquet_t0) T
+JOIN (SELECT key uid FROM parquet_t1) U
+ON tid=uid
+--------------------------------------------------------------------------------
+SELECT `gen_attr` AS `tid`, `gen_attr` AS `uid` FROM (SELECT `gen_attr`, `gen_attr` FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr` AS `gen_attr` FROM (SELECT `id` AS `gen_attr` FROM `default`.`parquet_t0`) AS gen_subquery_0) AS T INNER JOIN (SELECT `gen_attr` AS `gen_attr` FROM (SELECT `key` AS `gen_attr`, `value` AS `gen_attr` FROM `default`.`parquet_t1`) AS gen_subquery_1) AS U ON (`gen_attr` = `gen_attr`)) AS gen_subquery_2
--- End diff --
Yep. I'm trying to recover them in https://github.com/apache/spark/pull/14257 , too.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62491/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62675/consoleFull)** for PR 14132 at commit [`ad10ed6`](https://github.com/apache/spark/commit/ad10ed6144806574e49274119f22c0d5021a47a2).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70931050
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHint extends Rule[LogicalPlan] {
--- End diff --
Oh, I missed that. Sure.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/14132
Thank you for the great work by @dongjoon-hyun !
Although the code changes are a lot, but most are test cases. The code is ready to review! @rxin @hvanhovell
Also cc @cloud-fan @liancheng You might have a better idea in SQL generation. Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71823152
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -193,6 +196,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
case OneRowRelation =>
""
+ case Hint(_, _, child) =>
--- End diff --
Also, `SparkStrategies` ensures that here https://github.com/apache/spark/pull/14132/files#diff-7253a38df7e111ecf6b1ef71feba383bR347 , too.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71259834
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -87,6 +87,7 @@ class Analyzer(
EliminateUnions),
Batch("Resolution", fixedPoint,
ResolveRelations ::
+ SubstituteHints ::
--- End diff --
In each batch, the order of rules should not matter. That means, the rule `SubstituteHints` impacts the other rules. Please fix it.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71823613
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -339,8 +339,11 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
case SqlBaseParser.SELECT =>
// Regular select
+ // Add hints.
+ val withHint = relation.optionalMap(ctx.hint)(withHints)
--- End diff --
for now the hint can only affect the relations(broadcast hint), so it's ok to just put the `Hint` on top of relation. But generally hint can affect anything in the select clause, e.g. maybe the filter condition.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71098026
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,51 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+
+ for (param <- parameters) {
+ val names = param.split("\\.")
--- End diff --
I intentionally remove that from this PR. You cannot get all in a single PR.
I think you know Oracle Global Hint (you already asked to support view_name.table_name.), you can do better than me at that.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71430308
--- Diff: sql/hive/src/test/resources/sqlgen/broadcast_hint_nested_query_1.sql ---
@@ -0,0 +1,7 @@
+-- This file is automatically generated by LogicalPlanToSQLSuite.
+SELECT /*+ MAPJOIN(parquet_t0) */ *
+FROM (SELECT id tid FROM parquet_t0) T
+JOIN (SELECT key uid FROM parquet_t1) U
+ON tid=uid
+--------------------------------------------------------------------------------
+SELECT `gen_attr` AS `tid`, `gen_attr` AS `uid` FROM (SELECT `gen_attr`, `gen_attr` FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr` AS `gen_attr` FROM (SELECT `id` AS `gen_attr` FROM `default`.`parquet_t0`) AS gen_subquery_0) AS T INNER JOIN (SELECT `gen_attr` AS `gen_attr` FROM (SELECT `key` AS `gen_attr`, `value` AS `gen_attr` FROM `default`.`parquet_t1`) AS gen_subquery_1) AS U ON (`gen_attr` = `gen_attr`)) AS gen_subquery_2
--- End diff --
oh, I see. The compare removes the id.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62082 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62082/consoleFull)** for PR 14132 at commit [`6bd704e`](https://github.com/apache/spark/commit/6bd704e5d281bd8c7a1bddaee300688d016bc597).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70939090
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHint extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = child.transformDown {
+ case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+ case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
+ stop = true
+ BroadcastHint(r)
+ }
+ }
+ resolvedChild
+
+ // Remove unrecognized hint
--- End diff --
Yep.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/14132
I do not know the exact context why you need SQL generation. Tomorrow, I will review your PR (https://github.com/apache/spark/pull/14116) to understand your issues tomorrow.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71626761
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
}
}
+ /**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
+ * information of BroadcastHint resides its current position inside the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
+ */
+ object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+ override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+ // Capture the broadcasted information and store it in Hint.
+ case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
+ Hint("BROADCAST", Seq(table), child)
+
+ // Nearest Project is found.
+ case p @ Project(_, Hint(_, _, _)) => p
+
+ // Merge BROADCAST hints up to the nearest Project.
+ case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
+ h.copy(parameters = params1 ++ params2)
+ case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
+ h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
+
+ // Bubble up BROADCAST hints to the nearest Project.
+ case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+ h.copy(child = j.copy(left = hintChild))
+ case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+ h.copy(child = j.copy(right = hintChild))
+
+ // Other UnaryNodes are bypassed.
+ case u: UnaryNode
+ if u.child.isInstanceOf[Hint] && u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
+ val hint = u.child.asInstanceOf[Hint]
+ hint.copy(child = u.withNewChildren(Seq(hint.child)))
+
+ // Other binary or higher operations are ignored.
--- End diff --
We still need to explain why they can be ignored.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Thank you for all comments, @hvanhovell and @rxin . I'll update soon.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62714 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62714/consoleFull)** for PR 14132 at commit [`2ee7d21`](https://github.com/apache/spark/commit/2ee7d21889490f05b591a276fa33df96693caf05).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62347 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62347/consoleFull)** for PR 14132 at commit [`142f5ef`](https://github.com/apache/spark/commit/142f5eff7f3f30fea6b5d48f918ffcf478ddccd6).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62832 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62832/consoleFull)** for PR 14132 at commit [`d3df3fb`](https://github.com/apache/spark/commit/d3df3fbb61cfe7e573d82d777819220f01b40456).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62198/consoleFull)** for PR 14132 at commit [`61e4b1b`](https://github.com/apache/spark/commit/61e4b1b7af15193f6f44ccba848fadd8488650b6).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/14132
@dongjoon-hyun this will break SQL view generation. Can you update SQLBuilder and LogicalPlanToSQLSuite ?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71261455
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,51 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+
+ for (param <- parameters) {
+ val names = param.split("\\.")
+ val tid = if (names.length > 1) {
+ TableIdentifier(names(1), Some(names(0)))
+ } else {
+ TableIdentifier(param, None)
+ }
+ try {
+ catalog.lookupRelation(tid)
+
+ var stop = false
+ resolvedChild = resolvedChild.transformDown {
+ case r @ BroadcastHint(SubqueryAlias(t, _))
+ if !stop && resolver(t, tid.identifier) =>
+ stop = true
+ r
+ case r @ SubqueryAlias(t, _) if !stop && resolver(t, tid.identifier) =>
+ stop = true
+ BroadcastHint(r)
--- End diff --
Sure, If you want to remove this, I can simply this. It's a little bit legacy I tried to follow your advice as much as possible. (As I mentioned before, I decide to block in Parser layer after I found that Hive does.)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70935292
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHint extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = child.transformDown {
+ case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+ case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
+ stop = true
+ BroadcastHint(r)
+ }
--- End diff --
Hint is ignored.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Hive and Spark will raise `ParseException`. :)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62577/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71443387
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala ---
@@ -447,4 +447,42 @@ class PlanParserSuite extends PlanTest {
assertEqual("select a, b from db.c where x !> 1",
table("db", "c").where('x <= 1).select('a, 'b))
}
+
+ test("select hint syntax") {
+ // Hive compatibility: Missing parameter raises ParseException.
+ val m = intercept[ParseException] {
+ parsePlan("SELECT /*+ HINT() */ * FROM t")
+ }.getMessage
+ assert(m.contains("no viable alternative at input"))
+
+ // Hive compatibility: No database.
+ val m2 = intercept[ParseException] {
+ parsePlan("SELECT /*+ MAPJOIN(default.t) */ * from default.t;")
+ }.getMessage
+ assert(m2.contains("no viable alternative at input"))
+
+ comparePlans(
+ parsePlan("SELECT /*+ HINT */ * FROM t"),
+ Hint("HINT", Seq.empty, table("t")).select(star()))
+
+ comparePlans(
+ parsePlan("SELECT /*+ HINT */ * FROM t"),
+ Hint("HINT", Seq.empty, table("t")).select(star()))
+
+ comparePlans(
+ parsePlan("SELECT /*+ BROADCASTJOIN(u) */ * FROM t"),
+ Hint("BROADCASTJOIN", Seq("u"), table("t")).select(star()))
+
+ comparePlans(
+ parsePlan("SELECT /*+ MAPJOIN(u) */ * FROM t"),
+ Hint("MAPJOIN", Seq("u"), table("t")).select(star()))
+
+ comparePlans(
+ parsePlan("SELECT /*+ STREAMTABLE(a,b,c) */ * FROM t"),
+ Hint("STREAMTABLE", Seq("a", "b", "c"), table("t")).select(star()))
+
+ comparePlans(
+ parsePlan("SELECT /*+ INDEX(t emp_job_ix) */ * FROM t"),
+ Hint("INDEX", Seq("t", "emp_job_ix"), table("t")).select(star()))
+ }
--- End diff --
Yep. I added like this.
```
comparePlans(
parsePlan("SELECT /*+ MAPJOIN(`default.t`) */ * from `default.t`"),
Hint("MAPJOIN", Seq("default.t"), table("default.t")).select(star()))
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71463733
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
}
}
+ val broadcastHint = project match {
+ case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+ if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" else ""
+ case _ => ""
+ }
build(
"SELECT",
+ broadcastHint,
--- End diff --
It's the result of Window test query. For the Windows query, there were nested Projects.
```
test("broadcast hint with window") {
checkSQL(
"""
|SELECT /*+ MAPJOIN(parquet_t1) */
| x.key, MAX(y.key) OVER (PARTITION BY x.key % 5 ORDER BY x.key)
|FROM parquet_t1 x JOIN parquet_t1 y ON x.key = y.key
""".stripMargin,
"broadcast_hint_window")
}
```
I had the same feeling why some "SELECT" doesn't happen.
After https://issues.apache.org/jira/browse/SPARK-16576 , I think this kind of weirdness will be reduced.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62403/consoleFull)** for PR 14132 at commit [`bb44615`](https://github.com/apache/spark/commit/bb44615c3480369fc7b9381a955da817975eeb01).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71425663
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/BroadcastHintSuite.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.catalyst.plans.logical.BroadcastHint
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.test.SQLTestUtils
+
+class BroadcastHintSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
+ test("broadcast hint on Hive table") {
+ withTable("hive_t", "hive_u") {
+ spark.sql("CREATE TABLE hive_t(a int)")
+ spark.sql("CREATE TABLE hive_u(b int)")
+
+ val hive_t = spark.table("hive_t").queryExecution.analyzed
+ val hive_u = spark.table("hive_u").queryExecution.analyzed
+
+ val plan = spark.sql("SELECT /*+ MAPJOIN(hive_t) */ * FROM hive_t, hive_u")
+ .queryExecution.analyzed
+ assert(plan.children(0).children(0).isInstanceOf[BroadcastHint])
--- End diff --
Sure! No problem.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71259501
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +452,49 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
}
}
+ /**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
+ * information of BroadcastHint resides its current position inside the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
+ */
+ object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+ override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+ // Capture the broadcasted information and store it in Hint.
+ case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
+ Hint("BROADCAST", Seq(table), child)
+
+ // Nearest Project is found.
+ case p @ Project(_, Hint(_, _, _)) => p
+
+ // Merge BROADCAST hints up to the nearest Project.
+ case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
+ h.copy(parameters = params1 ++ params2)
+ case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
+ h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
+
+ // Bubble up BROADCAST hints to the nearest Project.
--- End diff --
If you read what we did in SQLBuilder, you might know that is not the normal way we did. @rxin gave the same comment below: https://github.com/apache/spark/pull/14132#issuecomment-233503432.
Keeping a white list is hard to maintain. You know, I still can find more missing cases here.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71626636
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -193,6 +193,12 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
case OneRowRelation =>
""
+ case Hint(_, _, child) =>
+ toSQL(child)
+
+ case BroadcastHint(child) =>
+ toSQL(child)
--- End diff --
Yep. Right.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
By the way, the view issue is moved into [SPARK-16492](https://issues.apache.org/jira/browse/SPARK-16492).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62975/consoleFull)** for PR 14132 at commit [`4023d97`](https://github.com/apache/spark/commit/4023d974f34052bb29e12fd93aeb187ea12b536f).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70874930
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -399,7 +402,16 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
* separated) relations here, these get converted into a single plan by condition-less inner join.
*/
override def visitFromClause(ctx: FromClauseContext): LogicalPlan = withOrigin(ctx) {
- val from = ctx.relation.asScala.map(plan).reduceLeft(Join(_, _, Inner, None))
+ val relationPlan = ctx.relation.asScala.map {
+ case jrc: JoinRelationContext if jrc.broadcast != null =>
+ if (jrc.broadcast.left == null) { // default
+ Hint("BROADCAST_RIGHT", Seq.empty[String], plan(jrc))
--- End diff --
It does.
However here is my counter argument :)... In this case we had to define the syntax for a hint in the grammar (join will break without it). If we have to do that we are tied to that particular hint anyway, so in those cases I would implement it right away. The added bonus here would be that you simplify your analysis rule by quite a bit.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71203038
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,51 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+
+ for (param <- parameters) {
+ val names = param.split("\\.")
+ val tid = if (names.length > 1) {
+ TableIdentifier(names(1), Some(names(0)))
+ } else {
+ TableIdentifier(param, None)
+ }
+ try {
+ catalog.lookupRelation(tid)
+
+ var stop = false
+ resolvedChild = resolvedChild.transformDown {
+ case r @ BroadcastHint(SubqueryAlias(t, _))
+ if !stop && resolver(t, tid.identifier) =>
+ stop = true
+ r
+ case r @ SubqueryAlias(t, _) if !stop && resolver(t, tid.identifier) =>
+ stop = true
+ BroadcastHint(r)
--- End diff --
Here, too.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62564 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62564/consoleFull)** for PR 14132 at commit [`94aaf6b`](https://github.com/apache/spark/commit/94aaf6b988cfb94c6d14b44a708d75657e75d0b4).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/14132
@dongjoon-hyun Finished the fourth pass. Sorry, it took a long time to review it. So many meetings...
Just a question, what is the output from Hive when users use a fully qualified name in Hint?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62491 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62491/consoleFull)** for PR 14132 at commit [`e6a44ed`](https://github.com/apache/spark/commit/e6a44ed3d97692f033ea4b1832b08527e6157aa2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Hi, @hvanhovell . So far, I tried in the following way for `HINT_PREFIX`.
Due to the prefix overlapping, `BRACKETED_COMMENT` seems to eat the HINT.
Could you give me some advice more?
```antlr
hint
- : '/*+' hintStatement '*/'
+ : HINT_PREFIX hintStatement '*/'
;
hintStatement
@@ -961,12 +961,12 @@ SIMPLE_COMMENT
: '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN)
;
-BRACKETED_EMPTY_COMMENT
- : '/**/' -> channel(HIDDEN)
+HINT_PREFIX
+ : '/*+'
;
BRACKETED_COMMENT
- : '/*' ~[+] .*? '*/' -> channel(HIDDEN)
+ : '/*' .*? '*/' -> channel(HIDDEN) <--- The original one.
;
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71261042
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +452,49 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
}
}
+ /**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
+ * information of BroadcastHint resides its current position inside the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
+ */
+ object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+ override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+ // Capture the broadcasted information and store it in Hint.
+ case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
+ Hint("BROADCAST", Seq(table), child)
+
+ // Nearest Project is found.
+ case p @ Project(_, Hint(_, _, _)) => p
+
+ // Merge BROADCAST hints up to the nearest Project.
+ case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
+ h.copy(parameters = params1 ++ params2)
+ case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
+ h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
+
+ // Bubble up BROADCAST hints to the nearest Project.
+ case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+ h.copy(child = j.copy(left = hintChild))
+ case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+ h.copy(child = j.copy(right = hintChild))
+ case s @ SubqueryAlias(_, h @ Hint("BROADCAST", _, hintChild)) =>
+ h.copy(child = s.copy(child = hintChild))
+ case ll @ LocalLimit(_, h @ Hint("BROADCAST", _, hintChild)) =>
+ h.copy(child = ll.copy(child = hintChild))
+ case f @ Filter(_, h @ Hint("BROADCAST", _, hintChild)) =>
+ h.copy(child = f.copy(child = hintChild))
+ case a @ Aggregate(_, _, h @ Hint("BROADCAST", _, hintChild)) =>
+ h.copy(child = a.copy(child = hintChild))
+ case s @ Sort(_, _, h @ Hint("BROADCAST", _, hintChild)) =>
+ h.copy(child = s.copy(child = hintChild))
+ case g @ Generate(_, _, _, _, _, h @ Hint("BROADCAST", _, hintChild)) =>
+ h.copy(child = g.copy(child = hintChild))
+ // Set operation is not allowed to be across. UNION/INTERCEPT/EXCEPT
--- End diff --
Okay. But, that was the reason not to allowed there. Hmm, maybe it looks different. Sorry.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62196/consoleFull)** for PR 14132 at commit [`5c76e25`](https://github.com/apache/spark/commit/5c76e25851888f0cafa1815a12760846efeb2077).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62347 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62347/consoleFull)** for PR 14132 at commit [`142f5ef`](https://github.com/apache/spark/commit/142f5eff7f3f30fea6b5d48f918ffcf478ddccd6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62675/consoleFull)** for PR 14132 at commit [`ad10ed6`](https://github.com/apache/spark/commit/ad10ed6144806574e49274119f22c0d5021a47a2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71097256
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,51 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+
+ for (param <- parameters) {
+ val names = param.split("\\.")
--- End diff --
Currently, the parser does not allow to specify database name. (I didn't commit the code since I found that Hive doesn't, too.) In case of the parser acceptance, it's valid for all the case because the parser will accept only identifiers.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62975 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62975/consoleFull)** for PR 14132 at commit [`4023d97`](https://github.com/apache/spark/commit/4023d974f34052bb29e12fd93aeb187ea12b536f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71694743
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
}
}
+ /**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
+ * information of BroadcastHint resides its current position inside the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
+ */
+ object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+ override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+ // Capture the broadcasted information and store it in Hint.
+ case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
+ Hint("BROADCAST", Seq(table), child)
+
+ // Nearest Project is found.
+ case p @ Project(_, Hint(_, _, _)) => p
+
+ // Merge BROADCAST hints up to the nearest Project.
+ case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
+ h.copy(parameters = params1 ++ params2)
+ case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
+ h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
+
+ // Bubble up BROADCAST hints to the nearest Project.
+ case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+ h.copy(child = j.copy(left = hintChild))
+ case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+ h.copy(child = j.copy(right = hintChild))
--- End diff --
For the functional equivalence, you can see the logic. Also, the above code is passed `LogicalPlanToSQLSuite` with the existing tests, too.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71825927
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -193,6 +196,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
case OneRowRelation =>
""
+ case Hint(_, _, child) =>
--- End diff --
Ur, it's added by previous advice comments.
No problem. I'll move that into `CheckAnalysis`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70223258
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -945,8 +955,12 @@ SIMPLE_COMMENT
: '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN)
;
+BRACKETED_EMPTY_COMMENT
+ : '/**/' -> channel(HIDDEN)
+ ;
+
BRACKETED_COMMENT
- : '/*' .*? '*/' -> channel(HIDDEN)
+ : '/*' ~[+] .*? '*/' -> channel(HIDDEN)
--- End diff --
It might be easier to introduce a HINT_PREFIX rule (`'/*+'`) and place this before the BRACKET_COMMENT rule.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Thank you for replying, @gatorsmile . Yes. Right.
In fact, that limitation is important to me. To complete my `INFORMATION_SCHEMA` PR (https://github.com/apache/spark/pull/14116), I need to support view for Spark temporal views some day later. Of course, currently, in that PR, I removed the VIEW DEFINITION column implementation part due to many reason.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Oh, @rxin .
Since the position of `BroadcastHint` is different from the original, we can not figure out the exact SELECT from the multiple SELECT candidates. I'm trying to attach to the nearest SELECT ancestor if possible. I hope that is okay since the functionality is the same.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71202999
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +452,49 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
}
}
+ /**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
+ * information of BroadcastHint resides its current position inside the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
+ */
+ object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+ override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+ // Capture the broadcasted information and store it in Hint.
+ case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
+ Hint("BROADCAST", Seq(table), child)
+
+ // Nearest Project is found.
+ case p @ Project(_, Hint(_, _, _)) => p
+
+ // Merge BROADCAST hints up to the nearest Project.
+ case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
+ h.copy(parameters = params1 ++ params2)
+ case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
+ h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
+
+ // Bubble up BROADCAST hints to the nearest Project.
+ case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+ h.copy(child = j.copy(left = hintChild))
+ case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+ h.copy(child = j.copy(right = hintChild))
+ case s @ SubqueryAlias(_, h @ Hint("BROADCAST", _, hintChild)) =>
--- End diff --
It seems `database.table` issue which I mentioned before.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71041967
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,34 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = child.transformDown {
--- End diff --
Could you give me a little bit more hint? :)
Is the naming `resolvedChild`?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70883834
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -509,6 +512,23 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
}
/**
+ * Add a Hint to a logical plan.
+ */
+ private def withHints(
+ ctx: HintContext,
+ relation: LogicalPlan): LogicalPlan = withOrigin(ctx) {
+ val stmt = ctx.hintStatement
+ val name = stmt.hintName.getText.toUpperCase
+ if (stmt.identifierList != null) {
+ Hint(name, visitIdentifierList(stmt.identifierList), relation)
+ } else if (stmt.parameters != null) {
+ Hint(name, stmt.parameters.asScala.map(_.getText), relation)
+ } else {
+ Hint(name, Seq.empty[String], relation)
--- End diff --
Hmm. Hive gets the following, but mine doesn't.
```
hive> select /*+ MAPJOIN */ * from t;
OK
```
Let me fix this. Thank you so much. You're a human compiler. :) +1
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/14132
This looks pretty good!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
For the HINT_PREFIX, I tried at the first, but still have some problem. So, I couldn't include the very previous commit.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71823079
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -193,6 +196,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
case OneRowRelation =>
""
+ case Hint(_, _, child) =>
--- End diff --
After applying the rules of `Analyzer`, there will be no more `Hint` case class.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
I see. For `NormalizeBroadcastHint`, I will try to minimize the cases.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/14132
cc @rxin @hvanhovell what do you think of this case?
`SELECT /*+ MAPJOIN(t1) */ * FROM (SELECT * FROM t1) x JOIN t1`
Applying the hint to the `t1` after JOIN looks more intuitive to me.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71260947
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,51 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+
+ for (param <- parameters) {
+ val names = param.split("\\.")
+ val tid = if (names.length > 1) {
+ TableIdentifier(names(1), Some(names(0)))
+ } else {
+ TableIdentifier(param, None)
+ }
+ try {
+ catalog.lookupRelation(tid)
+
+ var stop = false
+ resolvedChild = resolvedChild.transformDown {
+ case r @ BroadcastHint(SubqueryAlias(t, _))
+ if !stop && resolver(t, tid.identifier) =>
+ stop = true
+ r
+ case r @ SubqueryAlias(t, _) if !stop && resolver(t, tid.identifier) =>
+ stop = true
+ BroadcastHint(r)
--- End diff --
If we do not support `db.table`, why we still compare the whole identifier?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/14132
@dongjoon-hyun sure. It was merely a suggestion to get rid of the `BRACKETED_EMPTY_COMMENT ` rule.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/14132
Is it related? This is the most basic test case, right?
```SQL
CREATE TABLE tab1(c1 int)
select * from tab1, tab1
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70937541
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -354,6 +354,14 @@ case class BroadcastHint(child: LogicalPlan) extends UnaryNode {
override lazy val statistics: Statistics = super.statistics.copy(isBroadcastable = true)
}
+/**
+ * A general hint for the child.
+ * a pair of (name, parameters).
--- End diff --
`a` -> `A`. I think we still can disclose more info about this logical node.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Thank you for guidance, @cloud-fan !
I'll implement like that.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r72181762
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,49 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ *
+ * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations`
+ * rule is applied. Here are two reasons.
+ * - To support `MetastoreRelation` in Hive module.
+ * - To reduce the effect of `Hint` on the other rules.
+ *
+ * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan.
+ * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = resolvedChild.transformDown {
--- End diff --
You probably want to look at older versions of Hive. For example, 0.10. After https://issues.apache.org/jira/browse/HIVE-3784, Hive does not really use map join hint.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70938951
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
--- End diff --
For this one, we have this rule in `Batch("Substitution")`. I think that's fairly enough information.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62162/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
What I mean is not making real view as you concern. I'm telling the one I need to make a view definition by SQL generation is the result of
```
spark.range(10).createOrReplaceTempView("t")
new org.apache.spark.sql.catalyst.SQLBuilder(sql("select * from t")).toSQL
```
I'm not disagree with your opinion here.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71041157
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -338,7 +338,7 @@ querySpecification
(RECORDREADER recordReader=STRING)?
fromClause?
(WHERE where=booleanExpression)?)
- | ((kind=SELECT setQuantifier? namedExpressionSeq fromClause?
+ | ((kind=SELECT hint? setQuantifier? namedExpressionSeq fromClause?
| fromClause (kind=SELECT setQuantifier? namedExpressionSeq)?)
--- End diff --
uh... This grammar file kills my eye... You are right. So far, to support Broadcast Hint, it is good enough.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62500 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62500/consoleFull)** for PR 14132 at commit [`5ba2ad7`](https://github.com/apache/spark/commit/5ba2ad7aa6cab364e09a2c0dae529b8270aed153).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Thank you for giving opinion.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/14132
Not all the joins have the operators `SubqueryAlias`. For example, below is a self join against Hive tables:
```
== Analyzed Logical Plan ==
c1: int, c1: int
Project [c1#7, c1#8]
+- Join Inner
:- MetastoreRelation default, tab1
+- MetastoreRelation default, tab1
```
Thus, the current solution does not work, right?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71097771
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,51 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+
+ for (param <- parameters) {
+ val names = param.split("\\.")
--- End diff --
Let me try the quoted identifiers in the parser.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71096995
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,51 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+
+ for (param <- parameters) {
+ val names = param.split("\\.")
--- End diff --
This does not work in all the cases, right?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71463337
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
}
}
+ /**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
+ * information of BroadcastHint resides its current position inside the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
+ */
+ object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+ override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+ // Capture the broadcasted information and store it in Hint.
+ case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
+ Hint("BROADCAST", Seq(table), child)
+
+ // Nearest Project is found.
+ case p @ Project(_, Hint(_, _, _)) => p
+
+ // Merge BROADCAST hints up to the nearest Project.
+ case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
+ h.copy(parameters = params1 ++ params2)
+ case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
+ h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
+
+ // Bubble up BROADCAST hints to the nearest Project.
+ case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+ h.copy(child = j.copy(left = hintChild))
+ case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+ h.copy(child = j.copy(right = hintChild))
+
+ // Other UnaryNodes are bypassed.
+ case u: UnaryNode
+ if u.child.isInstanceOf[Hint] && u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
--- End diff --
Sure.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
The following is updated.
- Add more descriptions and test cases (about finding closest table and nested hint).
- Support no parameter hint like `/*+ INDEX */`.
- Generalize `hintStatement` rule.
- Simplify `withHints`.
- Move `toUpperCase` into Analyzer rule.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62500 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62500/consoleFull)** for PR 14132 at commit [`5ba2ad7`](https://github.com/apache/spark/commit/5ba2ad7aa6cab364e09a2c0dae529b8270aed153).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71047081
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
--- End diff --
Yep. I agree with you for that purpose of future use.
The testsuites are not able to guarantee that for all cases.
I will add some description.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/14132
@dongjoon-hyun please only support the Hive style mapjoin hint, and don't add the "LEFT OUTER JOIN BROADCAST" way. It'd mean queries written in Spark 2.1 would not work with earlier versions of Spark.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71465508
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
--- End diff --
Also, in the PR description, too.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/14132
All the test cases are for positive cases. We also should add more negative cases.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71626212
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -193,6 +193,12 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
case OneRowRelation =>
""
+ case Hint(_, _, child) =>
+ toSQL(child)
+
+ case BroadcastHint(child) =>
+ toSQL(child)
--- End diff --
This is dead code. Right?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62347/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
There is also related issue you might be interested.
https://issues.apache.org/jira/browse/SPARK-16601
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Thank you for @gatorsmile and @rxin .
I'll remote `logDebug` and add a unittest for Analyzer soon.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/14132
@dongjoon-hyun it is ok not to show the view sql string if the current framework does not support it (e.g. it is generated by a dataframe without the sql equivalent).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
I see. Sorry for misunderstanding. Let me summary again. The following is the final decision, right?
```
SELECT /*+ MAPJOIN(t) */ * FROM t JOIN u ON t.id = u.id
SELECT /*+ MAPJOIN(u) */ * FROM t JOIN u ON t.id = u.id
SELECT /*+ BROADCASTJOIN(t) */ * FROM t JOIN u ON t.id = u.id
SELECT /*+ BROADCASTJOIN(u) */ * FROM t JOIN u ON t.id = u.id
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70867118
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -374,8 +383,8 @@ setQuantifier
relation
: left=relation
- ((CROSS | joinType) JOIN right=relation joinCriteria?
- | NATURAL joinType JOIN right=relation
+ ((CROSS | joinType) JOIN broadcast=broadcastType? right=relation joinCriteria?
--- End diff --
I am not a giant fan or this syntax. I'd prefer `LEFT BROADCAST JOIN`, but I am aware that this is also clunky: 'LEFT RIGHT BROADCAST JOIN`...
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70866390
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -399,7 +402,16 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
* separated) relations here, these get converted into a single plan by condition-less inner join.
*/
override def visitFromClause(ctx: FromClauseContext): LogicalPlan = withOrigin(ctx) {
- val from = ctx.relation.asScala.map(plan).reduceLeft(Join(_, _, Inner, None))
+ val relationPlan = ctx.relation.asScala.map {
--- End diff --
I don't think this will work with multiple joins. I think we should do this in `visitJoinRelation`...
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r72011145
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,49 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ *
+ * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations`
+ * rule is applied. Here are two reasons.
+ * - To support `MetastoreRelation` in Hive module.
+ * - To reduce the effect of `Hint` on the other rules.
+ *
+ * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan.
+ * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
--- End diff --
then should we wrap the whole query plan with `Hint` instead of just the relation? https://github.com/apache/spark/pull/14132#discussion-diff-71820260R343
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Oh, the one failure is due to a new MAPJOIN testcase in the master branch (I added yesterday.)
I'll rebase and update that, too.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62372/consoleFull)** for PR 14132 at commit [`210b636`](https://github.com/apache/spark/commit/210b6365789320f12d0757ad66e7b122614f606c).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Hi, @cloud-fan .
Now, it wraps the whole query.
This should be done at your first comment.
Sorry for delay.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r72176938
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,49 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ *
+ * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations`
+ * rule is applied. Here are two reasons.
+ * - To support `MetastoreRelation` in Hive module.
+ * - To reduce the effect of `Hint` on the other rules.
+ *
+ * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan.
+ * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = resolvedChild.transformDown {
--- End diff --
yea, for this example, DFS is right and BFS is wrong. But how about the case I mentioned above?
I think we don't need to traverse the plan tree recursively, just find the sub-tree of relation part, flatten the join node if any, and then iterate these relations, wrap it with broadcast hint if it's a table and table name matches the hint argument.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r72178111
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,49 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ *
+ * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations`
+ * rule is applied. Here are two reasons.
+ * - To support `MetastoreRelation` in Hive module.
+ * - To reduce the effect of `Hint` on the other rules.
+ *
+ * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan.
+ * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = resolvedChild.transformDown {
--- End diff --
For your case, what I understand is the DFS should be used. I've search Hive codebase until now, but I couldn't find the clear logic until now.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71265020
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,51 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+
+ for (param <- parameters) {
+ val names = param.split("\\.")
+ val tid = if (names.length > 1) {
+ TableIdentifier(names(1), Some(names(0)))
+ } else {
+ TableIdentifier(param, None)
+ }
+ try {
+ catalog.lookupRelation(tid)
+
+ var stop = false
+ resolvedChild = resolvedChild.transformDown {
+ case r @ BroadcastHint(SubqueryAlias(t, _))
+ if !stop && resolver(t, tid.identifier) =>
+ stop = true
+ r
+ case r @ SubqueryAlias(t, _) if !stop && resolver(t, tid.identifier) =>
+ stop = true
+ BroadcastHint(r)
--- End diff --
I think we have to remove it; otherwise, the result will be wrong.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62966 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62966/consoleFull)** for PR 14132 at commit [`bd13473`](https://github.com/apache/spark/commit/bd13473e662cc51112c372016e91dbce03d848bc).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70948608
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -338,7 +338,7 @@ querySpecification
(RECORDREADER recordReader=STRING)?
fromClause?
(WHERE where=booleanExpression)?)
- | ((kind=SELECT setQuantifier? namedExpressionSeq fromClause?
+ | ((kind=SELECT hint? setQuantifier? namedExpressionSeq fromClause?
| fromClause (kind=SELECT setQuantifier? namedExpressionSeq)?)
--- End diff --
Do you want the following? It's already handled.
```scala
scala> sql("FROM t JOIN u ON t.id = u.id SELECT /*+ MAPJOIN(u) */ *").explain
== Physical Plan ==
*BroadcastHashJoin [id#0L], [id#4L], Inner, BuildRight
:- *Range (0, 1000000000, splits=8)
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
+- *Range (0, 1000000000, splits=8)
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71040527
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,34 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = child.transformDown {
--- End diff --
This is a serious bug, right? : )
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r72040281
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -347,6 +347,16 @@ querySpecification
windows?)
;
+hint
+ : '/*+' hintStatement '*/'
--- End diff --
I might have missed something, but do we only support one hint?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70223712
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -347,6 +347,14 @@ querySpecification
windows?)
;
+hint
+ : '/*+' mapJoinHint '*/'
+ ;
+
+mapJoinHint
+ : MAPJOIN '(' broadcastedTables+=tableIdentifier (',' broadcastedTables+=tableIdentifier)* ')'
--- End diff --
Is the MAPJOIN optimization the only one we are going to support? This is fine if we aren't. We might want to make this more general (and move the logic into the planner) if we are.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70226307
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -347,6 +347,14 @@ querySpecification
windows?)
;
+hint
+ : '/*+' mapJoinHint '*/'
+ ;
+
+mapJoinHint
+ : MAPJOIN '(' broadcastedTables+=tableIdentifier (',' broadcastedTables+=tableIdentifier)* ')'
--- End diff --
Currently, the scope of this PR does. We can generalize if needed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70225259
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -339,8 +339,24 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
case SqlBaseParser.SELECT =>
// Regular select
+ // Broadcast hints
+ var withBroadcastedTable = relation
--- End diff --
A few thoughts here:
- Should we introduce a separate LogicalPlan (like `With`) and move this logic into an Analyzer rule? The reason that I am proposing this is because we are transforming the `LogicalPlan` here. It is simpler to do it here though.
- Lets move the logic here into a separate method. Name it `withHints`, model it after the other `with*` functions, and use `optionalMap(hint)(withHint)`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r72008134
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,49 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ *
+ * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations`
+ * rule is applied. Here are two reasons.
+ * - To support `MetastoreRelation` in Hive module.
+ * - To reduce the effect of `Hint` on the other rules.
+ *
+ * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan.
+ * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
--- End diff --
how does hive handle sql hints? This rule looks a little complex to me, according to the fact that hints can only be applied to table relation. It will be great if we can wrap `UnresolvedRelation` with `Hint` in the parser, and then it will be very easy to resolve hints at analyzer.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62806/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71200265
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -87,6 +87,7 @@ class Analyzer(
EliminateUnions),
Batch("Resolution", fixedPoint,
ResolveRelations ::
+ SubstituteHints ::
--- End diff --
Yep. Here is the error when we move it from the batch `Resolution` into the batch `Nondeterministic`.
```
unresolved operator 'Project [*];
org.apache.spark.sql.AnalysisException: unresolved operator 'Project [*];
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:40)
at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:58)
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/14132
I was referring to NormalizeBroadcastHint -- there are many cases in there and it seems error prone against future changes. Do we need all those rules?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/14132
Is there a better way to handle sql generation that's not adding 10+ rules?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71259759
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,51 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+
+ for (param <- parameters) {
+ val names = param.split("\\.")
--- End diff --
Sorry, I do not think this is beyond of the scope. Instead, this is a bug. This identifier stores a table identifier. If we do not use the same parsing solution, the result will be wrong.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Thank you so much for in-depth interview, @gatorsmile .
I've learned a lot. Yes. It is very late. I took a nap during the evening. :)
See you tomorrow on GitHub.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/14132
For your reference, below is a simple case if users want to do it using dataframe
```Scala
sql("CREATE TABLE tab1(c1 int)")
val df = spark.read.table("tab1")
df.join(broadcast(df))
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71045978
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,34 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHints extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = child.transformDown {
+ case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+ case r @ UnresolvedRelation(t, _) if !stop && resolver(t.table, table) =>
--- End diff --
What happens when users use `databaseName`.`tableName` and `tableName` at the same time?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/14132
I agree with @cloud-fan. It is confusing that the hint defined in the top-level query gets applied to a subquery.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/14132
What are Spark temporal views? Temporal tables?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71044638
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
--- End diff --
When adding more and more rules, it becomes harder to track the batch-based dependencies. Then, we have to split the batches. Thus, I think we should do our best to document the prerequisite, dependency assumptions in each rule
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62357/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Hi, @rxin and @hvanhovell .
Could you review this `HINT` PR when you have some time?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Oh, for the worst case, I hope to skip all. :) I'll investigate them too. Thank you for the pointers!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71464803
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
--- End diff --
Oh, I missed you comment here. It's too far from the bottom now. :)
I'll add more `prerequisite, dependency assumptions` here now.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r72040705
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -347,6 +347,16 @@ querySpecification
windows?)
;
+hint
+ : '/*+' hintStatement '*/'
--- End diff --
Yes. Currently, it does. Should we expand this, too?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Hi, @hvanhovell .
In this PR, if possible, could we skip adding `HINT_PREFIX` rule? I finished all the other things.
Actually, there is no change on the behavior of parser if we can use `HINT_PREFIX`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/14132
I see. That is a temporary table. Temporal Tables are a term for different purposes.
The sizing is not small, if we want to support the temporary views created by any DataFrame/Dataset.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62500/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Hi, @rxin and @gatorsmile .
Now, it's ready for review again.
- `SubstituteHints` is minimized.
- `MetastoreRelation` is supported.
- This PR originally handles all the tables because it handle `UnresolvedRelation`. While trying to support `database` name, there occurs regression as @gatorsmile mentioned. Now, it's rollbacked. Hive `MetastoreRelation` is not in `catalyst` package. We should do this for `UnresolvedRelation`.
- All the existing testcases passed. And, Hive specific testsuite is added.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71425497
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
@@ -933,4 +933,124 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
checkSQL("select * from orc_t", "select_orc_table")
}
}
+
+ test("broadcast hint on single table") {
+ checkSQL("SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0",
+ "broadcast_hint_single_table_1")
+
+ checkSQL("SELECT /*+ MAPJOIN(parquet_t0, parquet_t0) */ * FROM parquet_t0",
+ "broadcast_hint_single_table_2")
+
+ checkSQL(
+ "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0 as a",
+ "broadcast_hint_single_table_3")
+
+ checkSQL(
+ """
+ |SELECT /*+ MAPJOIN(parquet_t0) */ *
+ |FROM (SELECT id tid FROM parquet_t0) T
+ |JOIN (SELECT id uid FROM parquet_t0) U
+ |ON tid=uid
+ """.stripMargin,
+ "broadcast_hint_single_4")
+ }
+
+ test("broadcast hint on multiple tables") {
+ checkSQL(
+ "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0, parquet_t1",
+ "broadcast_hint_multiple_table_1")
+ checkSQL(
+ "SELECT /*+ MAPJOIN(parquet_t1) */ * FROM parquet_t0, parquet_t1",
+ "broadcast_hint_multiple_table_2")
+ }
+
+ test("broadcast hint on nested query 1") {
+ checkSQL(
+ """
+ |SELECT /*+ MAPJOIN(parquet_t0) */ *
+ |FROM (SELECT id tid FROM parquet_t0) T
+ |JOIN (SELECT key uid FROM parquet_t1) U
+ |ON tid=uid
+ """.stripMargin,
+ "broadcast_hint_nested_query_1")
+ }
+
+ test("broadcast hint on nested query 2") {
+ checkSQL(
+ """
+ |SELECT /*+ MAPJOIN(parquet_t1) */ tid
+ |FROM (SELECT id tid FROM parquet_t0) T
+ |JOIN (SELECT key uid FROM parquet_t1) U
+ |ON tid=uid
+ """.stripMargin,
+ "broadcast_hint_nested_query_2")
+ }
+
+ test("multiple broadcast hints on multiple tables") {
+ checkSQL(
+ "SELECT /*+ MAPJOIN(parquet_t0, parquet_t1) */ * FROM parquet_t0, parquet_t1",
+ "multiple_broadcast_hints")
+ }
+
+ test("broadcast hint with filter") {
+ checkSQL(
+ "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0 WHERE id < 10",
+ "broadcast_hint_with_filter")
+ }
+
+ test("broadcast hint with filter/limit") {
+ checkSQL(
+ "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0 WHERE id < 10 LIMIT 10",
+ "broadcast_hint_with_filter_limit")
+ }
+
+ test("broadcast hint with generator") {
+ checkSQL(
+ "SELECT * FROM (SELECT /*+ MAPJOIN(parquet_t0) */ EXPLODE(ARRAY(1,2,3)) FROM parquet_t0) T",
+ "broadcast_hint_generator")
+ }
+
+ test("broadcast hint with groupby/having/orderby") {
+ checkSQL(
+ """
+ |SELECT /*+ MAPJOIN(parquet_t0) */ *
+ |FROM parquet_t0
+ |WHERE id > 0
+ |GROUP BY id
+ |HAVING count(*) > 0
+ |ORDER BY id
+ """.stripMargin,
+ "broadcast_hint_groupby_having_orderby")
+ }
+
+ test("broadcast hint with window") {
+ checkSQL(
+ """
+ |SELECT /*+ MAPJOIN(parquet_t1) */
+ | x.key, MAX(y.key) OVER (PARTITION BY x.key % 5 ORDER BY x.key)
+ |FROM parquet_t1 x JOIN parquet_t1 y ON x.key = y.key
+ """.stripMargin,
+ "broadcast_hint_window")
+ }
+
+ test("broadcast hint with rollup") {
+ checkSQL(
+ """
+ |SELECT /*+ MAPJOIN(parquet_t1) */
+ | count(*) as cnt, key%5, grouping_id()
+ |FROM parquet_t1
+ |GROUP BY key % 5 WITH ROLLUP""".stripMargin,
--- End diff --
format
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62578/consoleFull)** for PR 14132 at commit [`9021975`](https://github.com/apache/spark/commit/9021975a2243153edbfa1d4f760f8fcade760513).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Right, there is `table` API, too. Thank you, I'll add that, too.
By the way, I still in the downtown. I need to go home for dinner. I'll take care that tonight.
Thank you again.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62762/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/14132
for case 2, what does hive(or other databases that support sql hint) do?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62357 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62357/consoleFull)** for PR 14132 at commit [`f77a0fa`](https://github.com/apache/spark/commit/f77a0fafbce0195133a9680c2b636222ba491e2b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70938003
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+ */
+ object SubstituteHint extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case logical: LogicalPlan => logical transformDown {
+ case h @ Hint(name, parameters, child)
+ if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+ var resolvedChild = child
+ for (table <- parameters) {
+ var stop = false
+ resolvedChild = child.transformDown {
+ case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+ case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
+ stop = true
+ BroadcastHint(r)
+ }
+ }
+ resolvedChild
+
+ // Remove unrecognized hint
+ case Hint(name, _, child) =>
+ logDebug(s"Ignore Unknown Hint: $name")
--- End diff --
Sounds good!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14132
Hi, @rxin .
Could you review this PR again when you have some time?
All the features are implemented and Spark Parser accepts general hints now.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70882759
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -509,6 +512,23 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
}
/**
+ * Add a Hint to a logical plan.
+ */
+ private def withHints(
+ ctx: HintContext,
+ relation: LogicalPlan): LogicalPlan = withOrigin(ctx) {
+ val stmt = ctx.hintStatement
+ val name = stmt.hintName.getText.toUpperCase
--- End diff --
No problem. Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/14132
Yup - and "broadcast".
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62162 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62162/consoleFull)** for PR 14132 at commit [`edec2e4`](https://github.com/apache/spark/commit/edec2e4ff0648c73f2efc743edca80a705565406).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class Hint(name: String, parameters: Seq[String], child: LogicalPlan) extends UnaryNode `
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14132
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71822903
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -347,6 +347,16 @@ querySpecification
windows?)
;
+hint
+ : '/*+' hintStatement '*/'
+ ;
+
+hintStatement
+ : hintName=identifier
+ | hintName=identifier '(' parameters+=identifier parameters+=identifier ')'
--- End diff --
Thank you for review, @cloud-fan .
The first goal of this PR provides a general syntax for hints, not only broadcast hints. The `(` and `)` syntax is for `INDEX(t idx_emp)` style. You can see the testcase for this in the testcase, too.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14132
**[Test build #62966 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62966/consoleFull)** for PR 14132 at commit [`bd13473`](https://github.com/apache/spark/commit/bd13473e662cc51112c372016e91dbce03d848bc).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r71444098
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
}
}
+ /**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
+ * information of BroadcastHint resides its current position inside the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
+ */
+ object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+ override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+ // Capture the broadcasted information and store it in Hint.
+ case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
+ Hint("BROADCAST", Seq(table), child)
+
+ // Nearest Project is found.
+ case p @ Project(_, Hint(_, _, _)) => p
+
+ // Merge BROADCAST hints up to the nearest Project.
+ case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
+ h.copy(parameters = params1 ++ params2)
+ case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
+ h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
+
+ // Bubble up BROADCAST hints to the nearest Project.
+ case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+ h.copy(child = j.copy(left = hintChild))
+ case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+ h.copy(child = j.copy(right = hintChild))
+
+ // Other UnaryNodes are bypassed.
+ case u: UnaryNode
+ if u.child.isInstanceOf[Hint] && u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
+ val hint = u.child.asInstanceOf[Hint]
+ hint.copy(child = u.withNewChildren(Seq(hint.child)))
+
+ // Other binary operations are ignored.
--- End diff --
Yep. I worried you say like that. :) I hesitated to use `higher` or `multinary`.
May I enumerate all of them? At first, I did but remove that comment since it will be outdated.
Let me think about this more.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70892189
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,34 @@ class Analyzer(
}
/**
+ * Substitute Hints.
+ */
+ object SubstituteHint extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
--- End diff --
Nested one works like the design. I'm making testcase to ensure that.
Here is a example for your query.
```scala
spark.range(10).createOrReplaceTempView("tbl_a")
spark.range(20).createOrReplaceTempView("tbl_b")
spark.range(30).createOrReplaceTempView("tbl_c")
val plan = sql(
"""SELECT /*+ MAPJOIN(tbl_a, tbl_a) */
| *
|FROM tbl_a A
| JOIN tbl_b B
| ON B.id = A.id
| JOIN (SELECT /*+ MAPJOIN(tbl_c) */
| XA.id
| FROM tbl_b XA
| LEFT SEMI JOIN tbl_c XB
| ON XB.id = XA.id) C
| ON C.id = A.id
""".stripMargin).queryExecution.optimizedPlan
'Join Inner, (id#0L = id#0L)
:- 'Join Inner, (id#0L = id#0L)
: :- BroadcastHint
: : +- Range (0, 10, splits=2)
: +- Range (0, 20, splits=2)
+- 'Join LeftSemi, (id#0L = id#0L)
:- Range (0, 20, splits=2)
+- BroadcastHint
+- Range (0, 30, splits=2)
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/14132#discussion_r70882932
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -945,8 +955,12 @@ SIMPLE_COMMENT
: '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN)
;
+BRACKETED_EMPTY_COMMENT
--- End diff --
Yep.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org