You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2016/07/11 08:34:56 UTC

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/14132

    [SPARK-16475][SQL] Broadcast Hint for SQL Queries

    ## What changes were proposed in this pull request?
    
    Broadcast hint is a way for users to manually annotate a query and suggest to the query optimizer the join method. It is very useful when the query optimizer cannot make optimal decision with respect to join methods due to conservativeness or the lack of proper statistics. The DataFrame API has broadcast hint since Spark 1.5. However, we do not have an equivalent functionality in SQL queries. We propose adding Hive-style broadcast hint to Spark SQL. For more information, please see the [design document](https://issues.apache.org/jira/secure/attachment/12817061/BroadcastHintinSparkSQL.pdf).
    
    - [x] SUPPORT `MAPJOIN` SYNTAX
    - [ ] SUPPORT `BROADCAST JOIN` SYNTAX
    
    ## How was this patch tested?
    
    Pass the Jenkins tests with new testcases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-16475

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14132.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14132
    
----
commit 6bd704e5d281bd8c7a1bddaee300688d016bc597
Author: Dongjoon Hyun <do...@apache.org>
Date:   2016-07-11T08:04:51Z

    [SPARK-16475][SQL] Broadcast Hint for SQL Queries

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71625598
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
           }
         }
     
    +    val broadcastHint = project match {
    +      case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
    +        if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" else ""
    +      case _ => ""
    +    }
         build(
           "SELECT",
    +      broadcastHint,
    --- End diff --
    
    Okay. I'll try that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62526/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62435/consoleFull)** for PR 14132 at commit [`959dab1`](https://github.com/apache/spark/commit/959dab11d792207768216b0b92cb38979579d2d5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Thanks for your work! Will try to review it again when the comments are addressed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71433270
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala ---
    @@ -447,4 +447,42 @@ class PlanParserSuite extends PlanTest {
         assertEqual("select a, b from db.c where x !> 1",
           table("db", "c").where('x <= 1).select('a, 'b))
       }
    +
    +  test("select hint syntax") {
    +    // Hive compatibility: Missing parameter raises ParseException.
    +    val m = intercept[ParseException] {
    +      parsePlan("SELECT /*+ HINT() */ * FROM t")
    +    }.getMessage
    +    assert(m.contains("no viable alternative at input"))
    +
    +    // Hive compatibility: No database.
    +    val m2 = intercept[ParseException] {
    +      parsePlan("SELECT /*+ MAPJOIN(default.t) */ * from default.t;")
    +    }.getMessage
    +    assert(m2.contains("no viable alternative at input"))
    +
    +    comparePlans(
    +      parsePlan("SELECT /*+ HINT */ * FROM t"),
    +      Hint("HINT", Seq.empty, table("t")).select(star()))
    +
    +    comparePlans(
    +      parsePlan("SELECT /*+ HINT */ * FROM t"),
    +      Hint("HINT", Seq.empty, table("t")).select(star()))
    --- End diff --
    
    A duplicate case, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    The difference of JOIN[5] and JOIN[5] is just due to the following.
    * JOIN[5]
    ```
              TableScan
                alias: t1
                Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: a (type: int)
                  outputColumnNames: _col0
                  Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    sort order: 
                    Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col0 (type: int)
    ```
    * JOIN[5]
    ```
              TableScan
                alias: t1
                Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
                  value expressions: a (type: int)
    ```
    For Query 3 and 4, `t1` -> `t2` and `a` -> `b`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Oh, thank you, @cloud-fan . For current Hive, it is ignored. The followings have the same plans.
    - SELECT * FROM (SELECT * FROM t1) x, t1;
    - SELECT /*+ MAPJOIN(t1) */ * FROM (SELECT * FROM t1) x, t1;
    - SELECT /*+ MAPJOIN(t1) */ * FROM (SELECT * FROM t1) x, t2;
    - SELECT /*+ MAPJOIN(t2) */ * FROM (SELECT * FROM t1) x, t2;
    ```sql
    hive> set hive.auto.convert.join=false;
    hive> EXPLAIN SELECT * FROM (SELECT * FROM t1) x, t1;
    Warning: Shuffle Join JOIN[7][tables = [$hdt$_0, $hdt$_1]] in Stage 'Stage-1:MAPRED' is a cross product
    OK
    Explain
    STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-0 depends on stages: Stage-1
    
    STAGE PLANS:
      Stage: Stage-1
        Map Reduce
          Map Operator Tree:
    
    hive> EXPLAIN SELECT /*+ MAPJOIN(t1) */ * FROM (SELECT * FROM t1) x, t1;
    Warning: Shuffle Join JOIN[5][tables = [x, t1]] in Stage 'Stage-1:MAPRED' is a cross product
    OK
    Explain
    STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-0 depends on stages: Stage-1
    
    STAGE PLANS:
      Stage: Stage-1
        Map Reduce
          Map Operator Tree:
    
    hive> EXPLAIN SELECT /*+ MAPJOIN(t1) */ * FROM (SELECT * FROM t1) x, t2;
    Warning: Shuffle Join JOIN[5][tables = [x, t2]] in Stage 'Stage-1:MAPRED' is a cross product
    OK
    Explain
    STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-0 depends on stages: Stage-1
    
    STAGE PLANS:
      Stage: Stage-1
        Map Reduce
          Map Operator Tree:
    
    hive> EXPLAIN SELECT /*+ MAPJOIN(t2) */ * FROM (SELECT * FROM t1) x, t2;
    Warning: Shuffle Join JOIN[5][tables = [x, t2]] in Stage 'Stage-1:MAPRED' is a cross product
    OK
    Explain
    STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-0 depends on stages: Stage-1
    
    STAGE PLANS:
      Stage: Stage-1
        Map Reduce
          Map Operator Tree:
    ```
    
    In fact, HIVE-3784 (https://issues.apache.org/jira/browse/HIVE-3784) was January 2013.
    Here is the git commit https://github.com/apache/hive/commit/3dca21456a34b6db9cf03e755fae8ecdfaef1730 .


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    At the implementation side, blocking all subqueries are easiest.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71464346
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
           }
         }
     
    +    val broadcastHint = project match {
    +      case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
    +        if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" else ""
    +      case _ => ""
    +    }
         build(
           "SELECT",
    +      broadcastHint,
    --- End diff --
    
    "SELECT" occurs in the followings. But, I didn't added meaning-logic based on the testcases.
    - aggregateToSQL
    - generateToSQL  => This has only "SELECT 1"
    - groupingSetToSQL
    - projectToSQL
    - windowToSQL
    
    I think the test coverage enough due to it generates the above cases. If you suggests more testcases, I welcome. I like robustness both for this PR and for the future.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Sure, @gatorsmile . But, could you give more specific examples what you mean? Currently,
    - PlanParserSuite has a exception testcase.
    - BroadcastJoinSuite matches all nodes of the plan. It means it tests the unmatched table, too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71044308
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,34 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = child.transformDown {
    +              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
    --- End diff --
    
    We need more test cases to verify the behaviors. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    When we turn on the option `true`, `t1` is broadcasted.
    ```
    hive> set hive.auto.convert.join=true;
    hive> EXPLAIN SELECT /*+ MAPJOIN(t1) */ * FROM (select * from t1) x, t2;
    Warning: Map Join MAPJOIN[8][bigTable=t2] in task 'Stage-3:MAPRED' is a cross product
    OK
    hive> EXPLAIN SELECT /*+ MAPJOIN(t2) */ * FROM (select * from t1) x, t2;
    Warning: Map Join MAPJOIN[8][bigTable=t2] in task 'Stage-3:MAPRED' is a cross product
    OK
    hive> EXPLAIN SELECT * FROM (select * from t1) x, t2;
    Warning: Map Join MAPJOIN[10][bigTable=?] in task 'Stage-3:MAPRED' is a cross product
    OK
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70227049
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala ---
    @@ -153,4 +153,38 @@ class BroadcastJoinSuite extends QueryTest with SQLTestUtils {
           cases.foreach(assertBroadcastJoin)
         }
       }
    +
    +  test("select hint") {
    --- End diff --
    
    Move this to package `org.apache.spark.sql.catalyst.parser.PlanParserSuite`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    To help you find the JIRAs, you can see the umbrella JIRA: https://issues.apache.org/jira/browse/SPARK-11012


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71099153
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,51 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +
    +          for (param <- parameters) {
    +            val names = param.split("\\.")
    +            val tid = if (names.length > 1) {
    +              TableIdentifier(names(1), Some(names(0)))
    +            } else {
    +              TableIdentifier(param, None)
    +            }
    +            try {
    +              catalog.lookupRelation(tid)
    +
    +              var stop = false
    +              resolvedChild = resolvedChild.transformDown {
    +                case r @ BroadcastHint(SubqueryAlias(t, _))
    +                  if !stop && resolver(t, tid.identifier) =>
    +                  stop = true
    +                  r
    +                case r @ SubqueryAlias(t, _) if !stop && resolver(t, tid.identifier) =>
    +                  stop = true
    +                  BroadcastHint(r)
    --- End diff --
    
    When `BroadcastHint` is crossing `SubqueryAlias`, we need to remove the qualifier.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Sure, @gatorsmile !


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71099090
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -425,6 +452,49 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
           }
         }
     
    +    /**
    +     * Merge and move upward to the nearest Project.
    +     * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
    +     * information of BroadcastHint resides its current position inside the plan. In order to
    +     * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
    +     * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
    +     */
    +    object NormalizeBroadcastHint extends Rule[LogicalPlan] {
    +      override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +        // Capture the broadcasted information and store it in Hint.
    +        case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
    +          Hint("BROADCAST", Seq(table), child)
    +
    +        // Nearest Project is found.
    +        case p @ Project(_, Hint(_, _, _)) => p
    +
    +        // Merge BROADCAST hints up to the nearest Project.
    +        case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
    +          h.copy(parameters = params1 ++ params2)
    +        case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
    +          h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
    +
    +        // Bubble up BROADCAST hints to the nearest Project.
    +        case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
    +          h.copy(child = j.copy(left = hintChild))
    +        case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
    +          h.copy(child = j.copy(right = hintChild))
    +        case s @ SubqueryAlias(_, h @ Hint("BROADCAST", _, hintChild)) =>
    --- End diff --
    
    When you moving `Hint` and crossing a `SubqueryAlias`, we should consider adding the qualifier back. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Oh, I see. No problem. Thank you for guide. Then, could you update the JIRA too? I added this for the following comments.
    > For more information, please see the attached document. One note about the doc: in addition to supporting "MAPJOIN", we should also support "BROADCASTJOIN".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62525 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62525/consoleFull)** for PR 14132 at commit [`3e1fcf6`](https://github.com/apache/spark/commit/3e1fcf6495e144141970ac207a44293ddd0657bf).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70889856
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,34 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   */
    +  object SubstituteHint extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    --- End diff --
    
    Sorry for the wrong query (I didn't test it). My point is what will happen when we encounter multiple query hints? How are they processed? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r72010521
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,49 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   *
    +   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations`
    +   * rule is applied. Here are two reasons.
    +   * - To support `MetastoreRelation` in Hive module.
    +   * - To reduce the effect of `Hint` on the other rules.
    +   *
    +   * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan.
    +   * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    --- End diff --
    
    Hi, @cloud-fan . Thank you again.
    
    Yes. That's true. So, at my first commit of this PR, I actually implemented to attach just the concrete `BroadcastHint` to the relations in `AstBuilder.scala`. But, there were some advices to
    - Move out the logic from the parser module (for preventing future updates on parser module)
    - Generalize the hint syntax like the current code. 
    
    Since a general `Hint` is attached on a logical plan not a table relation, we can substitute any other plans (filter/project/aggregator), too. The following is my previous comment about this situation. The child of hint is `JOIN`.
    
    > Yep. I grasp your point. First, the given relation is just a logical plan, not a table. In the following case, the relation is Join.
    > SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0, parquet_t1



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62832/consoleFull)** for PR 14132 at commit [`d3df3fb`](https://github.com/apache/spark/commit/d3df3fbb61cfe7e573d82d777819220f01b40456).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71457670
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
           }
         }
     
    +    /**
    +     * Merge and move upward to the nearest Project.
    +     * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
    +     * information of BroadcastHint resides its current position inside the plan. In order to
    +     * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
    +     * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
    +     */
    +    object NormalizeBroadcastHint extends Rule[LogicalPlan] {
    +      override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +        // Capture the broadcasted information and store it in Hint.
    +        case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
    +          Hint("BROADCAST", Seq(table), child)
    +
    +        // Nearest Project is found.
    +        case p @ Project(_, Hint(_, _, _)) => p
    +
    +        // Merge BROADCAST hints up to the nearest Project.
    +        case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
    +          h.copy(parameters = params1 ++ params2)
    +        case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
    +          h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
    +
    +        // Bubble up BROADCAST hints to the nearest Project.
    +        case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
    +          h.copy(child = j.copy(left = hintChild))
    +        case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
    +          h.copy(child = j.copy(right = hintChild))
    +
    +        // Other UnaryNodes are bypassed.
    +        case u: UnaryNode
    +          if u.child.isInstanceOf[Hint] && u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
    --- End diff --
    
    uh, yeah! please add two more spaces before `if`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70934669
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,35 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHint extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = child.transformDown {
    +              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
    +              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
    +                stop = true
    +                BroadcastHint(r)
    +            }
    --- End diff --
    
    In Hive/Oracle, what happens if we are unable to find any `UnresolvedRelation` with the same name?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Thank you always, @hvanhovell . Annd, sorry for the delay. Since last Saturday, I need to do some important personal stuff offline, so the time is limited for me. :(


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62198/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62198 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62198/consoleFull)** for PR 14132 at commit [`61e4b1b`](https://github.com/apache/spark/commit/61e4b1b7af15193f6f44ccba848fadd8488650b6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62435/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70886975
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -945,8 +955,12 @@ SIMPLE_COMMENT
         : '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN)
         ;
     
    +BRACKETED_EMPTY_COMMENT
    --- End diff --
    
    Ok nvm....


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71424857
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/BroadcastHintSuite.scala ---
    @@ -0,0 +1,45 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.catalyst.plans.logical.BroadcastHint
    +import org.apache.spark.sql.hive.test.TestHiveSingleton
    +import org.apache.spark.sql.test.SQLTestUtils
    +
    +class BroadcastHintSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
    +  test("broadcast hint on Hive table") {
    +    withTable("hive_t", "hive_u") {
    +      spark.sql("CREATE TABLE hive_t(a int)")
    +      spark.sql("CREATE TABLE hive_u(b int)")
    +
    +      val hive_t = spark.table("hive_t").queryExecution.analyzed
    +      val hive_u = spark.table("hive_u").queryExecution.analyzed
    +
    +      val plan = spark.sql("SELECT /*+ MAPJOIN(hive_t) */ * FROM hive_t, hive_u")
    +        .queryExecution.analyzed
    +      assert(plan.children(0).children(0).isInstanceOf[BroadcastHint])
    --- End diff --
    
    This is fragile. You can use `collectFirst` instead. Also check the content of `BroadcastHint`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71097752
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,51 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +
    +          for (param <- parameters) {
    +            val names = param.split("\\.")
    --- End diff --
    
    How about Oracle?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71098821
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -425,6 +452,49 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
           }
         }
     
    +    /**
    +     * Merge and move upward to the nearest Project.
    +     * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
    +     * information of BroadcastHint resides its current position inside the plan. In order to
    +     * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
    +     * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
    +     */
    +    object NormalizeBroadcastHint extends Rule[LogicalPlan] {
    +      override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +        // Capture the broadcasted information and store it in Hint.
    +        case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
    --- End diff --
    
    uh, your solution depends the rule `AddSubquery`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71043863
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,34 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = child.transformDown {
    +              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
    --- End diff --
    
    @gatorsmile . Some cases come to my mind. Let me check and fix this.
    Thank you!!!!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62394/consoleFull)** for PR 14132 at commit [`f318185`](https://github.com/apache/spark/commit/f3181852bfacfff0dd95593e3c9b943a1f261336).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71038438
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala ---
    @@ -377,4 +377,64 @@ class AnalysisSuite extends AnalysisTest {
         assertExpressionType(sum(Divide(Decimal(1), 2.0)), DoubleType)
         assertExpressionType(sum(Divide(1.0, Decimal(2.0))), DoubleType)
       }
    +
    +  test("Substitute hint") {
    +    checkAnalysis(
    +      Hint("MAPJOIN", Seq("TaBlE"), UnresolvedRelation(TableIdentifier("TaBlE"), Some("TbL"))),
    +      BroadcastHint(testRelation),
    +      caseSensitive = false)
    +
    +    checkAnalysis(
    +      Hint("MAPJOIN", Seq("table"), UnresolvedRelation(TableIdentifier("TaBlE"), Some("TbL"))),
    +      BroadcastHint(testRelation),
    +      caseSensitive = false)
    +
    +    checkAnalysis(
    +      Hint("MAPJOIN", Seq("TaBlE"), UnresolvedRelation(TableIdentifier("TaBlE"), Some("TbL"))),
    +      BroadcastHint(testRelation))
    +
    +    checkAnalysis(
    +      Hint("MAPJOIN", Seq("table"), UnresolvedRelation(TableIdentifier("TaBlE"), Some("TbL"))),
    +      testRelation)
    +
    +    checkAnalysis(
    +      Hint("MAPJOIN", Seq("TaBlE"),
    +        Project(Seq(UnresolvedAttribute("a")),
    +          UnresolvedRelation(TableIdentifier("TaBlE"), Some("TbL")))),
    +      Project(testRelation.output, BroadcastHint(testRelation)))
    +
    +    checkAnalysis(
    +      Hint("MAPJOIN", Seq("TaBlE"),
    +        Project(Seq(UnresolvedAttribute("t.a")),
    +          Join(
    +            UnresolvedRelation(TableIdentifier("TaBlE"), Some("TbL")).as("t"),
    +            UnresolvedRelation(TableIdentifier("TaBlE2"), Some("TbL")).as("u"),
    +            Inner,
    +            None
    +          ))),
    +      Project(testRelation.output,
    +        Join(
    +          BroadcastHint(testRelation),
    +          testRelation2,
    +          Inner,
    +          None
    +        )))
    +
    +    checkAnalysis(
    --- End diff --
    
    add a test case with nested hints?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62762/consoleFull)** for PR 14132 at commit [`3fa276d`](https://github.com/apache/spark/commit/3fa276d20d9bef56f3bc25e3a6f9d333cfaefdaf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71097984
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,51 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +
    +          for (param <- parameters) {
    +            val names = param.split("\\.")
    --- End diff --
    
    Yep. You can try to another PR. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71627448
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
           }
         }
     
    +    /**
    +     * Merge and move upward to the nearest Project.
    +     * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
    +     * information of BroadcastHint resides its current position inside the plan. In order to
    +     * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
    +     * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
    +     */
    +    object NormalizeBroadcastHint extends Rule[LogicalPlan] {
    +      override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +        // Capture the broadcasted information and store it in Hint.
    +        case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
    +          Hint("BROADCAST", Seq(table), child)
    +
    +        // Nearest Project is found.
    +        case p @ Project(_, Hint(_, _, _)) => p
    +
    +        // Merge BROADCAST hints up to the nearest Project.
    +        case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
    +          h.copy(parameters = params1 ++ params2)
    +        case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
    +          h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
    +
    +        // Bubble up BROADCAST hints to the nearest Project.
    +        case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
    +          h.copy(child = j.copy(left = hintChild))
    +        case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
    +          h.copy(child = j.copy(right = hintChild))
    --- End diff --
    
    Can you post your code when you merge them in a single case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71683008
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
           }
         }
     
    +    val broadcastHint = project match {
    +      case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
    +        if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" else ""
    +      case _ => ""
    +    }
         build(
           "SELECT",
    +      broadcastHint,
    --- End diff --
    
    Hmm. For this one, I found more reasonable solution. Instead of adding new dummy `Project`, we can utilize the existing dummy `Project` on the `SQLTable`.
    
    Basically, the root cause of this problem of `groupingSetToSQL` is that it removes the project by doing this `toSQL(project.child)`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70937097
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -338,7 +338,7 @@ querySpecification
            (RECORDREADER recordReader=STRING)?
            fromClause?
            (WHERE where=booleanExpression)?)
    -    | ((kind=SELECT setQuantifier? namedExpressionSeq fromClause?
    +    | ((kind=SELECT hint? setQuantifier? namedExpressionSeq fromClause?
            | fromClause (kind=SELECT setQuantifier? namedExpressionSeq)?)
    --- End diff --
    
    How about this `SELECT`? Should we also allow users use `Hint` here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71098009
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -87,6 +87,7 @@ class Analyzer(
           EliminateUnions),
         Batch("Resolution", fixedPoint,
           ResolveRelations ::
    +      SubstituteHints ::
    --- End diff --
    
    Have you tried to move it after the batch `Resolution`? Any error you hit?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Does this work in `DataFrame` API, too?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71227811
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -988,9 +988,88 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         )
       }
     
    +  test("broadcast hint with window") {
    +    checkGeneratedSQL(
    +      """
    +        |SELECT /*+ MAPJOIN(parquet_t1) */
    +        |       x.key, MAX(y.key) OVER (PARTITION BY x.key % 5 ORDER BY x.key)
    +        |FROM parquet_t1 x JOIN parquet_t1 y ON x.key = y.key
    +      """.stripMargin,
    +      """
    +        |SELECT `gen_attr` AS `key`, `gen_attr` AS `max(key) OVER (PARTITION BY (key % CAST(5 AS BIGINT)) ORDER BY key ASC RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)`
    +        |FROM (SELECT `gen_attr`, `gen_attr`
    +        |      FROM (SELECT gen_subquery_2.`gen_attr`, gen_subquery_2.`gen_attr`, gen_subquery_2.`gen_attr`,
    +        |                   max(`gen_attr`) OVER (PARTITION BY `gen_attr` ORDER BY `gen_attr` ASC
    +        |                                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS `gen_attr`
    +        |            FROM (SELECT /*+ MAPJOIN(parquet_t1) */ `gen_attr`, `gen_attr`,
    +        |                         (`gen_attr` % CAST(5 AS BIGINT)) AS `gen_attr`
    +        |                  FROM ((SELECT `key` AS `gen_attr`, `value` AS `gen_attr`
    +        |                         FROM `default`.`parquet_t1`)
    +        |                         AS gen_subquery_0)
    +        |                  AS x
    +        |           INNER JOIN (SELECT `key` AS `gen_attr`, `value` AS `gen_attr`
    +        |                       FROM `default`.`parquet_t1`)
    +        |                       AS gen_subquery_1
    +        |           ON (`gen_attr` = `gen_attr`))
    +        |           AS gen_subquery_2)
    +        |      AS gen_subquery_3)
    +        |      AS x
    +      """.stripMargin
    +    )
    +  }
    +
    +  test("broadcast hint with rollup") {
    +    checkGeneratedSQL(
    +      """
    +        |SELECT /*+ MAPJOIN(parquet_t1) */
    +        |       count(*) as cnt, key%5, grouping_id()
    +        |FROM parquet_t1
    +        |GROUP BY key % 5 WITH ROLLUP""".stripMargin,
    +      """
    +        |SELECT `gen_attr` AS `cnt`, `gen_attr` AS `(key % CAST(5 AS BIGINT))`, `gen_attr` AS `grouping_id()`
    +        |FROM (SELECT /*+ MAPJOIN(parquet_t1) */ count(1) AS `gen_attr`,
    +        |             (`gen_attr` % CAST(5 AS BIGINT)) AS `gen_attr`, grouping_id() AS `gen_attr`
    +        |      FROM (SELECT `key` AS `gen_attr`, `value` AS `gen_attr`
    +        |            FROM `default`.`parquet_t1`)
    +        |            AS gen_subquery_0
    +        |      GROUP BY (`gen_attr` % CAST(5 AS BIGINT))
    +        |      GROUPING SETS(((`gen_attr` % CAST(5 AS BIGINT))), ()))
    +        |      AS gen_subquery_1
    +      """.stripMargin)
    +
    +  }
    +
    +  test("broadcast hint with grouping sets") {
    +    checkGeneratedSQL(
    +      s"""
    +         |SELECT /*+ MAPJOIN(parquet_t1) */
    +         |       count(*) AS cnt, key % 5 AS k1, key - 5 AS k2,grouping_id() AS k3
    +         |FROM (SELECT key, key % 2, key - 5 FROM parquet_t1) t GROUP BY key % 5, key - 5
    +         |GROUPING SETS (key % 5, key - 5)
    +      """.stripMargin,
    +      """
    +        |SELECT `gen_attr` AS `cnt`, `gen_attr` AS `k1`, `gen_attr` AS `k2`, `gen_attr` AS `k3`
    +        |FROM (SELECT count(1) AS `gen_attr`, (`gen_attr` % CAST(5 AS BIGINT)) AS `gen_attr`,
    +        |             (`gen_attr` - CAST(5 AS BIGINT)) AS `gen_attr`, grouping_id() AS `gen_attr`
    +        |      FROM (SELECT /*+ MAPJOIN(parquet_t1) */ `gen_attr`,
    +        |                   (`gen_attr` % CAST(2 AS BIGINT)) AS `gen_attr`,
    +        |                   (`gen_attr` - CAST(5 AS BIGINT)) AS `gen_attr`
    +        |            FROM (SELECT `key` AS `gen_attr`, `value` AS `gen_attr`
    +        |                  FROM `default`.`parquet_t1`)
    +        |                  AS gen_subquery_0)
    +        |            AS t
    +        |      GROUP BY (`gen_attr` % CAST(5 AS BIGINT)), (`gen_attr` - CAST(5 AS BIGINT))
    +        |      GROUPING SETS(((`gen_attr` % CAST(5 AS BIGINT))), ((`gen_attr` - CAST(5 AS BIGINT)))))
    +        |      AS gen_subquery_1
    +      """.stripMargin)
    +  }
    +
    +
       // This will be removed in favor of SPARK-16590.
       private def checkGeneratedSQL(sqlString: String, expectedString: String): Unit = {
    -    val generatedSQL = new SQLBuilder(sql(sqlString)).toSQL
    +    val df = sql(sqlString)
    +    logError("\n" + df.queryExecution.analyzed.treeString)
    --- End diff --
    
    After https://github.com/apache/spark/pull/14235, I need to change the testcase style, too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62491 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62491/consoleFull)** for PR 14132 at commit [`e6a44ed`](https://github.com/apache/spark/commit/e6a44ed3d97692f033ea4b1832b08527e6157aa2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun closed the pull request at:

    https://github.com/apache/spark/pull/14132


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71098966
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -425,6 +452,49 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
           }
         }
     
    +    /**
    +     * Merge and move upward to the nearest Project.
    +     * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
    +     * information of BroadcastHint resides its current position inside the plan. In order to
    +     * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
    +     * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
    +     */
    +    object NormalizeBroadcastHint extends Rule[LogicalPlan] {
    +      override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +        // Capture the broadcasted information and store it in Hint.
    +        case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
    +          Hint("BROADCAST", Seq(table), child)
    +
    +        // Nearest Project is found.
    +        case p @ Project(_, Hint(_, _, _)) => p
    +
    +        // Merge BROADCAST hints up to the nearest Project.
    +        case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
    +          h.copy(parameters = params1 ++ params2)
    +        case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
    +          h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
    +
    +        // Bubble up BROADCAST hints to the nearest Project.
    --- End diff --
    
    You do not need to enumerate all the types. This depends on the tree structure. `UnaryNode`, `BinaryNode` and so on.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70938251
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,35 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHint extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = child.transformDown {
    +              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
    +              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
    +                stop = true
    +                BroadcastHint(r)
    +            }
    +          }
    +          resolvedChild
    +
    +        // Remove unrecognized hint
    --- End diff --
    
    `hint` -> `hints`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71820260
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
    @@ -339,8 +339,11 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
           case SqlBaseParser.SELECT =>
             // Regular select
     
    +        // Add hints.
    +        val withHint = relation.optionalMap(ctx.hint)(withHints)
    --- End diff --
    
    according to the grammar: `SELECT /*+ hint... */ ...`, it seems that hint should be applied to the whole SELECT clause, not only its relations?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62577/consoleFull)** for PR 14132 at commit [`e8b7bf0`](https://github.com/apache/spark/commit/e8b7bf0f3d88986048cd586ccc13209ee1611cd7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    I'm back. I'll resolve them.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71819403
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -347,6 +347,16 @@ querySpecification
            windows?)
         ;
     
    +hint
    +    : '/*+' hintStatement '*/'
    +    ;
    +
    +hintStatement
    +    : hintName=identifier
    +    | hintName=identifier '(' parameters+=identifier parameters+=identifier ')'
    --- End diff --
    
    do we need this rule for broadcast hint?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    OK what I meant was "supporting /*+ BROADCASTJOIN */", not the Impala style ..



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    @gatorsmile let me know when you think this is ready.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71227405
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -988,9 +988,88 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         )
       }
     
    +  test("broadcast hint with window") {
    +    checkGeneratedSQL(
    +      """
    +        |SELECT /*+ MAPJOIN(parquet_t1) */
    +        |       x.key, MAX(y.key) OVER (PARTITION BY x.key % 5 ORDER BY x.key)
    +        |FROM parquet_t1 x JOIN parquet_t1 y ON x.key = y.key
    +      """.stripMargin,
    +      """
    +        |SELECT `gen_attr` AS `key`, `gen_attr` AS `max(key) OVER (PARTITION BY (key % CAST(5 AS BIGINT)) ORDER BY key ASC RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)`
    +        |FROM (SELECT `gen_attr`, `gen_attr`
    +        |      FROM (SELECT gen_subquery_2.`gen_attr`, gen_subquery_2.`gen_attr`, gen_subquery_2.`gen_attr`,
    +        |                   max(`gen_attr`) OVER (PARTITION BY `gen_attr` ORDER BY `gen_attr` ASC
    +        |                                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS `gen_attr`
    +        |            FROM (SELECT /*+ MAPJOIN(parquet_t1) */ `gen_attr`, `gen_attr`,
    +        |                         (`gen_attr` % CAST(5 AS BIGINT)) AS `gen_attr`
    +        |                  FROM ((SELECT `key` AS `gen_attr`, `value` AS `gen_attr`
    +        |                         FROM `default`.`parquet_t1`)
    +        |                         AS gen_subquery_0)
    +        |                  AS x
    +        |           INNER JOIN (SELECT `key` AS `gen_attr`, `value` AS `gen_attr`
    +        |                       FROM `default`.`parquet_t1`)
    +        |                       AS gen_subquery_1
    +        |           ON (`gen_attr` = `gen_attr`))
    +        |           AS gen_subquery_2)
    +        |      AS gen_subquery_3)
    +        |      AS x
    +      """.stripMargin
    +    )
    +  }
    +
    +  test("broadcast hint with rollup") {
    +    checkGeneratedSQL(
    +      """
    +        |SELECT /*+ MAPJOIN(parquet_t1) */
    +        |       count(*) as cnt, key%5, grouping_id()
    +        |FROM parquet_t1
    +        |GROUP BY key % 5 WITH ROLLUP""".stripMargin,
    +      """
    +        |SELECT `gen_attr` AS `cnt`, `gen_attr` AS `(key % CAST(5 AS BIGINT))`, `gen_attr` AS `grouping_id()`
    +        |FROM (SELECT /*+ MAPJOIN(parquet_t1) */ count(1) AS `gen_attr`,
    +        |             (`gen_attr` % CAST(5 AS BIGINT)) AS `gen_attr`, grouping_id() AS `gen_attr`
    +        |      FROM (SELECT `key` AS `gen_attr`, `value` AS `gen_attr`
    +        |            FROM `default`.`parquet_t1`)
    +        |            AS gen_subquery_0
    +        |      GROUP BY (`gen_attr` % CAST(5 AS BIGINT))
    +        |      GROUPING SETS(((`gen_attr` % CAST(5 AS BIGINT))), ()))
    +        |      AS gen_subquery_1
    +      """.stripMargin)
    +
    +  }
    +
    +  test("broadcast hint with grouping sets") {
    +    checkGeneratedSQL(
    +      s"""
    +         |SELECT /*+ MAPJOIN(parquet_t1) */
    +         |       count(*) AS cnt, key % 5 AS k1, key - 5 AS k2,grouping_id() AS k3
    +         |FROM (SELECT key, key % 2, key - 5 FROM parquet_t1) t GROUP BY key % 5, key - 5
    +         |GROUPING SETS (key % 5, key - 5)
    +      """.stripMargin,
    +      """
    +        |SELECT `gen_attr` AS `cnt`, `gen_attr` AS `k1`, `gen_attr` AS `k2`, `gen_attr` AS `k3`
    +        |FROM (SELECT count(1) AS `gen_attr`, (`gen_attr` % CAST(5 AS BIGINT)) AS `gen_attr`,
    +        |             (`gen_attr` - CAST(5 AS BIGINT)) AS `gen_attr`, grouping_id() AS `gen_attr`
    +        |      FROM (SELECT /*+ MAPJOIN(parquet_t1) */ `gen_attr`,
    +        |                   (`gen_attr` % CAST(2 AS BIGINT)) AS `gen_attr`,
    +        |                   (`gen_attr` - CAST(5 AS BIGINT)) AS `gen_attr`
    +        |            FROM (SELECT `key` AS `gen_attr`, `value` AS `gen_attr`
    +        |                  FROM `default`.`parquet_t1`)
    +        |                  AS gen_subquery_0)
    +        |            AS t
    +        |      GROUP BY (`gen_attr` % CAST(5 AS BIGINT)), (`gen_attr` - CAST(5 AS BIGINT))
    +        |      GROUPING SETS(((`gen_attr` % CAST(5 AS BIGINT))), ((`gen_attr` - CAST(5 AS BIGINT)))))
    +        |      AS gen_subquery_1
    +      """.stripMargin)
    +  }
    +
    +
       // This will be removed in favor of SPARK-16590.
       private def checkGeneratedSQL(sqlString: String, expectedString: String): Unit = {
    -    val generatedSQL = new SQLBuilder(sql(sqlString)).toSQL
    +    val df = sql(sqlString)
    +    logError("\n" + df.queryExecution.analyzed.treeString)
    --- End diff --
    
    Oops. Please ignore this since we will be under review for a while.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70227231
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -347,6 +347,14 @@ querySpecification
            windows?)
         ;
     
    +hint
    +    : '/*+' mapJoinHint '*/'
    +    ;
    +
    +mapJoinHint
    +    : MAPJOIN '(' broadcastedTables+=tableIdentifier (',' broadcastedTables+=tableIdentifier)* ')'
    --- End diff --
    
    Ok - lets keep it this way then.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    In the JIRA https://issues.apache.org/jira/browse/SPARK-11012, we documented the current limitation of SQL generation:
    >Note that not all resolved logical query plan can be converted back to SQL query string. Either because it consists of some language structure that has not been supported yet, or it doesn't have a SQL representation inherently (e.g. query plans built on top of local Scala collections).
    
    The purpose of SQL generation is for better native view support. The target logical plan must be parsed from a valid HiveQL query statement. Thus, the two examples you mentioned above are not supported.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62354/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62578/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r72037553
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,49 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   *
    +   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations`
    +   * rule is applied. Here are two reasons.
    +   * - To support `MetastoreRelation` in Hive module.
    +   * - To reduce the effect of `Hint` on the other rules.
    +   *
    +   * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan.
    +   * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = resolvedChild.transformDown {
    --- End diff --
    
    Does this really work? `transformDown` is DFS I think


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    For `Table-Level Optimizer Hints` of MySQL, it applied to all table with the same name.
    > The hint applies to all tables that it names
    http://dev.mysql.com/doc/refman/5.7/en/optimizer-hints.html
    
    This is not what we want here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71736274
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -162,6 +162,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
             toSQL(p.right),
             p.condition.map(" ON " + _.sql).getOrElse(""))
     
    +    case h @ Hint(_, _, s @ SubqueryAlias(alias, p @ Project(_, _: SQLTable))) =>
    --- End diff --
    
    The new way looks cleaner, but this looks like a special handling. Can you write a code comment to explain when it is being used?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    What I mean is currently how to broadcast the Hive table `tab1`? I'm making the testcase.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Yep. The above example, `spark.range(10).createOrReplaceTempView("t")`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71823645
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -193,6 +196,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
         case OneRowRelation =>
           ""
     
    +    case Hint(_, _, child) =>
    --- End diff --
    
    so why do we have this case? `SQLBuilder` will be applied on analyzed plan right?
    
    And I think it's better to push this check in `CheckAnalysis`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70938037
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -84,7 +84,8 @@ class Analyzer(
         Batch("Substitution", fixedPoint,
           CTESubstitution,
           WindowsSubstitution,
    -      EliminateUnions),
    +      EliminateUnions,
    +      SubstituteHint),
    --- End diff --
    
    Like `EliminateUnions`, please rename it to `SubstituteHints`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71044871
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,34 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = child.transformDown {
    +              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
    +              case r @ UnresolvedRelation(t, _) if !stop && resolver(t.table, table) =>
    +                stop = true
    +                BroadcastHint(r)
    +            }
    +          }
    +          resolvedChild
    +
    +        // Remove unrecognized hints
    +        case Hint(name, _, child) =>
    +          child
    --- End diff --
    
    If they fit one line, the general style rule is to put them in one line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62435/consoleFull)** for PR 14132 at commit [`959dab1`](https://github.com/apache/spark/commit/959dab11d792207768216b0b92cb38979579d2d5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71038495
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala ---
    @@ -377,4 +377,64 @@ class AnalysisSuite extends AnalysisTest {
         assertExpressionType(sum(Divide(Decimal(1), 2.0)), DoubleType)
         assertExpressionType(sum(Divide(1.0, Decimal(2.0))), DoubleType)
       }
    +
    +  test("Substitute hint") {
    --- End diff --
    
    let's create a SubstituteHintsSuite and put these there as multiple test cases.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62502/consoleFull)** for PR 14132 at commit [`404a322`](https://github.com/apache/spark/commit/404a322686f0603c84e542c4ca8b5353dcc0f9d7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70889621
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,34 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   */
    +  object SubstituteHint extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    --- End diff --
    
    The second `LEFT SEMI JOIN` one should apply the downward nearest one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62372/consoleFull)** for PR 14132 at commit [`210b636`](https://github.com/apache/spark/commit/210b6365789320f12d0757ad66e7b122614f606c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71040732
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,34 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = child.transformDown {
    +              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
    --- End diff --
    
    This is another one. You could cause multiple nested `BroadcastHint`, if the names are duplicate. : )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Yep. I know why you said like that. I also feel like that, but Spark 2.0 uses that term in API.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62357 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62357/consoleFull)** for PR 14132 at commit [`f77a0fa`](https://github.com/apache/spark/commit/f77a0fafbce0195133a9680c2b636222ba491e2b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71756794
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -162,6 +162,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
             toSQL(p.right),
             p.condition.map(" ON " + _.sql).getOrElse(""))
     
    +    case h @ Hint(_, _, s @ SubqueryAlias(alias, p @ Project(_, _: SQLTable))) =>
    --- End diff --
    
    Sure!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Let me see the example PRs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70222831
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -945,8 +955,12 @@ SIMPLE_COMMENT
         : '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN)
         ;
     
    +BRACKETED_EMPTY_COMMENT
    --- End diff --
    
    Yes. Could you give me some workaround advice?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70425722
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -347,6 +347,15 @@ querySpecification
            windows?)
         ;
     
    +hint
    +    : '/*+' hintStatement '*/'
    +    ;
    +
    +hintStatement
    +    : hintName=identifier identifierList
    +    | hintName=identifier '(' parameter1=identifier parameter2=identifier ')'
    --- End diff --
    
    What are we trying to support here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70869241
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,39 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   */
    +  object SubstituteHint extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        // MAPJOIN
    +        case h @ Hint(name, parameters, child) if name.equalsIgnoreCase("MAPJOIN") =>
    +          var resolvedChild = child
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = child.transformDown {
    +              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
    +              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
    +                stop = true
    +                BroadcastHint(r)
    +            }
    +          }
    +          resolvedChild
    +        // BROADCAST (LEFT|RIGHT)
    +        case h @ Hint(name, parameters, j @ Join(left, _, _, _))
    +            if name.equalsIgnoreCase("BROADCAST_LEFT") =>
    +          j.copy(left = BroadcastHint(left))
    +        case h @ Hint(name, parameters, j @ Join(_, right, _, _))
    +          if name.equalsIgnoreCase("BROADCAST_RIGHT") =>
    +          j.copy(right = BroadcastHint(right))
    +
    +        // Remove unrecognized hint
    +        case Hint(_, _, child) => child
    --- End diff --
    
    I think we can add `logDebug` here.
    But, usually, you know, DBMS ignores `Hint` by default.
    Any malformed or unknown hints become **comment**.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71203247
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -755,4 +755,243 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
           checkHiveQl("select * from orc_t")
         }
       }
    +
    +  test("broadcast hint on single table") {
    +    checkGeneratedSQL(
    +      "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0",
    +      """
    +        |SELECT `gen_attr` AS `id`
    +        |FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr`
    +        |      FROM (SELECT `id` AS `gen_attr`
    +        |            FROM `default`.`parquet_t0`)
    +        |            AS gen_subquery_0)
    +        |      AS parquet_t0
    +        |""".stripMargin)
    +
    +    checkGeneratedSQL(
    +      "SELECT /*+ MAPJOIN(parquet_t0, parquet_t0) */ * FROM parquet_t0",
    +      """
    +        |SELECT `gen_attr` AS `id`
    +        |FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr`
    +        |      FROM (SELECT `id` AS `gen_attr`
    +        |            FROM `default`.`parquet_t0`)
    +        |            AS gen_subquery_0)
    +        |      AS parquet_t0
    +        |""".stripMargin)
    +
    +    checkGeneratedSQL(
    +      "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0 as a",
    +      """
    +        |SELECT `gen_attr` AS `id`
    +        |FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr`
    +        |      FROM ((SELECT `id` AS `gen_attr`
    +        |             FROM `default`.`parquet_t0`)
    +        |             AS gen_subquery_0)
    +        |            AS a)
    +        |      AS a
    +      """.stripMargin)
    +
    +    checkGeneratedSQL(
    +      """
    +        |SELECT /*+ MAPJOIN(parquet_t0) */ *
    --- End diff --
    
    It's beyond the scope of this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70876188
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
    @@ -399,7 +402,16 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
        * separated) relations here, these get converted into a single plan by condition-less inner join.
        */
       override def visitFromClause(ctx: FromClauseContext): LogicalPlan = withOrigin(ctx) {
    -    val from = ctx.relation.asScala.map(plan).reduceLeft(Join(_, _, Inner, None))
    +    val relationPlan = ctx.relation.asScala.map {
    +      case jrc: JoinRelationContext if jrc.broadcast != null =>
    +        if (jrc.broadcast.left == null) { // default
    +          Hint("BROADCAST_RIGHT", Seq.empty[String], plan(jrc))
    --- End diff --
    
    I'm happy with this kind of argument. I agree with you. I prefer to eliminate at the first line. :)
    BTW, fortunately, this logic is removed now. We can proceed without disagreement on here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    For old hives, I can get the binary after Hive 1.0.
    http://www-us.apache.org/dist/hive/
    
    The above result was Hive 2.0 and 1.2. I think Hive 1.0 and 1.1 will be the same as expected.
    
    The legacy code was January 2013.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70930995
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,35 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHint extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = child.transformDown {
    +              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
    +              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
    +                stop = true
    +                BroadcastHint(r)
    +            }
    +          }
    +          resolvedChild
    +
    +        // Remove unrecognized hint
    +        case Hint(name, _, child) =>
    +          logDebug(s"Ignore Unknown Hint: $name")
    --- End diff --
    
    Sure. No problem. It was just added by @hvanhovell 's request.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71825810
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
    @@ -339,8 +339,11 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
           case SqlBaseParser.SELECT =>
             // Regular select
     
    +        // Add hints.
    +        val withHint = relation.optionalMap(ctx.hint)(withHints)
    --- End diff --
    
    Yep. I grasp your point. First, the given `relation` is just a logical plan, not a table. In the following case, the `relation` is `Join`.
     ```
    SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0, parquet_t1
    ```
    This means just a context. 
    
    Second, I agree with your point. Generally, `hint` is about tables like join order or access path. We can provides some hints on filter conditions, I'm not sure about the cases so far. But, even for that cases, we can adapt that in `Analyzer` easily. In other words, we can identify `Hint("FILTERHINT")` in the same SPJ.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    So far, I couldn't do the following two advices. I just inform you that I'm still working these. :)
    
    - `HINT_PREFIX`
    - `| hintName=identifier '(' parameters+=identifier parameters+=identifier ')'`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70938777
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -84,7 +84,8 @@ class Analyzer(
         Batch("Substitution", fixedPoint,
           CTESubstitution,
           WindowsSubstitution,
    -      EliminateUnions),
    +      EliminateUnions,
    +      SubstituteHint),
    --- End diff --
    
    Great!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70935879
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,35 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHint extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = child.transformDown {
    +              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
    +              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
    --- End diff --
    
    Yeah, `Analyzer` always consider case sensitivity issues. Please check how to use `resolver`: https://github.com/apache/spark/blob/8b5a8b25b9d29b7d0949d5663c7394b26154a836/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L68-L74


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62403/consoleFull)** for PR 14132 at commit [`bb44615`](https://github.com/apache/spark/commit/bb44615c3480369fc7b9381a955da817975eeb01).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Really? Can you show me the files? I think we should correct them


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62966/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r72036822
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -372,6 +372,10 @@ trait CheckAnalysis extends PredicateHelper {
                      |in operator ${operator.simpleString}
                    """.stripMargin)
     
    +          case Hint(_, _, _) =>
    +            throw new IllegalStateException(
    +              "logical hint operator should have been removed by the optimizer")
    --- End diff --
    
    Oh!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r72011582
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,49 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   *
    +   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations`
    +   * rule is applied. Here are two reasons.
    +   * - To support `MetastoreRelation` in Hive module.
    +   * - To reduce the effect of `Hint` on the other rules.
    +   *
    +   * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan.
    +   * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    --- End diff --
    
    First of all, I'll take a look again. I need some boot-up time. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71446840
  
    --- Diff: sql/hive/src/test/resources/sqlgen/broadcast_hint_nested_query_1.sql ---
    @@ -0,0 +1,7 @@
    +-- This file is automatically generated by LogicalPlanToSQLSuite.
    +SELECT /*+ MAPJOIN(parquet_t0) */ *
    +FROM (SELECT id tid FROM parquet_t0) T
    +JOIN (SELECT key uid FROM parquet_t1) U
    +ON tid=uid
    +--------------------------------------------------------------------------------
    +SELECT `gen_attr` AS `tid`, `gen_attr` AS `uid` FROM (SELECT `gen_attr`, `gen_attr` FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr` AS `gen_attr` FROM (SELECT `id` AS `gen_attr` FROM `default`.`parquet_t0`) AS gen_subquery_0) AS T INNER JOIN (SELECT `gen_attr` AS `gen_attr` FROM (SELECT `key` AS `gen_attr`, `value` AS `gen_attr` FROM `default`.`parquet_t1`) AS gen_subquery_1) AS U ON (`gen_attr` = `gen_attr`)) AS gen_subquery_2
    --- End diff --
    
    Yep. I'm trying to recover them in https://github.com/apache/spark/pull/14257 , too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62491/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62675/consoleFull)** for PR 14132 at commit [`ad10ed6`](https://github.com/apache/spark/commit/ad10ed6144806574e49274119f22c0d5021a47a2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70931050
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,35 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHint extends Rule[LogicalPlan] {
    --- End diff --
    
    Oh, I missed that. Sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Thank you for the great work by @dongjoon-hyun !
    
    Although the code changes are a lot, but most are test cases. The code is ready to review! @rxin @hvanhovell 
    
    Also cc @cloud-fan @liancheng You might have a better idea in SQL generation. Thanks! 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71823152
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -193,6 +196,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
         case OneRowRelation =>
           ""
     
    +    case Hint(_, _, child) =>
    --- End diff --
    
    Also, `SparkStrategies` ensures that here https://github.com/apache/spark/pull/14132/files#diff-7253a38df7e111ecf6b1ef71feba383bR347 , too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71259834
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -87,6 +87,7 @@ class Analyzer(
           EliminateUnions),
         Batch("Resolution", fixedPoint,
           ResolveRelations ::
    +      SubstituteHints ::
    --- End diff --
    
    In each batch, the order of rules should not matter. That means, the rule `SubstituteHints` impacts the other rules. Please fix it. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71823613
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
    @@ -339,8 +339,11 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
           case SqlBaseParser.SELECT =>
             // Regular select
     
    +        // Add hints.
    +        val withHint = relation.optionalMap(ctx.hint)(withHints)
    --- End diff --
    
    for now the hint can only affect the relations(broadcast hint), so it's ok to just put the `Hint` on top of relation. But generally hint can affect anything in the select clause, e.g. maybe the filter condition.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71098026
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,51 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +
    +          for (param <- parameters) {
    +            val names = param.split("\\.")
    --- End diff --
    
    I intentionally remove that from this PR. You cannot get all in a single PR.
    I think you know Oracle Global Hint (you already asked to support view_name.table_name.), you can do better than me at that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71430308
  
    --- Diff: sql/hive/src/test/resources/sqlgen/broadcast_hint_nested_query_1.sql ---
    @@ -0,0 +1,7 @@
    +-- This file is automatically generated by LogicalPlanToSQLSuite.
    +SELECT /*+ MAPJOIN(parquet_t0) */ *
    +FROM (SELECT id tid FROM parquet_t0) T
    +JOIN (SELECT key uid FROM parquet_t1) U
    +ON tid=uid
    +--------------------------------------------------------------------------------
    +SELECT `gen_attr` AS `tid`, `gen_attr` AS `uid` FROM (SELECT `gen_attr`, `gen_attr` FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr` AS `gen_attr` FROM (SELECT `id` AS `gen_attr` FROM `default`.`parquet_t0`) AS gen_subquery_0) AS T INNER JOIN (SELECT `gen_attr` AS `gen_attr` FROM (SELECT `key` AS `gen_attr`, `value` AS `gen_attr` FROM `default`.`parquet_t1`) AS gen_subquery_1) AS U ON (`gen_attr` = `gen_attr`)) AS gen_subquery_2
    --- End diff --
    
    oh, I see. The compare removes the id.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62082 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62082/consoleFull)** for PR 14132 at commit [`6bd704e`](https://github.com/apache/spark/commit/6bd704e5d281bd8c7a1bddaee300688d016bc597).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70939090
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,35 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHint extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = child.transformDown {
    +              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
    +              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
    +                stop = true
    +                BroadcastHint(r)
    +            }
    +          }
    +          resolvedChild
    +
    +        // Remove unrecognized hint
    --- End diff --
    
    Yep.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    I do not know the exact context why you need SQL generation. Tomorrow, I will review your PR (https://github.com/apache/spark/pull/14116) to understand your issues tomorrow. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71626761
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
           }
         }
     
    +    /**
    +     * Merge and move upward to the nearest Project.
    +     * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
    +     * information of BroadcastHint resides its current position inside the plan. In order to
    +     * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
    +     * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
    +     */
    +    object NormalizeBroadcastHint extends Rule[LogicalPlan] {
    +      override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +        // Capture the broadcasted information and store it in Hint.
    +        case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
    +          Hint("BROADCAST", Seq(table), child)
    +
    +        // Nearest Project is found.
    +        case p @ Project(_, Hint(_, _, _)) => p
    +
    +        // Merge BROADCAST hints up to the nearest Project.
    +        case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
    +          h.copy(parameters = params1 ++ params2)
    +        case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
    +          h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
    +
    +        // Bubble up BROADCAST hints to the nearest Project.
    +        case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
    +          h.copy(child = j.copy(left = hintChild))
    +        case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
    +          h.copy(child = j.copy(right = hintChild))
    +
    +        // Other UnaryNodes are bypassed.
    +        case u: UnaryNode
    +            if u.child.isInstanceOf[Hint] && u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
    +          val hint = u.child.asInstanceOf[Hint]
    +          hint.copy(child = u.withNewChildren(Seq(hint.child)))
    +
    +        // Other binary or higher operations are ignored.
    --- End diff --
    
    We still need to explain why they can be ignored. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Thank you for all comments, @hvanhovell and @rxin . I'll update soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62714 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62714/consoleFull)** for PR 14132 at commit [`2ee7d21`](https://github.com/apache/spark/commit/2ee7d21889490f05b591a276fa33df96693caf05).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62347 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62347/consoleFull)** for PR 14132 at commit [`142f5ef`](https://github.com/apache/spark/commit/142f5eff7f3f30fea6b5d48f918ffcf478ddccd6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62832 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62832/consoleFull)** for PR 14132 at commit [`d3df3fb`](https://github.com/apache/spark/commit/d3df3fbb61cfe7e573d82d777819220f01b40456).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62198/consoleFull)** for PR 14132 at commit [`61e4b1b`](https://github.com/apache/spark/commit/61e4b1b7af15193f6f44ccba848fadd8488650b6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    @dongjoon-hyun  this will break SQL view generation. Can you update SQLBuilder and LogicalPlanToSQLSuite ?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71261455
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,51 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +
    +          for (param <- parameters) {
    +            val names = param.split("\\.")
    +            val tid = if (names.length > 1) {
    +              TableIdentifier(names(1), Some(names(0)))
    +            } else {
    +              TableIdentifier(param, None)
    +            }
    +            try {
    +              catalog.lookupRelation(tid)
    +
    +              var stop = false
    +              resolvedChild = resolvedChild.transformDown {
    +                case r @ BroadcastHint(SubqueryAlias(t, _))
    +                  if !stop && resolver(t, tid.identifier) =>
    +                  stop = true
    +                  r
    +                case r @ SubqueryAlias(t, _) if !stop && resolver(t, tid.identifier) =>
    +                  stop = true
    +                  BroadcastHint(r)
    --- End diff --
    
    Sure, If you want to remove this, I can simply this. It's a little bit legacy I tried to follow your advice as much as possible. (As I mentioned before, I decide to block in Parser layer after I found that Hive does.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70935292
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,35 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHint extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = child.transformDown {
    +              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
    +              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
    +                stop = true
    +                BroadcastHint(r)
    +            }
    --- End diff --
    
    Hint is ignored.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Hive and Spark will raise `ParseException`. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62577/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71443387
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala ---
    @@ -447,4 +447,42 @@ class PlanParserSuite extends PlanTest {
         assertEqual("select a, b from db.c where x !> 1",
           table("db", "c").where('x <= 1).select('a, 'b))
       }
    +
    +  test("select hint syntax") {
    +    // Hive compatibility: Missing parameter raises ParseException.
    +    val m = intercept[ParseException] {
    +      parsePlan("SELECT /*+ HINT() */ * FROM t")
    +    }.getMessage
    +    assert(m.contains("no viable alternative at input"))
    +
    +    // Hive compatibility: No database.
    +    val m2 = intercept[ParseException] {
    +      parsePlan("SELECT /*+ MAPJOIN(default.t) */ * from default.t;")
    +    }.getMessage
    +    assert(m2.contains("no viable alternative at input"))
    +
    +    comparePlans(
    +      parsePlan("SELECT /*+ HINT */ * FROM t"),
    +      Hint("HINT", Seq.empty, table("t")).select(star()))
    +
    +    comparePlans(
    +      parsePlan("SELECT /*+ HINT */ * FROM t"),
    +      Hint("HINT", Seq.empty, table("t")).select(star()))
    +
    +    comparePlans(
    +      parsePlan("SELECT /*+ BROADCASTJOIN(u) */ * FROM t"),
    +      Hint("BROADCASTJOIN", Seq("u"), table("t")).select(star()))
    +
    +    comparePlans(
    +      parsePlan("SELECT /*+ MAPJOIN(u) */ * FROM t"),
    +      Hint("MAPJOIN", Seq("u"), table("t")).select(star()))
    +
    +    comparePlans(
    +      parsePlan("SELECT /*+ STREAMTABLE(a,b,c) */ * FROM t"),
    +      Hint("STREAMTABLE", Seq("a", "b", "c"), table("t")).select(star()))
    +
    +    comparePlans(
    +      parsePlan("SELECT /*+ INDEX(t emp_job_ix) */ * FROM t"),
    +      Hint("INDEX", Seq("t", "emp_job_ix"), table("t")).select(star()))
    +  }
    --- End diff --
    
    Yep. I added like this.
    ```
    comparePlans(
      parsePlan("SELECT /*+ MAPJOIN(`default.t`) */ * from `default.t`"),
      Hint("MAPJOIN", Seq("default.t"), table("default.t")).select(star()))
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71463733
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
           }
         }
     
    +    val broadcastHint = project match {
    +      case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
    +        if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" else ""
    +      case _ => ""
    +    }
         build(
           "SELECT",
    +      broadcastHint,
    --- End diff --
    
    It's the result of Window test query. For the Windows query, there were nested Projects.
    ```
    test("broadcast hint with window") {
        checkSQL(
          """
            |SELECT /*+ MAPJOIN(parquet_t1) */
            |       x.key, MAX(y.key) OVER (PARTITION BY x.key % 5 ORDER BY x.key)
            |FROM parquet_t1 x JOIN parquet_t1 y ON x.key = y.key
          """.stripMargin,
          "broadcast_hint_window")
      }
    ```
    
    I had the same feeling why some "SELECT" doesn't happen. 
    
    After https://issues.apache.org/jira/browse/SPARK-16576 , I think this kind of weirdness will be reduced.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62403/consoleFull)** for PR 14132 at commit [`bb44615`](https://github.com/apache/spark/commit/bb44615c3480369fc7b9381a955da817975eeb01).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71425663
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/BroadcastHintSuite.scala ---
    @@ -0,0 +1,45 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.catalyst.plans.logical.BroadcastHint
    +import org.apache.spark.sql.hive.test.TestHiveSingleton
    +import org.apache.spark.sql.test.SQLTestUtils
    +
    +class BroadcastHintSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
    +  test("broadcast hint on Hive table") {
    +    withTable("hive_t", "hive_u") {
    +      spark.sql("CREATE TABLE hive_t(a int)")
    +      spark.sql("CREATE TABLE hive_u(b int)")
    +
    +      val hive_t = spark.table("hive_t").queryExecution.analyzed
    +      val hive_u = spark.table("hive_u").queryExecution.analyzed
    +
    +      val plan = spark.sql("SELECT /*+ MAPJOIN(hive_t) */ * FROM hive_t, hive_u")
    +        .queryExecution.analyzed
    +      assert(plan.children(0).children(0).isInstanceOf[BroadcastHint])
    --- End diff --
    
    Sure! No problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71259501
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -425,6 +452,49 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
           }
         }
     
    +    /**
    +     * Merge and move upward to the nearest Project.
    +     * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
    +     * information of BroadcastHint resides its current position inside the plan. In order to
    +     * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
    +     * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
    +     */
    +    object NormalizeBroadcastHint extends Rule[LogicalPlan] {
    +      override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +        // Capture the broadcasted information and store it in Hint.
    +        case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
    +          Hint("BROADCAST", Seq(table), child)
    +
    +        // Nearest Project is found.
    +        case p @ Project(_, Hint(_, _, _)) => p
    +
    +        // Merge BROADCAST hints up to the nearest Project.
    +        case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
    +          h.copy(parameters = params1 ++ params2)
    +        case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
    +          h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
    +
    +        // Bubble up BROADCAST hints to the nearest Project.
    --- End diff --
    
    If you read what we did in SQLBuilder, you might know that is not the normal way we did. @rxin gave the same comment below: https://github.com/apache/spark/pull/14132#issuecomment-233503432. 
    
    Keeping a white list is hard to maintain. You know, I still can find more missing cases here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71626636
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -193,6 +193,12 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
         case OneRowRelation =>
           ""
     
    +    case Hint(_, _, child) =>
    +      toSQL(child)
    +
    +    case BroadcastHint(child) =>
    +      toSQL(child)
    --- End diff --
    
    Yep. Right.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    By the way, the view issue is moved into [SPARK-16492](https://issues.apache.org/jira/browse/SPARK-16492).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62975/consoleFull)** for PR 14132 at commit [`4023d97`](https://github.com/apache/spark/commit/4023d974f34052bb29e12fd93aeb187ea12b536f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70874930
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
    @@ -399,7 +402,16 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
        * separated) relations here, these get converted into a single plan by condition-less inner join.
        */
       override def visitFromClause(ctx: FromClauseContext): LogicalPlan = withOrigin(ctx) {
    -    val from = ctx.relation.asScala.map(plan).reduceLeft(Join(_, _, Inner, None))
    +    val relationPlan = ctx.relation.asScala.map {
    +      case jrc: JoinRelationContext if jrc.broadcast != null =>
    +        if (jrc.broadcast.left == null) { // default
    +          Hint("BROADCAST_RIGHT", Seq.empty[String], plan(jrc))
    --- End diff --
    
    It does.
    
    However here is my counter argument :)... In this case we had to define the syntax for a hint in the grammar (join will break without it). If we have to do that we are tied to that particular hint anyway, so in those cases I would implement it right away. The added bonus here would be that you simplify your analysis rule by quite a bit.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71203038
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,51 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +
    +          for (param <- parameters) {
    +            val names = param.split("\\.")
    +            val tid = if (names.length > 1) {
    +              TableIdentifier(names(1), Some(names(0)))
    +            } else {
    +              TableIdentifier(param, None)
    +            }
    +            try {
    +              catalog.lookupRelation(tid)
    +
    +              var stop = false
    +              resolvedChild = resolvedChild.transformDown {
    +                case r @ BroadcastHint(SubqueryAlias(t, _))
    +                  if !stop && resolver(t, tid.identifier) =>
    +                  stop = true
    +                  r
    +                case r @ SubqueryAlias(t, _) if !stop && resolver(t, tid.identifier) =>
    +                  stop = true
    +                  BroadcastHint(r)
    --- End diff --
    
    Here, too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62564 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62564/consoleFull)** for PR 14132 at commit [`94aaf6b`](https://github.com/apache/spark/commit/94aaf6b988cfb94c6d14b44a708d75657e75d0b4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    @dongjoon-hyun Finished the fourth pass. Sorry, it took a long time to review it. So many meetings... 
    
    Just a question, what is the output from Hive when users use a fully qualified name in Hint?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62491 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62491/consoleFull)** for PR 14132 at commit [`e6a44ed`](https://github.com/apache/spark/commit/e6a44ed3d97692f033ea4b1832b08527e6157aa2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Hi, @hvanhovell . So far, I tried in the following way for `HINT_PREFIX`.
    Due to the prefix overlapping, `BRACKETED_COMMENT` seems to eat the HINT.
    Could you give me some advice more?
    
    ```antlr
     hint
    -    : '/*+' hintStatement '*/'
    +    : HINT_PREFIX hintStatement '*/'
         ;
     
     hintStatement
    @@ -961,12 +961,12 @@ SIMPLE_COMMENT
         : '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN)
         ;
     
    -BRACKETED_EMPTY_COMMENT
    -    : '/**/' -> channel(HIDDEN)
    +HINT_PREFIX
    +    : '/*+'
         ;
     
     BRACKETED_COMMENT
    -    : '/*' ~[+] .*? '*/' -> channel(HIDDEN)
    +    : '/*' .*? '*/' -> channel(HIDDEN)             <--- The original one.
         ;
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71261042
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -425,6 +452,49 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
           }
         }
     
    +    /**
    +     * Merge and move upward to the nearest Project.
    +     * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
    +     * information of BroadcastHint resides its current position inside the plan. In order to
    +     * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
    +     * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
    +     */
    +    object NormalizeBroadcastHint extends Rule[LogicalPlan] {
    +      override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +        // Capture the broadcasted information and store it in Hint.
    +        case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
    +          Hint("BROADCAST", Seq(table), child)
    +
    +        // Nearest Project is found.
    +        case p @ Project(_, Hint(_, _, _)) => p
    +
    +        // Merge BROADCAST hints up to the nearest Project.
    +        case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
    +          h.copy(parameters = params1 ++ params2)
    +        case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
    +          h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
    +
    +        // Bubble up BROADCAST hints to the nearest Project.
    +        case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
    +          h.copy(child = j.copy(left = hintChild))
    +        case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
    +          h.copy(child = j.copy(right = hintChild))
    +        case s @ SubqueryAlias(_, h @ Hint("BROADCAST", _, hintChild)) =>
    +          h.copy(child = s.copy(child = hintChild))
    +        case ll @ LocalLimit(_, h @ Hint("BROADCAST", _, hintChild)) =>
    +          h.copy(child = ll.copy(child = hintChild))
    +        case f @ Filter(_, h @ Hint("BROADCAST", _, hintChild)) =>
    +          h.copy(child = f.copy(child = hintChild))
    +        case a @ Aggregate(_, _, h @ Hint("BROADCAST", _, hintChild)) =>
    +          h.copy(child = a.copy(child = hintChild))
    +        case s @ Sort(_, _, h @ Hint("BROADCAST", _, hintChild)) =>
    +          h.copy(child = s.copy(child = hintChild))
    +        case g @ Generate(_, _, _, _, _, h @ Hint("BROADCAST", _, hintChild)) =>
    +          h.copy(child = g.copy(child = hintChild))
    +        // Set operation is not allowed to be across. UNION/INTERCEPT/EXCEPT
    --- End diff --
    
    Okay. But, that was the reason not to allowed there. Hmm, maybe it looks different. Sorry.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62196/consoleFull)** for PR 14132 at commit [`5c76e25`](https://github.com/apache/spark/commit/5c76e25851888f0cafa1815a12760846efeb2077).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62347 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62347/consoleFull)** for PR 14132 at commit [`142f5ef`](https://github.com/apache/spark/commit/142f5eff7f3f30fea6b5d48f918ffcf478ddccd6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62675/consoleFull)** for PR 14132 at commit [`ad10ed6`](https://github.com/apache/spark/commit/ad10ed6144806574e49274119f22c0d5021a47a2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71097256
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,51 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +
    +          for (param <- parameters) {
    +            val names = param.split("\\.")
    --- End diff --
    
    Currently, the parser does not allow to specify database name. (I didn't commit the code since I found that Hive doesn't, too.) In case of the parser acceptance, it's valid for all the case because the parser will accept only identifiers.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62975 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62975/consoleFull)** for PR 14132 at commit [`4023d97`](https://github.com/apache/spark/commit/4023d974f34052bb29e12fd93aeb187ea12b536f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71694743
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
           }
         }
     
    +    /**
    +     * Merge and move upward to the nearest Project.
    +     * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
    +     * information of BroadcastHint resides its current position inside the plan. In order to
    +     * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
    +     * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
    +     */
    +    object NormalizeBroadcastHint extends Rule[LogicalPlan] {
    +      override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +        // Capture the broadcasted information and store it in Hint.
    +        case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
    +          Hint("BROADCAST", Seq(table), child)
    +
    +        // Nearest Project is found.
    +        case p @ Project(_, Hint(_, _, _)) => p
    +
    +        // Merge BROADCAST hints up to the nearest Project.
    +        case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
    +          h.copy(parameters = params1 ++ params2)
    +        case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
    +          h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
    +
    +        // Bubble up BROADCAST hints to the nearest Project.
    +        case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
    +          h.copy(child = j.copy(left = hintChild))
    +        case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
    +          h.copy(child = j.copy(right = hintChild))
    --- End diff --
    
    For the functional equivalence, you can see the logic. Also, the above code is passed `LogicalPlanToSQLSuite` with the existing tests, too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71825927
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -193,6 +196,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
         case OneRowRelation =>
           ""
     
    +    case Hint(_, _, child) =>
    --- End diff --
    
    Ur, it's added by previous advice comments.
    No problem. I'll move that into `CheckAnalysis`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70223258
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -945,8 +955,12 @@ SIMPLE_COMMENT
         : '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN)
         ;
     
    +BRACKETED_EMPTY_COMMENT
    +    : '/**/' -> channel(HIDDEN)
    +    ;
    +
     BRACKETED_COMMENT
    -    : '/*' .*? '*/' -> channel(HIDDEN)
    +    : '/*' ~[+] .*? '*/' -> channel(HIDDEN)
    --- End diff --
    
    It might be easier to introduce a HINT_PREFIX rule (`'/*+'`) and place this before the BRACKET_COMMENT rule. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Thank you for replying, @gatorsmile . Yes. Right.
    
    In fact, that limitation is important to me. To complete my `INFORMATION_SCHEMA` PR (https://github.com/apache/spark/pull/14116), I need to support view for Spark temporal views some day later. Of course, currently, in that PR, I removed the VIEW DEFINITION column implementation part due to many reason.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Oh, @rxin .
    Since the position of `BroadcastHint` is different from the original, we can not figure out the exact SELECT from the multiple SELECT candidates. I'm trying to attach to the nearest SELECT ancestor if possible. I hope that is okay since the functionality is the same.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71202999
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -425,6 +452,49 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
           }
         }
     
    +    /**
    +     * Merge and move upward to the nearest Project.
    +     * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
    +     * information of BroadcastHint resides its current position inside the plan. In order to
    +     * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
    +     * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
    +     */
    +    object NormalizeBroadcastHint extends Rule[LogicalPlan] {
    +      override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +        // Capture the broadcasted information and store it in Hint.
    +        case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
    +          Hint("BROADCAST", Seq(table), child)
    +
    +        // Nearest Project is found.
    +        case p @ Project(_, Hint(_, _, _)) => p
    +
    +        // Merge BROADCAST hints up to the nearest Project.
    +        case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
    +          h.copy(parameters = params1 ++ params2)
    +        case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
    +          h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
    +
    +        // Bubble up BROADCAST hints to the nearest Project.
    +        case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
    +          h.copy(child = j.copy(left = hintChild))
    +        case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
    +          h.copy(child = j.copy(right = hintChild))
    +        case s @ SubqueryAlias(_, h @ Hint("BROADCAST", _, hintChild)) =>
    --- End diff --
    
    It seems `database.table` issue which I mentioned before.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71041967
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,34 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = child.transformDown {
    --- End diff --
    
    Could you give me a little bit more hint? :)
    Is the naming `resolvedChild`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70883834
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
    @@ -509,6 +512,23 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
       }
     
       /**
    +   * Add a Hint to a logical plan.
    +   */
    +  private def withHints(
    +      ctx: HintContext,
    +      relation: LogicalPlan): LogicalPlan = withOrigin(ctx) {
    +    val stmt = ctx.hintStatement
    +    val name = stmt.hintName.getText.toUpperCase
    +    if (stmt.identifierList != null) {
    +      Hint(name, visitIdentifierList(stmt.identifierList), relation)
    +    } else if (stmt.parameters != null) {
    +      Hint(name, stmt.parameters.asScala.map(_.getText), relation)
    +    } else {
    +      Hint(name, Seq.empty[String], relation)
    --- End diff --
    
    Hmm. Hive gets the following, but mine doesn't. 
    ```
    hive> select /*+ MAPJOIN */ * from t;
    OK
    ```
    Let me fix this. Thank you so much. You're a human compiler. :) +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    This looks pretty good!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    For the HINT_PREFIX, I tried at the first, but still have some problem. So, I couldn't include the very previous commit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71823079
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -193,6 +196,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
         case OneRowRelation =>
           ""
     
    +    case Hint(_, _, child) =>
    --- End diff --
    
    After applying the rules of `Analyzer`, there will be no more `Hint` case class.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    I see. For `NormalizeBroadcastHint`, I will try to minimize the cases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    cc @rxin @hvanhovell what do you think of this case?
    `SELECT /*+ MAPJOIN(t1) */ * FROM (SELECT * FROM t1) x JOIN t1`
    
    Applying the hint to the `t1` after JOIN looks more intuitive to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71260947
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,51 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +
    +          for (param <- parameters) {
    +            val names = param.split("\\.")
    +            val tid = if (names.length > 1) {
    +              TableIdentifier(names(1), Some(names(0)))
    +            } else {
    +              TableIdentifier(param, None)
    +            }
    +            try {
    +              catalog.lookupRelation(tid)
    +
    +              var stop = false
    +              resolvedChild = resolvedChild.transformDown {
    +                case r @ BroadcastHint(SubqueryAlias(t, _))
    +                  if !stop && resolver(t, tid.identifier) =>
    +                  stop = true
    +                  r
    +                case r @ SubqueryAlias(t, _) if !stop && resolver(t, tid.identifier) =>
    +                  stop = true
    +                  BroadcastHint(r)
    --- End diff --
    
    If we do not support `db.table`, why we still compare the whole identifier? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    @dongjoon-hyun sure. It was merely a suggestion to get rid of the `BRACKETED_EMPTY_COMMENT ` rule.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Is it related? This is the most basic test case, right?
    ```SQL
          CREATE TABLE tab1(c1 int)
          select * from tab1, tab1
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70937541
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -354,6 +354,14 @@ case class BroadcastHint(child: LogicalPlan) extends UnaryNode {
       override lazy val statistics: Statistics = super.statistics.copy(isBroadcastable = true)
     }
     
    +/**
    + * A general hint for the child.
    + * a pair of (name, parameters).
    --- End diff --
    
    `a` -> `A`. I think we still can disclose more info about this logical node.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Thank you for guidance, @cloud-fan !
    I'll implement like that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r72181762
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,49 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   *
    +   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations`
    +   * rule is applied. Here are two reasons.
    +   * - To support `MetastoreRelation` in Hive module.
    +   * - To reduce the effect of `Hint` on the other rules.
    +   *
    +   * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan.
    +   * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = resolvedChild.transformDown {
    --- End diff --
    
    You probably want to look at older versions of Hive. For example, 0.10. After https://issues.apache.org/jira/browse/HIVE-3784, Hive does not really use map join hint.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70938951
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,35 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    --- End diff --
    
    For this one, we have this rule in `Batch("Substitution")`. I think that's fairly enough information.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62162/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    What I mean is not making real view as you concern. I'm telling the one I need to make a view definition by SQL generation is the result of 
    ```
    spark.range(10).createOrReplaceTempView("t")
    new org.apache.spark.sql.catalyst.SQLBuilder(sql("select * from t")).toSQL
    ```
    I'm not disagree with your opinion here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71041157
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -338,7 +338,7 @@ querySpecification
            (RECORDREADER recordReader=STRING)?
            fromClause?
            (WHERE where=booleanExpression)?)
    -    | ((kind=SELECT setQuantifier? namedExpressionSeq fromClause?
    +    | ((kind=SELECT hint? setQuantifier? namedExpressionSeq fromClause?
            | fromClause (kind=SELECT setQuantifier? namedExpressionSeq)?)
    --- End diff --
    
    uh... This grammar file kills my eye... You are right. So far, to support Broadcast Hint, it is good enough. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62500 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62500/consoleFull)** for PR 14132 at commit [`5ba2ad7`](https://github.com/apache/spark/commit/5ba2ad7aa6cab364e09a2c0dae529b8270aed153).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Thank you for giving opinion.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Not all the joins have the operators `SubqueryAlias`. For example, below is a self join against Hive tables:
    ```
    == Analyzed Logical Plan ==
    c1: int, c1: int
    Project [c1#7, c1#8]
    +- Join Inner
       :- MetastoreRelation default, tab1
       +- MetastoreRelation default, tab1
    ```
    
    Thus, the current solution does not work, right?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71097771
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,51 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +
    +          for (param <- parameters) {
    +            val names = param.split("\\.")
    --- End diff --
    
    Let me try the quoted identifiers in the parser. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71096995
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,51 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +
    +          for (param <- parameters) {
    +            val names = param.split("\\.")
    --- End diff --
    
    This does not work in all the cases, right? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71463337
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
           }
         }
     
    +    /**
    +     * Merge and move upward to the nearest Project.
    +     * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
    +     * information of BroadcastHint resides its current position inside the plan. In order to
    +     * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
    +     * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
    +     */
    +    object NormalizeBroadcastHint extends Rule[LogicalPlan] {
    +      override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +        // Capture the broadcasted information and store it in Hint.
    +        case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
    +          Hint("BROADCAST", Seq(table), child)
    +
    +        // Nearest Project is found.
    +        case p @ Project(_, Hint(_, _, _)) => p
    +
    +        // Merge BROADCAST hints up to the nearest Project.
    +        case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
    +          h.copy(parameters = params1 ++ params2)
    +        case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
    +          h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
    +
    +        // Bubble up BROADCAST hints to the nearest Project.
    +        case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
    +          h.copy(child = j.copy(left = hintChild))
    +        case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
    +          h.copy(child = j.copy(right = hintChild))
    +
    +        // Other UnaryNodes are bypassed.
    +        case u: UnaryNode
    +          if u.child.isInstanceOf[Hint] && u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
    --- End diff --
    
    Sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    The following is updated.
    - Add more descriptions and test cases (about finding closest table and nested hint).
    - Support no parameter hint like `/*+ INDEX */`.
    - Generalize `hintStatement` rule.
    - Simplify `withHints`.
    - Move `toUpperCase` into Analyzer rule.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62500 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62500/consoleFull)** for PR 14132 at commit [`5ba2ad7`](https://github.com/apache/spark/commit/5ba2ad7aa6cab364e09a2c0dae529b8270aed153).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71047081
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,35 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    --- End diff --
    
    Yep.  I agree with you for that purpose of future use.
    The testsuites are not able to guarantee that for all cases.
    I will add some description.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    @dongjoon-hyun please only support the Hive style mapjoin hint, and don't add the "LEFT OUTER JOIN BROADCAST" way. It'd mean queries written in Spark 2.1 would not work with earlier versions of Spark.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71465508
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,35 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    --- End diff --
    
    Also, in the PR description, too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    All the test cases are for positive cases. We also should add more negative cases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71626212
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -193,6 +193,12 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
         case OneRowRelation =>
           ""
     
    +    case Hint(_, _, child) =>
    +      toSQL(child)
    +
    +    case BroadcastHint(child) =>
    +      toSQL(child)
    --- End diff --
    
    This is dead code. Right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62347/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    There is also related issue you might be interested. 
    https://issues.apache.org/jira/browse/SPARK-16601


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Thank you for @gatorsmile and @rxin .
    I'll remote `logDebug` and add a unittest for Analyzer soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    @dongjoon-hyun it is ok not to show the view sql string if the current framework does not support it (e.g. it is generated by a dataframe without the sql equivalent).



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    I see. Sorry for misunderstanding. Let me summary again. The following is the final decision, right?
    ```
    SELECT /*+ MAPJOIN(t) */ * FROM t JOIN u ON t.id = u.id
    SELECT /*+ MAPJOIN(u) */ * FROM t JOIN u ON t.id = u.id
    SELECT /*+ BROADCASTJOIN(t) */ * FROM t JOIN u ON t.id = u.id
    SELECT /*+ BROADCASTJOIN(u) */ * FROM t JOIN u ON t.id = u.id
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70867118
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -374,8 +383,8 @@ setQuantifier
     
     relation
         : left=relation
    -      ((CROSS | joinType) JOIN right=relation joinCriteria?
    -      | NATURAL joinType JOIN right=relation
    +      ((CROSS | joinType) JOIN broadcast=broadcastType? right=relation joinCriteria?
    --- End diff --
    
    I am not a giant fan or this syntax. I'd prefer `LEFT BROADCAST JOIN`, but I am aware that this is also clunky: 'LEFT RIGHT BROADCAST JOIN`...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70866390
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
    @@ -399,7 +402,16 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
        * separated) relations here, these get converted into a single plan by condition-less inner join.
        */
       override def visitFromClause(ctx: FromClauseContext): LogicalPlan = withOrigin(ctx) {
    -    val from = ctx.relation.asScala.map(plan).reduceLeft(Join(_, _, Inner, None))
    +    val relationPlan = ctx.relation.asScala.map {
    --- End diff --
    
    I don't think this will work with multiple joins. I think we should do this in `visitJoinRelation`...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r72011145
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,49 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   *
    +   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations`
    +   * rule is applied. Here are two reasons.
    +   * - To support `MetastoreRelation` in Hive module.
    +   * - To reduce the effect of `Hint` on the other rules.
    +   *
    +   * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan.
    +   * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    --- End diff --
    
    then should we wrap the whole query plan with `Hint` instead of just the relation? https://github.com/apache/spark/pull/14132#discussion-diff-71820260R343


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Oh, the one failure is due to a new MAPJOIN testcase in the master branch (I added yesterday.)
    I'll rebase and update that, too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62372/consoleFull)** for PR 14132 at commit [`210b636`](https://github.com/apache/spark/commit/210b6365789320f12d0757ad66e7b122614f606c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Hi, @cloud-fan .
    Now, it wraps the whole query.
    This should be done at your first comment.
    Sorry for delay.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r72176938
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,49 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   *
    +   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations`
    +   * rule is applied. Here are two reasons.
    +   * - To support `MetastoreRelation` in Hive module.
    +   * - To reduce the effect of `Hint` on the other rules.
    +   *
    +   * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan.
    +   * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = resolvedChild.transformDown {
    --- End diff --
    
    yea, for this example, DFS is right and BFS is wrong. But how about the case I mentioned above?
    
    I think we don't need to traverse the plan tree recursively, just find the sub-tree of relation part, flatten the join node if any, and then iterate these relations, wrap it with broadcast hint if it's a table and table name matches the hint argument.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r72178111
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,49 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   *
    +   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations`
    +   * rule is applied. Here are two reasons.
    +   * - To support `MetastoreRelation` in Hive module.
    +   * - To reduce the effect of `Hint` on the other rules.
    +   *
    +   * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan.
    +   * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = resolvedChild.transformDown {
    --- End diff --
    
    For your case, what I understand is the DFS should be used. I've search Hive codebase until now, but I couldn't find the clear logic until now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71265020
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,51 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +
    +          for (param <- parameters) {
    +            val names = param.split("\\.")
    +            val tid = if (names.length > 1) {
    +              TableIdentifier(names(1), Some(names(0)))
    +            } else {
    +              TableIdentifier(param, None)
    +            }
    +            try {
    +              catalog.lookupRelation(tid)
    +
    +              var stop = false
    +              resolvedChild = resolvedChild.transformDown {
    +                case r @ BroadcastHint(SubqueryAlias(t, _))
    +                  if !stop && resolver(t, tid.identifier) =>
    +                  stop = true
    +                  r
    +                case r @ SubqueryAlias(t, _) if !stop && resolver(t, tid.identifier) =>
    +                  stop = true
    +                  BroadcastHint(r)
    --- End diff --
    
    I think we have to remove it; otherwise, the result will be wrong.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62966 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62966/consoleFull)** for PR 14132 at commit [`bd13473`](https://github.com/apache/spark/commit/bd13473e662cc51112c372016e91dbce03d848bc).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70948608
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -338,7 +338,7 @@ querySpecification
            (RECORDREADER recordReader=STRING)?
            fromClause?
            (WHERE where=booleanExpression)?)
    -    | ((kind=SELECT setQuantifier? namedExpressionSeq fromClause?
    +    | ((kind=SELECT hint? setQuantifier? namedExpressionSeq fromClause?
            | fromClause (kind=SELECT setQuantifier? namedExpressionSeq)?)
    --- End diff --
    
    Do you want the following? It's already handled.
    ```scala
    scala> sql("FROM t JOIN u ON t.id = u.id SELECT /*+ MAPJOIN(u) */ *").explain
    == Physical Plan ==
    *BroadcastHashJoin [id#0L], [id#4L], Inner, BuildRight
    :- *Range (0, 1000000000, splits=8)
    +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
       +- *Range (0, 1000000000, splits=8)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71040527
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,34 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = child.transformDown {
    --- End diff --
    
    This is a serious bug, right? : )
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r72040281
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -347,6 +347,16 @@ querySpecification
            windows?)
         ;
     
    +hint
    +    : '/*+' hintStatement '*/'
    --- End diff --
    
    I might have missed something, but do we only support one hint?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70223712
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -347,6 +347,14 @@ querySpecification
            windows?)
         ;
     
    +hint
    +    : '/*+' mapJoinHint '*/'
    +    ;
    +
    +mapJoinHint
    +    : MAPJOIN '(' broadcastedTables+=tableIdentifier (',' broadcastedTables+=tableIdentifier)* ')'
    --- End diff --
    
    Is the MAPJOIN optimization the only one we are going to support? This is fine if we aren't. We might want to make this more general (and move the logic into the planner) if we are.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70226307
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -347,6 +347,14 @@ querySpecification
            windows?)
         ;
     
    +hint
    +    : '/*+' mapJoinHint '*/'
    +    ;
    +
    +mapJoinHint
    +    : MAPJOIN '(' broadcastedTables+=tableIdentifier (',' broadcastedTables+=tableIdentifier)* ')'
    --- End diff --
    
    Currently, the scope of this PR does. We can generalize if needed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70225259
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
    @@ -339,8 +339,24 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
           case SqlBaseParser.SELECT =>
             // Regular select
     
    +        // Broadcast hints
    +        var withBroadcastedTable = relation
    --- End diff --
    
    A few thoughts here:
    
    - Should we introduce a separate LogicalPlan (like `With`) and move this logic into an Analyzer rule? The reason that I am proposing this is because we are transforming the `LogicalPlan` here. It is simpler to do it here though.
    - Lets move the logic here into a separate method. Name it `withHints`, model it after the other `with*` functions, and use `optionalMap(hint)(withHint)`. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r72008134
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,49 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   *
    +   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations`
    +   * rule is applied. Here are two reasons.
    +   * - To support `MetastoreRelation` in Hive module.
    +   * - To reduce the effect of `Hint` on the other rules.
    +   *
    +   * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan.
    +   * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    --- End diff --
    
    how does hive handle sql hints? This rule looks a little complex to me, according to the fact that hints can only be applied to table relation. It will be great if we can wrap `UnresolvedRelation` with `Hint` in the parser, and then it will be very easy to resolve hints at analyzer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62806/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71200265
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -87,6 +87,7 @@ class Analyzer(
           EliminateUnions),
         Batch("Resolution", fixedPoint,
           ResolveRelations ::
    +      SubstituteHints ::
    --- End diff --
    
    Yep. Here is the error when we move it from the batch `Resolution` into the batch `Nondeterministic`.
    ```
    unresolved operator 'Project [*];
    org.apache.spark.sql.AnalysisException: unresolved operator 'Project [*];
    	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:40)
    	at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:58)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    I was referring to NormalizeBroadcastHint -- there are many cases in there and it seems error prone against future changes. Do we need all those rules?
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Is there a better way to handle sql generation that's not adding 10+ rules?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71259759
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,51 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +
    +          for (param <- parameters) {
    +            val names = param.split("\\.")
    --- End diff --
    
    Sorry, I do not think this is beyond of the scope. Instead, this is a bug. This identifier stores a table identifier. If we do not use the same parsing solution, the result will be wrong.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Thank you so much for in-depth interview, @gatorsmile .
    I've learned a lot. Yes. It is very late. I took a nap during the evening. :)
    See you tomorrow on GitHub.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    For your reference, below is a simple case if users want to do it using dataframe
    ```Scala
    sql("CREATE TABLE tab1(c1 int)")
    val df = spark.read.table("tab1")
    df.join(broadcast(df))
    ```



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71045978
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,34 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHints extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = child.transformDown {
    +              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
    +              case r @ UnresolvedRelation(t, _) if !stop && resolver(t.table, table) =>
    --- End diff --
    
    What happens when users use `databaseName`.`tableName` and `tableName` at the same time?  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    I agree with @cloud-fan. It is confusing that the hint defined in the top-level query gets applied to a subquery.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    What are Spark temporal views? Temporal tables?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71044638
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,35 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    --- End diff --
    
    When adding more and more rules, it becomes harder to track the batch-based dependencies. Then, we have to split the batches. Thus, I think we should do our best to document the prerequisite, dependency assumptions in each rule 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62357/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Hi, @rxin and @hvanhovell .
    Could you review this `HINT` PR when you have some time?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Oh, for the worst case, I hope to skip all. :) I'll investigate them too. Thank you for the pointers!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71464803
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,35 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    --- End diff --
    
    Oh, I missed you comment here. It's too far from the bottom now. :)
    I'll add more `prerequisite, dependency assumptions` here now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r72040705
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -347,6 +347,16 @@ querySpecification
            windows?)
         ;
     
    +hint
    +    : '/*+' hintStatement '*/'
    --- End diff --
    
    Yes. Currently, it does. Should we expand this, too?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Hi, @hvanhovell .
    In this PR, if possible, could we skip adding `HINT_PREFIX` rule? I finished all the other things.
    Actually, there is no change on the behavior of parser if we can use `HINT_PREFIX`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    I see. That is a temporary table. Temporal Tables are a term for different purposes. 
    
    The sizing is not small, if we want to support the temporary views created by any DataFrame/Dataset. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62500/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Hi, @rxin and @gatorsmile .
    Now, it's ready for review again.
    - `SubstituteHints` is minimized.
    - `MetastoreRelation` is supported.
      - This PR originally handles all the tables because it handle `UnresolvedRelation`. While trying to support `database` name, there occurs regression as @gatorsmile mentioned. Now, it's rollbacked. Hive `MetastoreRelation` is not in `catalyst` package. We should do this for `UnresolvedRelation`.
    - All the existing testcases passed. And, Hive specific testsuite is added.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71425497
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -933,4 +933,124 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
           checkSQL("select * from orc_t", "select_orc_table")
         }
       }
    +
    +  test("broadcast hint on single table") {
    +    checkSQL("SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0",
    +      "broadcast_hint_single_table_1")
    +
    +    checkSQL("SELECT /*+ MAPJOIN(parquet_t0, parquet_t0) */ * FROM parquet_t0",
    +      "broadcast_hint_single_table_2")
    +
    +    checkSQL(
    +      "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0 as a",
    +      "broadcast_hint_single_table_3")
    +
    +    checkSQL(
    +      """
    +        |SELECT /*+ MAPJOIN(parquet_t0) */ *
    +        |FROM (SELECT id tid FROM parquet_t0) T
    +        |JOIN (SELECT id uid FROM parquet_t0) U
    +        |ON tid=uid
    +      """.stripMargin,
    +      "broadcast_hint_single_4")
    +  }
    +
    +  test("broadcast hint on multiple tables") {
    +    checkSQL(
    +      "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0, parquet_t1",
    +      "broadcast_hint_multiple_table_1")
    +    checkSQL(
    +      "SELECT /*+ MAPJOIN(parquet_t1) */ * FROM parquet_t0, parquet_t1",
    +      "broadcast_hint_multiple_table_2")
    +  }
    +
    +  test("broadcast hint on nested query 1") {
    +    checkSQL(
    +      """
    +        |SELECT /*+ MAPJOIN(parquet_t0) */ *
    +        |FROM (SELECT id tid FROM parquet_t0) T
    +        |JOIN (SELECT key uid FROM parquet_t1) U
    +        |ON tid=uid
    +      """.stripMargin,
    +      "broadcast_hint_nested_query_1")
    +  }
    +
    +  test("broadcast hint on nested query 2") {
    +    checkSQL(
    +      """
    +        |SELECT /*+ MAPJOIN(parquet_t1) */ tid
    +        |FROM (SELECT id tid FROM parquet_t0) T
    +        |JOIN (SELECT key uid FROM parquet_t1) U
    +        |ON tid=uid
    +      """.stripMargin,
    +      "broadcast_hint_nested_query_2")
    +  }
    +
    +  test("multiple broadcast hints on multiple tables") {
    +    checkSQL(
    +      "SELECT /*+ MAPJOIN(parquet_t0, parquet_t1) */ * FROM parquet_t0, parquet_t1",
    +      "multiple_broadcast_hints")
    +  }
    +
    +  test("broadcast hint with filter") {
    +    checkSQL(
    +      "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0 WHERE id < 10",
    +      "broadcast_hint_with_filter")
    +  }
    +
    +  test("broadcast hint with filter/limit") {
    +    checkSQL(
    +      "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0 WHERE id < 10 LIMIT 10",
    +      "broadcast_hint_with_filter_limit")
    +  }
    +
    +  test("broadcast hint with generator") {
    +    checkSQL(
    +      "SELECT * FROM (SELECT /*+ MAPJOIN(parquet_t0) */ EXPLODE(ARRAY(1,2,3)) FROM parquet_t0) T",
    +      "broadcast_hint_generator")
    +  }
    +
    +  test("broadcast hint with groupby/having/orderby") {
    +    checkSQL(
    +      """
    +        |SELECT /*+ MAPJOIN(parquet_t0) */ *
    +        |FROM parquet_t0
    +        |WHERE id > 0
    +        |GROUP BY id
    +        |HAVING count(*) > 0
    +        |ORDER BY id
    +      """.stripMargin,
    +      "broadcast_hint_groupby_having_orderby")
    +  }
    +
    +  test("broadcast hint with window") {
    +    checkSQL(
    +      """
    +        |SELECT /*+ MAPJOIN(parquet_t1) */
    +        |       x.key, MAX(y.key) OVER (PARTITION BY x.key % 5 ORDER BY x.key)
    +        |FROM parquet_t1 x JOIN parquet_t1 y ON x.key = y.key
    +      """.stripMargin,
    +      "broadcast_hint_window")
    +  }
    +
    +  test("broadcast hint with rollup") {
    +    checkSQL(
    +      """
    +        |SELECT /*+ MAPJOIN(parquet_t1) */
    +        |       count(*) as cnt, key%5, grouping_id()
    +        |FROM parquet_t1
    +        |GROUP BY key % 5 WITH ROLLUP""".stripMargin,
    --- End diff --
    
    format


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62578/consoleFull)** for PR 14132 at commit [`9021975`](https://github.com/apache/spark/commit/9021975a2243153edbfa1d4f760f8fcade760513).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Right, there is `table` API, too. Thank you, I'll add that, too.
    By the way, I still in the downtown. I need to go home for dinner. I'll take care that tonight.
    Thank you again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62762/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    for case 2, what does hive(or other databases that support sql hint) do?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62357 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62357/consoleFull)** for PR 14132 at commit [`f77a0fa`](https://github.com/apache/spark/commit/f77a0fafbce0195133a9680c2b636222ba491e2b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70938003
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,35 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
    +   */
    +  object SubstituteHint extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      case logical: LogicalPlan => logical transformDown {
    +        case h @ Hint(name, parameters, child)
    +            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
    +          var resolvedChild = child
    +          for (table <- parameters) {
    +            var stop = false
    +            resolvedChild = child.transformDown {
    +              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
    +              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
    +                stop = true
    +                BroadcastHint(r)
    +            }
    +          }
    +          resolvedChild
    +
    +        // Remove unrecognized hint
    +        case Hint(name, _, child) =>
    +          logDebug(s"Ignore Unknown Hint: $name")
    --- End diff --
    
    Sounds good!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Hi, @rxin .
    Could you review this PR again when you have some time?
    All the features are implemented and Spark Parser accepts general hints now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70882759
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
    @@ -509,6 +512,23 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
       }
     
       /**
    +   * Add a Hint to a logical plan.
    +   */
    +  private def withHints(
    +      ctx: HintContext,
    +      relation: LogicalPlan): LogicalPlan = withOrigin(ctx) {
    +    val stmt = ctx.hintStatement
    +    val name = stmt.hintName.getText.toUpperCase
    --- End diff --
    
    No problem. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Yup - and "broadcast".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62162 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62162/consoleFull)** for PR 14132 at commit [`edec2e4`](https://github.com/apache/spark/commit/edec2e4ff0648c73f2efc743edca80a705565406).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Hint(name: String, parameters: Seq[String], child: LogicalPlan) extends UnaryNode `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71822903
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -347,6 +347,16 @@ querySpecification
            windows?)
         ;
     
    +hint
    +    : '/*+' hintStatement '*/'
    +    ;
    +
    +hintStatement
    +    : hintName=identifier
    +    | hintName=identifier '(' parameters+=identifier parameters+=identifier ')'
    --- End diff --
    
    Thank you for review, @cloud-fan .
    The first goal of this PR provides a general syntax for hints, not only broadcast hints. The `(` and `)` syntax is for `INDEX(t idx_emp)` style. You can see the testcase for this in the testcase, too.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14132
  
    **[Test build #62966 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62966/consoleFull)** for PR 14132 at commit [`bd13473`](https://github.com/apache/spark/commit/bd13473e662cc51112c372016e91dbce03d848bc).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r71444098
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
           }
         }
     
    +    /**
    +     * Merge and move upward to the nearest Project.
    +     * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
    +     * information of BroadcastHint resides its current position inside the plan. In order to
    +     * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
    +     * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
    +     */
    +    object NormalizeBroadcastHint extends Rule[LogicalPlan] {
    +      override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +        // Capture the broadcasted information and store it in Hint.
    +        case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
    +          Hint("BROADCAST", Seq(table), child)
    +
    +        // Nearest Project is found.
    +        case p @ Project(_, Hint(_, _, _)) => p
    +
    +        // Merge BROADCAST hints up to the nearest Project.
    +        case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
    +          h.copy(parameters = params1 ++ params2)
    +        case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
    +          h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
    +
    +        // Bubble up BROADCAST hints to the nearest Project.
    +        case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
    +          h.copy(child = j.copy(left = hintChild))
    +        case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
    +          h.copy(child = j.copy(right = hintChild))
    +
    +        // Other UnaryNodes are bypassed.
    +        case u: UnaryNode
    +          if u.child.isInstanceOf[Hint] && u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
    +          val hint = u.child.asInstanceOf[Hint]
    +          hint.copy(child = u.withNewChildren(Seq(hint.child)))
    +
    +        // Other binary operations are ignored.
    --- End diff --
    
    Yep. I worried you say like that. :) I hesitated to use `higher` or `multinary`.
    May I enumerate all of them? At first, I did but remove that comment since it will be outdated.
    Let me think about this more.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70892189
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1774,6 +1775,34 @@ class Analyzer(
       }
     
       /**
    +   * Substitute Hints.
    +   */
    +  object SubstituteHint extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    --- End diff --
    
    Nested one works like the design. I'm making testcase to ensure that.
    Here is a example for your query.
    ```scala
    spark.range(10).createOrReplaceTempView("tbl_a")
    spark.range(20).createOrReplaceTempView("tbl_b")
    spark.range(30).createOrReplaceTempView("tbl_c")
    
    val plan = sql(
          """SELECT /*+ MAPJOIN(tbl_a, tbl_a) */
            |       *
            |FROM   tbl_a A
            |       JOIN tbl_b B
            |        ON B.id = A.id
            |       JOIN (SELECT /*+ MAPJOIN(tbl_c) */
            |                         XA.id
            |                  FROM   tbl_b XA
            |                          LEFT SEMI JOIN tbl_c XB
            |                          ON XB.id = XA.id) C
            |        ON C.id = A.id
          """.stripMargin).queryExecution.optimizedPlan
    
    'Join Inner, (id#0L = id#0L)
    :- 'Join Inner, (id#0L = id#0L)
    :  :- BroadcastHint
    :  :  +- Range (0, 10, splits=2)
    :  +- Range (0, 20, splits=2)
    +- 'Join LeftSemi, (id#0L = id#0L)   
       :- Range (0, 20, splits=2)        
       +- BroadcastHint                  
          +- Range (0, 30, splits=2)   
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14132#discussion_r70882932
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -945,8 +955,12 @@ SIMPLE_COMMENT
         : '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN)
         ;
     
    +BRACKETED_EMPTY_COMMENT
    --- End diff --
    
    Yep.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org