Posted to reviews@spark.apache.org by chenghao-intel <gi...@git.apache.org> on 2014/08/08 05:00:13 UTC

[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

GitHub user chenghao-intel opened a pull request:

    https://github.com/apache/spark/pull/1847

    [SPARK-2918] [SQL] [WIP] Support the extended & native command for EXPLAIN

    Currently, EXPLAIN supports neither SQL native commands nor printing the logical plan. This PR addresses both.
    For example:
    ```
    spark-sql> explain create table temp__ as select * from src;
    Physical execution plan:InsertIntoHiveTable (MetastoreRelation default, temp__, None), Map(), false
     HiveTableScan [key#6,value#7], (MetastoreRelation default, src, None), None
    
    spark-sql> explain  Extended create table temp__a as select * from src;
    Logical execution plan (Parsed):InsertIntoCreatedTable None, temp__a
     Project [*]
      UnresolvedRelation None, src, None
    
    Logical execution plan (Analyzed):InsertIntoCreatedTable None, temp__a
     Project [key#1,value#2]
      MetastoreRelation default, src, None
    
    Logical execution plan (Optimized):InsertIntoTable Map(), false
     MetastoreRelation default, temp__a, None
     MetastoreRelation default, src, None
    
    Physical execution plan:InsertIntoHiveTable (MetastoreRelation default, temp__a, None), Map(), false
     HiveTableScan [key#1,value#2], (MetastoreRelation default, src, None), None
    
    spark-sql> explain extended select key+3>2+3 as b from src;
    Logical execution plan (Parsed):Project [(('key + 3) > (2 + 3)) AS b#10]
     UnresolvedRelation None, src, None
    
    Logical execution plan (Analyzed):Project [((CAST(key#13, DoubleType) + CAST(3, DoubleType)) > CAST((2 + 3), DoubleType)) AS b#10]
     MetastoreRelation default, src, None
    
    Logical execution plan (Optimized):Project [((CAST(key#13, DoubleType) + 3.0) > 5.0) AS b#10]
     MetastoreRelation default, src, None
    
    Physical execution plan:Project [((CAST(key#13, DoubleType) + 3.0) > 5.0) AS b#10]
     HiveTableScan [key#13], (MetastoreRelation default, src, None), None
    ```
    
    BTW, this PR depends on https://github.com/apache/spark/pull/1846 and will keep failing until #1846 is merged.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark explain

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1847.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1847
    
----
commit b808d7a6e50924239e65dbaed3f0f548120c820e
Author: Cheng Hao <ha...@intel.com>
Date:   2014-08-08T02:45:02Z

    Support the extended & native command for EXPLAIN

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1847#discussion_r16208144
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
    @@ -210,9 +210,19 @@ private[hive] object HiveQl {
       def getAst(sql: String): ASTNode = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql))
     
       /** Returns a LogicalPlan for a given HiveQL string. */
    -  def parseSql(sql: String): LogicalPlan = {
    +  def parseSql(sqlRaw: String): LogicalPlan = {
    +    var sql = sqlRaw
    --- End diff --
    
    I'd avoid using `var`s wherever possible. It makes the flow hard to follow and breaks Scala's functional style.
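    
    For what it's worth, a minimal sketch of how the leading-comment stripping could be written without a `var` (illustration only, not the patch's actual code; the helper name here is made up):
    
    ```
    // Hypothetical helper: drop a single leading "--" comment line without mutating state.
    def stripLeadingComment(sqlRaw: String): String = {
      val trimmed = sqlRaw.trim
      if (trimmed.startsWith("--")) {
        // Everything after the first line break is the real statement;
        // if there is no line break, the whole input was a comment.
        trimmed.split("\n", 2) match {
          case Array(_, rest) => rest.trim
          case _              => ""
        }
      } else {
        trimmed
      }
    }

    // parseSql can then work on a single immutable value:
    // def parseSql(sqlRaw: String): LogicalPlan = {
    //   val sql = stripLeadingComment(sqlRaw)
    //   ...
    // }
    ```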


[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1847#issuecomment-53239329
  
    @marmbrus can you review #1846 and #1962?  This PR depends on them.


[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1847#discussion_r16207913
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
    @@ -108,15 +108,24 @@ case class SetCommand(
      */
     @DeveloperApi
     case class ExplainCommand(
    -    logicalPlan: LogicalPlan, output: Seq[Attribute])(
    +    logicalPlan: LogicalPlan, output: Seq[Attribute], extended: Boolean)(
         @transient context: SQLContext)
       extends LeafNode with Command {
     
       // Run through the optimizer to generate the physical plan.
       override protected[sql] lazy val sideEffectResult: Seq[String] = try {
    -    "Physical execution plan:" +: context.executePlan(logicalPlan).executedPlan.toString.split("\n")
    +    // TODO in Hive, the "extended" ExplainCommand prints the AST as well, and detailed properties.
    +    val analyzed = context.executePlan(logicalPlan)
    --- End diff --
    
    Nit: `analyzed` seems like a weird name here since it's actually the whole `queryExecution`, not just the analysis.


[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1847#issuecomment-51558449
  
    QA tests have started for PR 1847. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18177/consoleFull


[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1847#issuecomment-53660711
  
    @marmbrus thanks for merging #1962. This PR aims to show the `logical plan` and `physical plan` for `native command`s, in particular statements like `explain create table xxx as select xxxx`, hence it depends on #1846. Not sure if you have more comments on #1846; I can do a quick rebase after #1846 is merged.


[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1847#issuecomment-52262347
  
    Thank you @marmbrus for reviewing this. I will update the code as you suggested, but this PR depends on https://github.com/apache/spark/pull/1846, as we don't want the table to be created during logical plan generation when running SQL like `explain create table xx as select xxx`. Can you take a look at the test failure in https://github.com/apache/spark/pull/1846? Thanks.


[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1847#issuecomment-52274317
  
    @marmbrus I've created another PR, #1962, which only provides the `extended` support (but doesn't support `explain CTAS`); hope we can merge that first. I am not sure whether we can still get `explain CTAS` into the Spark 1.1 release, as another dependency is failing in unit tests.


[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1847#issuecomment-55560383
  
    I will close this PR, since most of the work was done in #1846 & #1962, and native command support for `EXPLAIN` is probably not necessary; even Hive doesn't support it.


[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1847#discussion_r16208054
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
    @@ -108,15 +108,24 @@ case class SetCommand(
      */
     @DeveloperApi
     case class ExplainCommand(
    -    logicalPlan: LogicalPlan, output: Seq[Attribute])(
    +    logicalPlan: LogicalPlan, output: Seq[Attribute], extended: Boolean)(
         @transient context: SQLContext)
       extends LeafNode with Command {
     
       // Run through the optimizer to generate the physical plan.
       override protected[sql] lazy val sideEffectResult: Seq[String] = try {
    -    "Physical execution plan:" +: context.executePlan(logicalPlan).executedPlan.toString.split("\n")
    +    // TODO in Hive, the "extended" ExplainCommand prints the AST as well, and detailed properties.
    +    val analyzed = context.executePlan(logicalPlan)
    +    var outputString = "Physical execution plan:" + analyzed.executedPlan + "\n"
    --- End diff --
    
    The result would read cleaner if there were a newline between the heading and the first operator. That way it wouldn't be on a line all by itself.
    
    What about just using the `toString` of the `queryExecution` object for extended explain? That is already intended for debugging, and that way we don't duplicate the logic here.
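    
    For illustration, a rough sketch of what that could look like (not the actual patch; it assumes the object returned by `context.executePlan` exposes `executedPlan` and a debugging-oriented `toString`, as discussed above):
    
    ```
    // Reuse the debugging toString of the whole query execution for the extended
    // case; otherwise print the heading on its own line so the first operator
    // is not glued to it.
    override protected[sql] lazy val sideEffectResult: Seq[String] = {
      val queryExecution = context.executePlan(logicalPlan)
      val outputString =
        if (extended) {
          queryExecution.toString   // intended for debugging; covers all plan stages
        } else {
          s"Physical execution plan:\n${queryExecution.executedPlan}"
        }
      outputString.split("\n")
    }
    ```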


[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1847#discussion_r16275727
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
    @@ -210,9 +210,19 @@ private[hive] object HiveQl {
       def getAst(sql: String): ASTNode = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql))
     
       /** Returns a LogicalPlan for a given HiveQL string. */
    -  def parseSql(sql: String): LogicalPlan = {
    +  def parseSql(sqlRaw: String): LogicalPlan = {
    +    var sql = sqlRaw
         try {
    -      if (sql.trim.toLowerCase.startsWith("set")) {
    +      // FIXME workaround solution to remove the first comment from the sql string
    +      if (sql.startsWith("--")) {
    --- End diff --
    
    Previously, `CTAS` (Create Table As Select) was considered a native command and `EXPLAIN` only supported `SELECT`; the change here is to make `EXPLAIN` support native commands as well.


[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1847#issuecomment-55381424
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20218/consoleFull) for   PR 1847 at commit [`e52090a`](https://github.com/apache/spark/commit/e52090a9c7aa4a60670adb638c1c14846698c7ce).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1847#issuecomment-55374087
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20218/consoleFull) for   PR 1847 at commit [`e52090a`](https://github.com/apache/spark/commit/e52090a9c7aa4a60670adb638c1c14846698c7ce).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1847#issuecomment-51561446
  
    QA results for PR 1847:
    - This patch FAILED unit tests.
    - This patch merges cleanly.
    - This patch adds the following public classes (experimental):
      case class ExplainCommand(plan: LogicalPlan, extended: Boolean = false) extends Command {

    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18177/consoleFull


[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1847#issuecomment-52115551
  
    Thanks for doing this.  I was thinking about adding this myself this morning!


[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1847#discussion_r16208197
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
    @@ -210,9 +210,19 @@ private[hive] object HiveQl {
       def getAst(sql: String): ASTNode = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql))
     
       /** Returns a LogicalPlan for a given HiveQL string. */
    -  def parseSql(sql: String): LogicalPlan = {
    +  def parseSql(sqlRaw: String): LogicalPlan = {
    +    var sql = sqlRaw
         try {
    -      if (sql.trim.toLowerCase.startsWith("set")) {
    +      // FIXME workaround solution to remove the first comment from the sql string
    +      if (sql.startsWith("--")) {
    --- End diff --
    
    Why are we complicating the logic here by doing our own parsing instead of just using the EXPLAIN tokens as before?  This seems more brittle.


[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel closed the pull request at:

    https://github.com/apache/spark/pull/1847


[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1847#issuecomment-53629450
  
    Hi @chenghao-intel, what remains in this PR now that #1962 is merged?  Mind updating the description?


[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1847#issuecomment-52220466
  
    Will you have time to update this / address the comments? It would be great to include this in 1.1.

