Posted to reviews@spark.apache.org by wzhfy <gi...@git.apache.org> on 2017/01/16 07:27:24 UTC

[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

GitHub user wzhfy opened a pull request:

    https://github.com/apache/spark/pull/16594

    [SPARK-17078] [SQL] Show stats when explain

    ## What changes were proposed in this pull request?
    
    Currently, the estimated stats in logical plans can only be checked by debugging. We need to provide an easier and more efficient way for developers and users.
    In this PR, we add an internal conf; when it is set to true, the stats can be checked with the `explain extended` command.
    
    ## How was this patch tested?
    
    Add test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wzhfy/spark showStats

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16594.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16594
    
----
commit c3489fcad32caa1d6a9b7182e387a46aae5710fa
Author: wangzhenhua <wa...@huawei.com>
Date:   2017-01-16T07:24:23Z

    show stats in explain command

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    :- ) No perfect solution, but we should use the [metric prefix](https://en.wikipedia.org/wiki/Metric_prefix) when the number is huge. 
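As a rough illustration of the metric-prefix idea, the sketch below rounds a count to three significant digits and then picks an SI prefix (k, M, G, ...). The object `MetricPrefix` and its `format` method are hypothetical names used only for this example; nothing here is Spark API.

```scala
import java.math.{BigDecimal => JBigDecimal, MathContext, RoundingMode}

// Hypothetical helper, not part of Spark: formats a count with an SI metric prefix.
object MetricPrefix {
  // SI prefixes for successive powers of 1000; "" means no prefix.
  private val prefixes = Vector("", "k", "M", "G", "T", "P", "E", "Z", "Y")
  private val mc = new MathContext(3, RoundingMode.HALF_UP)

  def format(n: BigInt): String = {
    require(n >= 0, "count must be non-negative")
    // Round first, so that e.g. 999600 becomes 1.00E+6 and prints as "1.00M", not "1000k".
    val rounded = new JBigDecimal(n.bigInteger).round(mc)
    // Decimal exponent of the leading digit (0 for 0..9, 4 for 73049, ...).
    val exponent = rounded.precision - rounded.scale - 1
    val group = math.min(exponent / 3, prefixes.length - 1)
    // Shift the decimal point by 3 digits per prefix step; movePointLeft is exact.
    rounded.movePointLeft(3 * group).toPlainString + prefixes(group)
  }
}
```

Because rounding happens before the prefix is chosen, values just below a power of 1000 roll over to the next prefix instead of printing as `1000k`.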


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r96473964
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -649,6 +649,14 @@ object SQLConf {
           .doubleConf
           .createWithDefault(0.05)
     
    +  val SHOW_STATS_IN_EXPLAIN =
    --- End diff --
    
    If we do it by default, it can simplify this PR a lot.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73200/
    Test PASSed.


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r97478978
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
    @@ -54,11 +56,32 @@ case class Statistics(
     
       /** Readable string representation for the Statistics. */
       def simpleString: String = {
    -    Seq(s"sizeInBytes=$sizeInBytes",
    -      if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
    +    Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
    +      if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
           s"isBroadcastable=$isBroadcastable"
         ).filter(_.nonEmpty).mkString(", ")
       }
    +
    +  /** Print the given number in a readable format. */
    +  def format(number: BigInt, isSize: Boolean): String = {
    --- End diff --
    
    We already have [`bytesToString`](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1109-L1132) in Utils.scala.
    
    



[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #71430 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71430/testReport)** for PR 16594 at commit [`c3489fc`](https://github.com/apache/spark/commit/c3489fcad32caa1d6a9b7182e387a46aae5710fa).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    OK, I'll modify it to use this new command.


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r102410793
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
    @@ -54,11 +57,29 @@ case class Statistics(
     
       /** Readable string representation for the Statistics. */
       def simpleString: String = {
    -    Seq(s"sizeInBytes=$sizeInBytes",
    -      if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
    +    Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
    +      if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
           s"isBroadcastable=$isBroadcastable"
         ).filter(_.nonEmpty).mkString(", ")
       }
    +
    +  /** Show the given number in a readable format. */
    +  def format(number: BigInt, isSize: Boolean): String = {
    +    val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
    +    if (isSize) {
    +      // The largest unit in Utils.bytesToString is TB
    +      val PB = 1L << 50
    +      if (number < 2 * PB) {
    +        // The number is not very large, so we can use Utils.bytesToString to show it.
    +        Utils.bytesToString(number.toLong)
    +      } else {
    +        // The number is too large, show it in scientific notation.
    +        decimalValue.toString() + " B"
    +      }
    +    } else {
    +      decimalValue.toString()
    --- End diff --
    
    With or without units, the readability is the same, right? If we make them consistent, the impl of `def format(number: BigInt)` will look much cleaner. 
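A minimal sketch of the consistent, unit-free variant suggested here, assuming the same `MathContext(3, RoundingMode.HALF_UP)` rounding as in the diff. `StatsSketch` is a hypothetical stand-in for the `Statistics` case class, reduced to the three fields that `simpleString` prints; it is not the code in the PR.

```scala
import java.math.{MathContext, RoundingMode}

// Hypothetical stand-in for the Statistics fields discussed above.
final case class StatsSketch(sizeInBytes: BigInt, rowCount: Option[BigInt], isBroadcastable: Boolean) {
  // One formatter for every number: 3 significant digits, no units.
  // BigDecimal.toString uses E-notation once rounding drops digits.
  private def format(number: BigInt): String =
    BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP)).toString

  def simpleString: String =
    Seq(s"sizeInBytes=${format(sizeInBytes)}",
      rowCount.map(c => s"rowCount=${format(c)}").getOrElse(""),
      s"isBroadcastable=$isBroadcastable"
    ).filter(_.nonEmpty).mkString(", ")
}
```

The trade-off is that sizes lose their binary unit: a size of 245,660,000,000 bytes prints as `2.46E+11` rather than roughly `228.8 GB`.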


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    sorry this explain plan makes no sense -- it is impossible to read.



[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r96585594
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala ---
    @@ -27,6 +27,21 @@ import org.apache.spark.sql.test.SQLTestUtils
      */
     class HiveExplainSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     
    +  test("show stats in explain command") {
    +    withSQLConf("spark.sql.statistics.showInExplain" -> "false") {
    +      checkKeywordsNotExist(sql(" explain  select * from src "), "sizeInBytes", "rowCount")
    --- End diff --
    
    thanks, fixed


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Me too.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #71588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71588/testReport)** for PR 16594 at commit [`6af640d`](https://github.com/apache/spark/commit/6af640d81fe3673c65cf318baa595c1952f580ad).


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73402/
    Test FAILed.


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r96588756
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -649,6 +649,14 @@ object SQLConf {
           .doubleConf
           .createWithDefault(0.05)
     
    +  val SHOW_STATS_IN_EXPLAIN =
    --- End diff --
    
    `SHOW_TABLE_STATS_IN_EXPLAIN` could be misleading, because we show stats not only for tables but for all logical plans.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    cc @rxin @cloud-fan 


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #73419 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73419/testReport)** for PR 16594 at commit [`6e10f84`](https://github.com/apache/spark/commit/6e10f840fed50b7e48898e73967bc35a29a6e23b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #73200 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73200/testReport)** for PR 16594 at commit [`491ec8f`](https://github.com/apache/spark/commit/491ec8f3529bfb552fdae9dcd9c13bc2984f91ce).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    LGTM except one comment


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r102138379
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
    @@ -54,11 +57,29 @@ case class Statistics(
     
       /** Readable string representation for the Statistics. */
       def simpleString: String = {
    -    Seq(s"sizeInBytes=$sizeInBytes",
    -      if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
    +    Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
    +      if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
           s"isBroadcastable=$isBroadcastable"
         ).filter(_.nonEmpty).mkString(", ")
       }
    +
    +  /** Show the given number in a readable format. */
    +  def format(number: BigInt, isSize: Boolean): String = {
    +    val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
    +    if (isSize) {
    +      // The largest unit in Utils.bytesToString is TB
    --- End diff --
    
    How about improving `bytesToString` to make it support PB or higher?
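One way to read this suggestion, sketched under assumptions: take a `BigInt` instead of a `Long` so that sizes beyond `Long.MaxValue` are accepted, and extend the unit table with PB and EB. The "at least 2x the unit" threshold mirrors the `number < 2 * PB` check in the `format` diff; `ByteFormat.bytesToString` is a hypothetical name, not the existing `Utils.bytesToString`.

```scala
import java.util.Locale

// Hypothetical BigInt-based byte formatter; not Spark's Utils.bytesToString.
object ByteFormat {
  // Binary units from largest to smallest.
  private val units: Seq[(BigInt, String)] = Seq(
    (BigInt(1) << 60, "EB"),
    (BigInt(1) << 50, "PB"),
    (BigInt(1) << 40, "TB"),
    (BigInt(1) << 30, "GB"),
    (BigInt(1) << 20, "MB"),
    (BigInt(1) << 10, "KB"))

  def bytesToString(size: BigInt): String = {
    require(size >= 0, "size must be non-negative")
    // Use the largest unit such that the size is at least twice that unit.
    units.find { case (unit, _) => size >= unit * 2 } match {
      case Some((unit, name)) =>
        // Double precision is ample for a single decimal place.
        "%.1f %s".formatLocal(Locale.US, size.toDouble / unit.toDouble, name)
      case None =>
        s"$size B"
    }
  }
}
```

Without a scientific-notation fallback, very large estimates simply print as large EB values.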


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71906/
    Test FAILed.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71921/
    Test FAILed.


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16594


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    ok here is an idea
    
    how about 
    
    ```
    explain stats xxx
    ```
    
    as the way to add stats?



[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71424/
    Test FAILed.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by ron8hu <gi...@git.apache.org>.
Github user ron8hu commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    To show a very large Long number, there is no need to print every digit. We can use an exponent. For example, the number 120,000,000,005,123 can be printed as 1.2*10**14, where 10**14 means 10 to the power of 14.
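The exponent idea can be tried with the standard `%e` conversion of `java.util.Formatter`. This sketch assumes that the precision lost when converting the `Long` to `Double` is acceptable, since only the leading digits are printed; `Exponent.toScientific` is a hypothetical helper, not an existing API.

```scala
import java.util.Locale

// Hypothetical helper: prints a Long in scientific notation.
object Exponent {
  def toScientific(n: Long, fractionDigits: Int): String = {
    require(fractionDigits >= 0, "fractionDigits must be non-negative")
    // Converting to Double may drop low-order digits (beyond 2^53),
    // which does not matter when only the leading digits are printed.
    s"%.${fractionDigits}e".formatLocal(Locale.US, n.toDouble)
  }
}
```

For the number in the comment, `toScientific(120000000005123L, 1)` gives `1.2e+14`. Java pads the exponent to at least two digits, so `toScientific(73049L, 3)` gives `7.305e+04`.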


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    @rxin @gatorsmile @hvanhovell I've updated this PR to make the stats much more readable:
    
    `sizeInBytes` is shown in units of B, KB, MB ... PB, e.g. `sizeInBytes=228.8 GB`; 
    if it is too large to represent in PB, it is shown in scientific notation, e.g. `sizeInBytes=5.481E+22`. 
    Row count has no unit, so it is always shown in scientific notation, e.g. `rowCount=7.305E+4`.
    
    Now the above example looks like this:
    ```
    ...
    +- Aggregate [ca_county#1629, d_qoy#1550, d_year#1546], [ca_county#1629, MakeDecimal(sum(UnscaledValue(ss_ext_sales_price#1192)),17,2) AS store_sales#1457]: sizeInBytes=5.481E+22, isBroadcastable=false
       +- Project [ss_ext_sales_price#1192, d_year#1546, d_qoy#1550, ca_county#1629]: sizeInBytes=6.699E+22, isBroadcastable=false
          +- Join Inner, (ss_addr_sk#1183 = ca_address_sk#1622): sizeInBytes=7.917E+22, isBroadcastable=false
             :- Project [ss_addr_sk#1183, ss_ext_sales_price#1192, d_year#1546, d_qoy#1550]: sizeInBytes=3.520 PB, isBroadcastable=false
             :  +- Join Inner, (ss_sold_date_sk#1177 = d_date_sk#1540): sizeInBytes=4.525 PB, isBroadcastable=false
             :     :- Project [ss_sold_date_sk#1177, ss_addr_sk#1183, ss_ext_sales_price#1192]: sizeInBytes=37.11 GB, isBroadcastable=false
             :     :  +- Filter (isnotnull(ss_sold_date_sk#1177) && isnotnull(ss_addr_sk#1183)): sizeInBytes=228.8 GB, isBroadcastable=false
             :     :     +- Relation[ss_sold_date_sk#1177,ss_sold_time_sk#1178,ss_item_sk#1179,ss_customer_sk#1180,ss_cdemo_sk#1181,ss_hdemo_sk#1182,ss_addr_sk#1183,ss_store_sk#1184,ss_promo_sk#1185,ss_ticket_number#1186,ss_quantity#1187,ss_wholesale_cost#1188,ss_list_price#1189,ss_sales_price#1190,ss_ext_discount_amt#1191,ss_ext_sales_price#1192,ss_ext_wholesale_cost#1193,ss_ext_list_price#1194,ss_ext_tax#1195,ss_coupon_amt#1196,ss_net_paid#1197,ss_net_paid_inc_tax#1198,ss_net_profit#1199] parquet: sizeInBytes=228.8 GB, rowCount=5.760E+9, isBroadcastable=false
             :     +- Project [d_date_sk#1540, d_year#1546, d_qoy#1550]: sizeInBytes=124.9 KB, isBroadcastable=false
             :        +- Filter ((((isnotnull(d_date_sk#1540) && isnotnull(d_year#1546)) && isnotnull(d_qoy#1550)) && (d_qoy#1550 = 2)) && (d_year#1546 = 2000)): sizeInBytes=1.805 MB, isBroadcastable=false
             :           +- Relation[d_date_sk#1540,d_date_id#1541,d_date#1542,d_month_seq#1543,d_week_seq#1544,d_quarter_seq#1545,d_year#1546,d_dow#1547,d_moy#1548,d_dom#1549,d_qoy#1550,d_fy_year#1551,d_fy_quarter_seq#1552,d_fy_week_seq#1553,d_day_name#1554,d_quarter_name#1555,d_holiday#1556,d_weekend#1557,d_following_holiday#1558,d_first_dom#1559,d_last_dom#1560,d_same_day_ly#1561,d_same_day_lq#1562,d_current_day#1563,... 4 more fields] parquet: sizeInBytes=1.805 MB, rowCount=7.305E+4, isBroadcastable=false
    ...
    ```
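    The sizes in the plan above use binary units with four significant digits (37.11 GB, 124.9 KB, 1.805 MB) and switch to scientific notation for huge values (7.917E+22). Below is a minimal sketch of a formatter that follows those rules. `SizeFormat` is a hypothetical name, and the rules themselves are assumptions inferred from the sample output, not taken from the PR's code: pick the largest unit that does not exceed the size, round HALF_UP, and go scientific from 2^60 bytes.

    ```scala
    import java.math.{BigDecimal => JBigDecimal, MathContext, RoundingMode}

    object SizeFormat {
      // Binary units; index i stands for a factor of 1024^i.
      private val Units = Vector("B", "KB", "MB", "GB", "TB", "PB")
      private val FourDigits = new MathContext(4, RoundingMode.HALF_UP)
      // Assumed cut-over to scientific notation: 1024 PB (= 2^60 bytes).
      private val ScientificFrom = BigInt(1) << 60

      def format(sizeInBytes: BigInt): String = {
        require(sizeInBytes >= 0, s"expected a non-negative size, got $sizeInBytes")
        val exact = new JBigDecimal(sizeInBytes.bigInteger)
        if (sizeInBytes >= ScientificFrom) {
          // Too large for any unit: 4 significant digits in scientific notation.
          exact.round(FourDigits).toString
        } else {
          // Largest unit not exceeding the size: floor(log2(size)) / 10.
          val idx = math.min(math.max(sizeInBytes.bitLength - 1, 0) / 10, Units.length - 1)
          val unit = new JBigDecimal((BigInt(1) << (10 * idx)).bigInteger)
          s"${exact.divide(unit, FourDigits).toPlainString} ${Units(idx)}"
        }
      }
    }
    ```

    With these rules, 39,847,153,628 bytes renders as `37.11 GB` and 79,171,147,013,476,130,000,000 bytes as `7.917E+22`.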




[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #71430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71430/testReport)** for PR 16594 at commit [`c3489fc`](https://github.com/apache/spark/commit/c3489fcad32caa1d6a9b7182e387a46aae5710fa).




[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #73295 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73295/testReport)** for PR 16594 at commit [`b3457a0`](https://github.com/apache/spark/commit/b3457a0ccd2453d9917c6e360bc8b80c10a70c4c).




[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r96473423
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -649,6 +649,14 @@ object SQLConf {
           .doubleConf
           .createWithDefault(0.05)
     
    +  val SHOW_STATS_IN_EXPLAIN =
    --- End diff --
    
    Why not do this by default? Do we need an extra flag?
    
    If needed, the name should be `SHOW_TABLE_STATS_IN_EXPLAIN`




[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71430/
    Test PASSed.




[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    @ron8hu Yes, I've already updated this PR. I'll present an example.




[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    @rxin Can we add a flag to enable or disable it? Currently there's no way to see the size and row count other than debugging.




[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    LGTM, pending test




[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    @gatorsmile I just made a quick fix to show what the improved stats look like. If @rxin and @hvanhovell accept the change proposed in this PR, I'll update it to remove the flag :)




[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r102138925
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
    @@ -54,11 +57,29 @@ case class Statistics(
     
       /** Readable string representation for the Statistics. */
       def simpleString: String = {
    -    Seq(s"sizeInBytes=$sizeInBytes",
    -      if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
    +    Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
    +      if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
           s"isBroadcastable=$isBroadcastable"
         ).filter(_.nonEmpty).mkString(", ")
       }
    +
    +  /** Show the given number in a readable format. */
    +  def format(number: BigInt, isSize: Boolean): String = {
    +    val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
    +    if (isSize) {
    +      // The largest unit in Utils.bytesToString is TB
    +      val PB = 1L << 50
    +      if (number < 2 * PB) {
    +        // The number is not very large, so we can use Utils.bytesToString to show it.
    +        Utils.bytesToString(number.toLong)
    +      } else {
    +        // The number is too large, show it in scientific notation.
    +        decimalValue.toString() + " B"
    +      }
    +    } else {
    +      decimalValue.toString()
    --- End diff --
    
    https://en.wikipedia.org/wiki/Metric_prefix
    
    Even without a unit, can't we still use K, M, G, T, P, E?
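    For counts that have no unit, the suggestion could look like the sketch below: round to three significant digits, then pick a decimal prefix (K = 10^3, M = 10^6, ..., E = 10^18). `MetricPrefixFormat` is a hypothetical name. Falling back to scientific notation beyond E mirrors what the diff above does for very large sizes, but applying it to counts is an assumption, not part of the review.

    ```scala
    import java.math.{BigDecimal => JBigDecimal, MathContext, RoundingMode}

    object MetricPrefixFormat {
      // Decimal (SI) prefixes; index i stands for a factor of 1000^i.
      private val Prefixes = Vector("", "K", "M", "G", "T", "P", "E")
      private val ThreeDigits = new MathContext(3, RoundingMode.HALF_UP)

      /** Formats a non-negative count such as rowCount, e.g. 5759954874 -> "5.76G". */
      def format(count: BigInt): String = {
        require(count >= 0, s"expected a non-negative count, got $count")
        // Round first, so that 999,999 becomes 1.00E+6 and is shown as "1M", not "1000K".
        val rounded = new JBigDecimal(count.bigInteger).round(ThreeDigits)
        val digits = rounded.toBigInteger.toString.length
        val idx = (digits - 1) / 3
        if (idx >= Prefixes.length) {
          // Beyond "E": fall back to scientific notation, e.g. "1.00E+30".
          rounded.toString
        } else {
          // Shifting the decimal point by 3 * idx divides by 1000^idx exactly.
          rounded.movePointLeft(3 * idx).stripTrailingZeros.toPlainString + Prefixes(idx)
        }
      }
    }
    ```

    For example, the store_sales rowCount 5,759,954,874 would print as `5.76G`, and 73,049 as `73K`.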




[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    PostgreSQL has [a few different options in the EXPLAIN command](https://www.postgresql.org/docs/9.3/static/sql-explain.html):
    ```
    EXPLAIN SELECT * FROM foo WHERE i = 4;
    
                             QUERY PLAN
    --------------------------------------------------------------
     Index Scan using fi on foo  (cost=0.00..5.98 rows=1 width=4)
       Index Cond: (i = 4)
    (2 rows)
    ```
    The same plan with cost estimates suppressed:
    ```
    EXPLAIN (COSTS FALSE) SELECT * FROM foo WHERE i = 4;
    
            QUERY PLAN
    ----------------------------
     Index Scan using fi on foo
       Index Cond: (i = 4)
    (2 rows)
    ```




[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    @hvanhovell I've updated the description with a simple example.
    
    The explained plan becomes hard to read when many tables are joined and sizeInBytes is computed the simple (non-CBO) way, i.e. by multiplying the sizes of all these tables: sizeInBytes then becomes a huge value (possibly more than a hundred digits).
    For example, part of the explained plan of TPC-DS q31 (without CBO) looks like this:
    ```
    == Optimized Logical Plan ==
    Sort [ca_county#67 ASC NULLS FIRST], true: sizeInBytes=230,651,011,002,878,340,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
    +- Project [ca_county#67, d_year#38, CheckOverflow((web_sales#769 / web_sales#6), DecimalType(37,20)) AS web_q1_q2_increase#1, CheckOverflow((store_sales#387 / store_sales#5), DecimalType(37,20)) AS store_q1_q2_increase#2, CheckOverflow((web_sales#960 / web_sales#769), DecimalType(37,20)) AS web_q2_q3_increase#3, CheckOverflow((store_sales#578 / store_sales#387), DecimalType(37,20)) AS store_q2_q3_increase#4]: sizeInBytes=230,651,011,002,878,340,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
       +- Join Inner, ((ca_county#271 = ca_county#1132) && (CASE WHEN (web_sales#769 > 0.00) THEN CheckOverflow((web_sales#960 / web_sales#769), DecimalType(37,20)) ELSE null END > CASE WHEN (store_sales#387 > 0.00) THEN CheckOverflow((store_sales#578 / store_sales#387), DecimalType(37,20)) ELSE null END)): sizeInBytes=288,313,763,753,597,950,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
          :- Project [ca_county#67, d_year#38, store_sales#5, store_sales#387, store_sales#578, ca_county#271, web_sales#6, web_sales#769]: sizeInBytes=19,387,614,432,995,145,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
          :  +- Join Inner, ((ca_county#271 = ca_county#941) && (CASE WHEN (web_sales#6 > 0.00) THEN CheckOverflow((web_sales#769 / web_sales#6), DecimalType(37,20)) ELSE null END > CASE WHEN (store_sales#5 > 0.00) THEN CheckOverflow((store_sales#387 / store_sales#5), DecimalType(37,20)) ELSE null END)): sizeInBytes=23,602,313,222,776,697,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
          :     :- Join Inner, (ca_county#67 = ca_county#271): sizeInBytes=1,587,133,900,693,866,200,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
          :     :  :- Project [ca_county#67, d_year#38, store_sales#5, store_sales#387, store_sales#578]: sizeInBytes=106,726,573,575,883,570,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
          :     :  :  +- Join Inner, (ca_county#559 = ca_county#750): sizeInBytes=182,959,840,415,800,400,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
          :     :  :     :- Join Inner, (ca_county#67 = ca_county#559): sizeInBytes=3,338,025,720,406,215,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
          :     :  :     :  :- Aggregate [ca_county#67, d_qoy#42, d_year#38], [ca_county#67, d_year#38, MakeDecimal(sum(UnscaledValue(ss_ext_sales_price#24)),17,2) AS store_sales#5]: sizeInBytes=60,900,882,318,058,550,000,000, isBroadcastable=false
          :     :  :     :  :  +- Project [ss_ext_sales_price#24, d_year#38, d_qoy#42, ca_county#67]: sizeInBytes=66,990,970,549,864,410,000,000, isBroadcastable=false
          :     :  :     :  :     +- Join Inner, (ss_addr_sk#15 = ca_address_sk#60): sizeInBytes=79,171,147,013,476,130,000,000, isBroadcastable=false
          :     :  :     :  :        :- Project [ss_addr_sk#15, ss_ext_sales_price#24, d_year#38, d_qoy#42]: sizeInBytes=3,963,069,503,456,967, isBroadcastable=false
          :     :  :     :  :        :  +- Join Inner, (ss_sold_date_sk#9 = d_date_sk#32): sizeInBytes=5,095,375,075,873,244, isBroadcastable=false
          :     :  :     :  :        :     :- Project [ss_sold_date_sk#9, ss_addr_sk#15, ss_ext_sales_price#24]: sizeInBytes=39,847,153,628, isBroadcastable=false
          :     :  :     :  :        :     :  +- Filter (isnotnull(ss_sold_date_sk#9) && isnotnull(ss_addr_sk#15)): sizeInBytes=245,724,114,045, isBroadcastable=false
          :     :  :     :  :        :     :     +- Relation[ss_sold_date_sk#9,ss_sold_time_sk#10,ss_item_sk#11,ss_customer_sk#12,ss_cdemo_sk#13,ss_hdemo_sk#14,ss_addr_sk#15,ss_store_sk#16,ss_promo_sk#17,ss_ticket_number#18,ss_quantity#19,ss_wholesale_cost#20,ss_list_price#21,ss_sales_price#22,ss_ext_discount_amt#23,ss_ext_sales_price#24,ss_ext_wholesale_cost#25,ss_ext_list_price#26,ss_ext_tax#27,ss_coupon_amt#28,ss_net_paid#29,ss_net_paid_inc_tax#30,ss_net_profit#31] parquet: sizeInBytes=245,724,114,045, rowCount=5,759,954,874, isBroadcastable=false
          :     :  :     :  :        :     +- Project [d_date_sk#32, d_year#38, d_qoy#42]: sizeInBytes=127,873, isBroadcastable=false
          :     :  :     :  :        :        +- Filter ((((isnotnull(d_date_sk#32) && isnotnull(d_year#38)) && isnotnull(d_qoy#42)) && (d_qoy#42 = 1)) && (d_year#38 = 2000)): sizeInBytes=1,892,531, isBroadcastable=false
          :     :  :     :  :        :           +- Relation[d_date_sk#32,d_date_id#33,d_date#34,d_month_seq#35,d_week_seq#36,d_quarter_seq#37,d_year#38,d_dow#39,d_moy#40,d_dom#41,d_qoy#42,d_fy_year#43,d_fy_quarter_seq#44,d_fy_week_seq#45,d_day_name#46,d_quarter_name#47,d_holiday#48,d_weekend#49,d_following_holiday#50,d_first_dom#51,d_last_dom#52,d_same_day_ly#53,d_same_day_lq#54,d_current_day#55,... 4 more fields] parquet: sizeInBytes=1,892,531, rowCount=73,049, isBroadcastable=false
          :     :  :     :  :        +- Project [ca_address_sk#60, ca_county#67]: sizeInBytes=19,977,229, isBroadcastable=false
          :     :  :     :  :           +- Filter (isnotnull(ca_county#67) && isnotnull(ca_address_sk#60)): sizeInBytes=149,829,222, isBroadcastable=false
          :     :  :     :  :              +- Relation[ca_address_sk#60,ca_address_id#61,ca_street_number#62,ca_street_name#63,ca_street_type#64,ca_suite_number#65,ca_city#66,ca_county#67,ca_state#68,ca_zip#69,ca_country#70,ca_gmt_offset#71,ca_location_type#72] parquet: sizeInBytes=149,829,222, rowCount=4,550,000, isBroadcastable=false
          :     :  :     :  +- Aggregate [ca_county#559, d_qoy#480, d_year#476], [ca_county#559, MakeDecimal(sum(UnscaledValue(ss_ext_sales_price#24)),17,2) AS store_sales#387]: sizeInBytes=54,810,794,086,252,700,000,000, isBroadcastable=false
          :     :  :     :     +- Project [ss_ext_sales_price#24, d_year#476, d_qoy#480, ca_county#559]: sizeInBytes=66,990,970,549,864,410,000,000, isBroadcastable=false
          :     :  :     :        +- Join Inner, (ss_addr_sk#15 = ca_address_sk#552): sizeInBytes=79,171,147,013,476,130,000,000, isBroadcastable=false
          :     :  :     :           :- Project [ss_addr_sk#15, ss_ext_sales_price#24, d_year#476, d_qoy#480]: sizeInBytes=3,963,069,503,456,967, isBroadcastable=false
          :     :  :     :           :  +- Join Inner, (ss_sold_date_sk#9 = d_date_sk#470): sizeInBytes=5,095,375,075,873,244, isBroadcastable=false
          :     :  :     :           :     :- Project [ss_sold_date_sk#9, ss_addr_sk#15, ss_ext_sales_price#24]: sizeInBytes=39,847,153,628, isBroadcastable=false
          :     :  :     :           :     :  +- Filter (isnotnull(ss_sold_date_sk#9) && isnotnull(ss_addr_sk#15)): sizeInBytes=245,724,114,045, isBroadcastable=false
          :     :  :     :           :     :     +- Relation[ss_sold_date_sk#9,ss_sold_time_sk#10,ss_item_sk#11,ss_customer_sk#12,ss_cdemo_sk#13,ss_hdemo_sk#14,ss_addr_sk#15,ss_store_sk#16,ss_promo_sk#17,ss_ticket_number#18,ss_quantity#19,ss_wholesale_cost#20,ss_list_price#21,ss_sales_price#22,ss_ext_discount_amt#23,ss_ext_sales_price#24,ss_ext_wholesale_cost#25,ss_ext_list_price#26,ss_ext_tax#27,ss_coupon_amt#28,ss_net_paid#29,ss_net_paid_inc_tax#30,ss_net_profit#31] parquet: sizeInBytes=245,724,114,045, rowCount=5,759,954,874, isBroadcastable=false
    ```
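    The blow-up follows from the non-CBO estimate described above: a join's sizeInBytes is the product of its children's sizes. A tiny model of that rule, with a hypothetical object name and ignoring the Project/Filter scaling between joins, reproduces one number from the plan. The Project at 39,847,153,628 bytes joined with the Project at 127,873 bytes gives exactly the 5,095,375,075,873,244 bytes shown for their Join.

    ```scala
    object NaiveJoinSizeEstimate {
      /** Simplified non-CBO rule: a join's sizeInBytes is the product of its children's. */
      def joinSize(left: BigInt, right: BigInt): BigInt = left * right

      /** Estimate after joining all inputs with the rule above (Project/Filter scaling ignored). */
      def joinAll(sizes: Seq[BigInt]): BigInt = {
        require(sizes.nonEmpty, "need at least one input")
        sizes.reduce((l, r) => joinSize(l, r))
      }
    }
    ```

    Ten inputs of 245,724,114,045 bytes each already yield an estimate with more than a hundred digits, which is the unreadable case shown above.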
    





[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73295/
    Test PASSed.




[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71508/
    Test PASSed.




[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r102409892
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -282,7 +282,8 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
         if (statement == null) {
           null  // This is enough since ParseException will raise later.
         } else if (isExplainableStatement(statement)) {
    -      ExplainCommand(statement, extended = ctx.EXTENDED != null, codegen = ctx.CODEGEN != null)
    +      ExplainCommand(statement, extended = ctx.EXTENDED != null, codegen = ctx.CODEGEN != null,
    +        cost = ctx.COST != null)
    --- End diff --
    
    ```
          ExplainCommand(
            statement,
            extended = ctx.EXTENDED != null,
            codegen = ctx.CODEGEN != null,
            cost = ctx.COST != null)
    ```




[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #71588 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71588/testReport)** for PR 16594 at commit [`6af640d`](https://github.com/apache/spark/commit/6af640d81fe3673c65cf318baa595c1952f580ad).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r102137730
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala ---
    @@ -92,7 +92,8 @@ case class ExecutedCommandExec(cmd: RunnableCommand) extends SparkPlan {
     case class ExplainCommand(
         logicalPlan: LogicalPlan,
         extended: Boolean = false,
    -    codegen: Boolean = false)
    +    codegen: Boolean = false,
    +    cost: Boolean = false)
    --- End diff --
    
    Please add `@param` like the other parameters.
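    A self-contained sketch of what the requested Scaladoc could look like, with a `@param` entry for every flag including `cost`. `ExplainCommandSketch` and `fromKeywords` are hypothetical: the real command takes a `LogicalPlan` (a `String` stands in here), the parser sets the flags through checks like `ctx.COST != null` as in the parser diff, and the parameter wording is illustrative only.

    ```scala
    /**
     * Hypothetical stand-in for ExplainCommand, documenting each flag with `@param`.
     *
     * @param logicalPlan the plan to explain (a String here instead of a LogicalPlan)
     * @param extended whether to print the extended output, not just the physical plan
     * @param codegen whether to print the generated code
     * @param cost whether to print the plan together with its estimated statistics
     */
    final case class ExplainCommandSketch(
        logicalPlan: String,
        extended: Boolean = false,
        codegen: Boolean = false,
        cost: Boolean = false)

    object ExplainCommandSketch {
      /** Maps the optional keywords after EXPLAIN to flags, like `ctx.COST != null`. */
      def fromKeywords(plan: String, keywords: Set[String]): ExplainCommandSketch =
        ExplainCommandSketch(
          plan,
          extended = keywords.contains("EXTENDED"),
          codegen = keywords.contains("CODEGEN"),
          cost = keywords.contains("COST"))
    }
    ```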




[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r96589910
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -649,6 +649,14 @@ object SQLConf {
           .doubleConf
           .createWithDefault(0.05)
     
    +  val SHOW_STATS_IN_EXPLAIN =
    --- End diff --
    
    It's disabled by default because the stats can be inaccurate (in some cases very inaccurate) and could confuse regular users. At the current stage it's better as a feature for administrators and developers to see how CBO behaves in estimation, so I made the flag "internal".
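    A minimal sketch of gating the statistics behind a flag that is off by default, so regular users see the plain plan and developers can opt in. `Node`, `treeString` and `showStats` are hypothetical; the sketch does not reproduce Spark's tree rendering (the `+-`/`:-` prefixes are replaced by plain indentation).

    ```scala
    object StatsInExplainSketch {
      // Hypothetical stand-in for a logical plan node carrying estimated statistics.
      final case class Node(
          description: String,
          sizeInBytes: BigInt,
          rowCount: Option[BigInt],
          children: Seq[Node] = Nil)

      private def statsString(n: Node): String =
        (Seq(s"sizeInBytes=${n.sizeInBytes}") ++ n.rowCount.map(c => s"rowCount=$c"))
          .mkString(", ")

      /** Renders one node per line; statistics are appended only when showStats is true. */
      def treeString(root: Node, showStats: Boolean = false): String = {
        val sb = new StringBuilder
        def loop(n: Node, depth: Int): Unit = {
          sb.append("  " * depth).append(n.description)
          if (showStats) sb.append(": ").append(statsString(n))
          sb.append('\n')
          n.children.foreach(child => loop(child, depth + 1))
        }
        loop(root, 0)
        sb.toString
      }
    }
    ```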




[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #73419 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73419/testReport)** for PR 16594 at commit [`6e10f84`](https://github.com/apache/spark/commit/6e10f840fed50b7e48898e73967bc35a29a6e23b).




[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73419/
    Test PASSed.




[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #71847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71847/testReport)** for PR 16594 at commit [`ddd5936`](https://github.com/apache/spark/commit/ddd59367c9763213a0de7b3684ee1525f5891639).




[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r102775882
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -794,6 +795,7 @@ EXPLAIN: 'EXPLAIN';
     FORMAT: 'FORMAT';
     LOGICAL: 'LOGICAL';
     CODEGEN: 'CODEGEN';
    +COST: 'COST';
    --- End diff --
    
    Yes. Also, please update `hiveNonReservedKeyword` in `TableIdentifierParserSuite`.


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r97482084
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
    @@ -54,11 +56,32 @@ case class Statistics(
     
       /** Readable string representation for the Statistics. */
       def simpleString: String = {
    -    Seq(s"sizeInBytes=$sizeInBytes",
    -      if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
    +    Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
    +      if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
           s"isBroadcastable=$isBroadcastable"
         ).filter(_.nonEmpty).mkString(", ")
       }
    +
    +  /** Print the given number in a readable format. */
    +  def format(number: BigInt, isSize: Boolean): String = {
    --- End diff --
    
    I'll try to combine that method with the current logic. Thanks for the reminder.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    @wzhfy could you add an example of this to the PR description? I am a bit worried that the explain plans will become (much) harder to read. I am also interested to see if this new explain output is understandable for an end user.


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r102137142
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
    @@ -54,11 +57,29 @@ case class Statistics(
     
       /** Readable string representation for the Statistics. */
       def simpleString: String = {
    -    Seq(s"sizeInBytes=$sizeInBytes",
    -      if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
    +    Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
    +      if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
           s"isBroadcastable=$isBroadcastable"
         ).filter(_.nonEmpty).mkString(", ")
       }
    +
    +  /** Show the given number in a readable format. */
    +  def format(number: BigInt, isSize: Boolean): String = {
    +    val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
    +    if (isSize) {
    +      // The largest unit in Utils.bytesToString is TB
    +      val PB = 1L << 50
    +      if (number < 2 * PB) {
    +        // The number is not very large, so we can use Utils.bytesToString to show it.
    +        Utils.bytesToString(number.toLong)
    +      } else {
    +        // The number is too large, show it in scientific notation.
    +        decimalValue.toString() + " B"
    +      }
    +    } else {
    +      decimalValue.toString()
    --- End diff --
    
    Always represent it using scientific notation? Or only do it when the number is too large?
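
A minimal Scala sketch of the rounding used in the diff, to make the question concrete. `RowCountFormat.rowCountString` is a hypothetical name; it only applies the same `BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))` call. Because `java.math.BigDecimal.toString` switches to E-notation only when rounding leaves a negative scale, counts of up to three digits print as-is and counts of four or more digits come out in scientific notation, so this call on its own already behaves like "only when the number is large".

```scala
import java.math.{MathContext, RoundingMode}

object RowCountFormat {
  // Three significant digits, rounding half up, as in the diff's `format`.
  private val ThreeDigits = new MathContext(3, RoundingMode.HALF_UP)

  /** Hypothetical helper: rounds a row count to three significant digits. */
  def rowCountString(count: BigInt): String =
    // Counts with more than three digits keep a negative scale after rounding,
    // so java.math.BigDecimal.toString renders them in E-notation.
    BigDecimal(count, ThreeDigits).toString
}
```

For example, `rowCountString(BigInt(100))` gives `100`, while `rowCountString(BigInt(15856))` gives `1.59E+4`.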


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #71508 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71508/testReport)** for PR 16594 at commit [`3d66df9`](https://github.com/apache/spark/commit/3d66df96fdd910ac530ce45acaf787bc352ba245).


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    I like the idea proposed by rxin


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71847/
    Test PASSed.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    I still do not think an internal configuration is a user-friendly way to show the plan costs. Making it internal signals that we do not want users to see it.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r102399167
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
    @@ -54,11 +57,29 @@ case class Statistics(
     
       /** Readable string representation for the Statistics. */
       def simpleString: String = {
    -    Seq(s"sizeInBytes=$sizeInBytes",
    -      if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
    +    Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
    +      if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
           s"isBroadcastable=$isBroadcastable"
         ).filter(_.nonEmpty).mkString(", ")
       }
    +
    +  /** Show the given number in a readable format. */
    +  def format(number: BigInt, isSize: Boolean): String = {
    +    val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
    +    if (isSize) {
    +      // The largest unit in Utils.bytesToString is TB
    --- End diff --
    
    Yeah, I also think TB is a little small as the largest unit.
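
To illustrate extending the units beyond TB, here is a minimal sketch of a `BigInt`-aware byte formatter. `SizeFormat.bigBytesToString` and the `EB` unit are assumptions for illustration, not Spark's `Utils.bytesToString`. It keeps the diff's convention of switching to a unit only once the value reaches twice that unit (as in the `number < 2 * PB` check), and falls back to three-digit scientific notation from 1024 EB upward.

```scala
import java.math.{MathContext, RoundingMode}
import java.util.Locale

object SizeFormat {
  // Binary units, largest first; EB (2^60) is an assumed extension beyond TB.
  private val Units: Seq[(BigInt, String)] = Seq(
    (BigInt(1) << 60, "EB"),
    (BigInt(1) << 50, "PB"),
    (BigInt(1) << 40, "TB"),
    (BigInt(1) << 30, "GB"),
    (BigInt(1) << 20, "MB"),
    (BigInt(1) << 10, "KB"))

  // From 1024 EB upward, even the largest unit gives an unwieldy number.
  private val ScientificFrom: BigInt = (BigInt(1) << 60) * 1024

  /** Hypothetical BigInt-aware byte formatter. */
  def bigBytesToString(size: BigInt): String =
    if (size >= ScientificFrom) {
      BigDecimal(size, new MathContext(3, RoundingMode.HALF_UP)).toString + " B"
    } else {
      // Pick the largest unit the size reaches twice over, mirroring the diff's `2 * PB` check.
      Units.find { case (unit, _) => size >= unit * 2 } match {
        case Some((unit, name)) =>
          val value = (BigDecimal(size) / BigDecimal(unit)).toDouble
          "%.1f %s".formatLocal(Locale.US, value, name)
        case None => s"$size B"
      }
    }
}
```

With these assumptions, `3 << 50` bytes prints as `3.0 PB` and `2^80` bytes as `1.21E+24 B`.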


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r97212822
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -649,6 +649,14 @@ object SQLConf {
           .doubleConf
           .createWithDefault(0.05)
     
    +  val SHOW_STATS_IN_EXPLAIN =
    --- End diff --
    
    Then, when the stats are inaccurate, could that cause an inefficient plan? If so, why not show users the numbers?
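
To make the concern concrete, here is a simplified sketch of how a size estimate can flip a join strategy. It assumes the planner broadcasts a join side when its estimated `sizeInBytes` is non-negative and at most `spark.sql.autoBroadcastJoinThreshold` (10 MB by default). `BroadcastSketch.canBroadcast` is a stand-in, not Spark's planner code.

```scala
object BroadcastSketch {
  // Default value of spark.sql.autoBroadcastJoinThreshold: 10 MB.
  val DefaultThreshold: Long = 10L * 1024 * 1024

  /** Simplified check: broadcast a side only if its estimated size fits under the threshold. */
  def canBroadcast(estimatedSizeInBytes: BigInt, threshold: Long = DefaultThreshold): Boolean =
    estimatedSizeInBytes >= 0 && estimatedSizeInBytes <= threshold
}
```

Under this model, a table that is really 8 MB but is estimated at 64 MB fails the check and gets a non-broadcast join. Showing the estimate in EXPLAIN is what would let a user spot that.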


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r102560014
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
    @@ -54,11 +57,29 @@ case class Statistics(
     
       /** Readable string representation for the Statistics. */
       def simpleString: String = {
    -    Seq(s"sizeInBytes=$sizeInBytes",
    -      if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
    +    Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
    +      if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
           s"isBroadcastable=$isBroadcastable"
         ).filter(_.nonEmpty).mkString(", ")
       }
    +
    +  /** Show the given number in a readable format. */
    +  def format(number: BigInt, isSize: Boolean): String = {
    +    val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
    +    if (isSize) {
    +      // The largest unit in Utils.bytesToString is TB
    +      val PB = 1L << 50
    +      if (number < 2 * PB) {
    +        // The number is not very large, so we can use Utils.bytesToString to show it.
    +        Utils.bytesToString(number.toLong)
    +      } else {
    +        // The number is too large, show it in scientific notation.
    +        decimalValue.toString() + " B"
    +      }
    +    } else {
    +      decimalValue.toString()
    --- End diff --
    
    We can't make them consistent here, because the unit string is added inside `Utils.bytesToString`.
    How about moving the size logic into `Utils.bytesToString` and making it support BigInt?
    Then we can remove `def format`:
    ```
      def simpleString: String = {
        Seq(s"sizeInBytes=${Utils.bytesToString(sizeInBytes)}",
          if (rowCount.isDefined) {
            // Show row count in scientific notation.
            s"rowCount=${BigDecimal(rowCount.get, new MathContext(3, RoundingMode.HALF_UP)).toString()}"
          } else {
            ""
          },
          s"isBroadcastable=$isBroadcastable"
        ).filter(_.nonEmpty).mkString(", ")
      }
    ```
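
The proposed `simpleString` can also be written without the empty-string-then-filter step by turning the optional row count into an `Option[String]`. Below is a minimal, self-contained sketch. `StatsSketch` is a hypothetical stand-in for `Statistics` (field names follow the diff), and the size formatter is passed in as a function, standing in for the proposed `BigInt` overload of `Utils.bytesToString`.

```scala
import java.math.{MathContext, RoundingMode}

// Hypothetical stand-in for Statistics; only the fields used by simpleString.
final case class StatsSketch(
    sizeInBytes: BigInt,
    rowCount: Option[BigInt],
    isBroadcastable: Boolean) {

  /** `bytesToString` stands in for the proposed BigInt overload of Utils.bytesToString. */
  def simpleString(bytesToString: BigInt => String): String = {
    // Present only when a row count is known; rounded to three significant digits.
    val rows = rowCount.map { c =>
      s"rowCount=${BigDecimal(c, new MathContext(3, RoundingMode.HALF_UP))}"
    }
    ((Seq(s"sizeInBytes=${bytesToString(sizeInBytes)}") ++ rows) :+ s"isBroadcastable=$isBroadcastable")
      .mkString(", ")
  }
}
```

A missing row count simply contributes nothing, so no `filter(_.nonEmpty)` pass is needed.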


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #71508 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71508/testReport)** for PR 16594 at commit [`3d66df9`](https://github.com/apache/spark/commit/3d66df96fdd910ac530ce45acaf787bc352ba245).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #71847 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71847/testReport)** for PR 16594 at commit [`ddd5936`](https://github.com/apache/spark/commit/ddd59367c9763213a0de7b3684ee1525f5891639).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    DB2 has a tool to format the contents of the EXPLAIN tables. Below is an example of the output with explanation:
    
    ![screenshot 2017-01-22 21 05 45](https://cloud.githubusercontent.com/assets/11567269/22192191/b054c198-e0e6-11e6-8d64-807c5e196e1b.png)


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r97481455
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
    @@ -54,11 +56,32 @@ case class Statistics(
     
       /** Readable string representation for the Statistics. */
       def simpleString: String = {
    -    Seq(s"sizeInBytes=$sizeInBytes",
    -      if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
    +    Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
    +      if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
           s"isBroadcastable=$isBroadcastable"
         ).filter(_.nonEmpty).mkString(", ")
       }
    +
    +  /** Print the given number in a readable format. */
    +  def format(number: BigInt, isSize: Boolean): String = {
    --- End diff --
    
    That method only accepts a Long parameter, and estimated stats can still be unreadable even with TB as the unit.
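
The `Long` limitation can fail silently. `BigInt.toLong` keeps only the low-order 64 bits, so passing an oversized estimate to a `Long`-only formatter yields a wrong number rather than an error. The sketch below uses a hypothetical `toLongExact` helper, built on `BigInt.isValidLong`, to show where a fallback would be needed.

```scala
object LongGuard {
  /** Hypothetical helper: Some(value) only if the BigInt fits in a Long without truncation. */
  def toLongExact(n: BigInt): Option[Long] =
    if (n.isValidLong) Some(n.toLong) else None
}
```

For instance, `BigInt(2).pow(64).toLong` is `0L`, while `LongGuard.toLongExact(BigInt(2).pow(64))` is `None`.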


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r102398658
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -282,7 +282,8 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
         if (statement == null) {
           null  // This is enough since ParseException will raise later.
         } else if (isExplainableStatement(statement)) {
    -      ExplainCommand(statement, extended = ctx.EXTENDED != null, codegen = ctx.CODEGEN != null)
    +      ExplainCommand(statement, extended = ctx.EXTENDED != null, codegen = ctx.CODEGEN != null,
    +        cost = ctx.COST != null)
    --- End diff --
    
    Could you give me a hint about the style?


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    SQL Server has three ways to show a plan: graphical plans, text plans, and XML plans. It is actually pretty advanced. When using text plans, users can choose among these output formats:
    
    1. SHOWPLAN_ALL: A reasonably complete set of data showing the estimated execution plan for the query.
    2. SHOWPLAN_TEXT: Provides a very limited set of data for use with tools like osql.exe. It, too, only shows the estimated execution plan.
    3. STATISTICS PROFILE: Similar to SHOWPLAN_ALL, except that it represents the data for the actual execution plan.
    
    I found a 300-page book, `SQL Server Execution Plans`. For details, you can [download and read it](http://download.red-gate.com/ebooks/SQL/eBOOK_SQLServerExecutionPlans_2Ed_G_Fritchey.pdf). 


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r102137661
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -282,7 +282,8 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
         if (statement == null) {
           null  // This is enough since ParseException will raise later.
         } else if (isExplainableStatement(statement)) {
    -      ExplainCommand(statement, extended = ctx.EXTENDED != null, codegen = ctx.CODEGEN != null)
    +      ExplainCommand(statement, extended = ctx.EXTENDED != null, codegen = ctx.CODEGEN != null,
    +        cost = ctx.COST != null)
    --- End diff --
    
    Need to fix the style.
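
Here is one layout that would likely satisfy Spark's style checks once the call no longer fits on one line: start the arguments on a new line, one per line, indented two spaces past the call. This is an assumption about the requested fix, not something stated in the thread. `ExplainCommandSketch` is a hypothetical stand-in defined only so the snippet compiles on its own.

```scala
// Hypothetical stand-in for ExplainCommand, defined only so the snippet compiles.
final case class ExplainCommandSketch(
    logicalPlan: String,
    extended: Boolean = false,
    codegen: Boolean = false,
    cost: Boolean = false)

object ExplainStyle {
  def build(
      statement: String,
      hasExtended: Boolean,
      hasCodegen: Boolean,
      hasCost: Boolean): ExplainCommandSketch =
    // Once the call no longer fits on one line, put each named argument on its own line.
    ExplainCommandSketch(
      logicalPlan = statement,
      extended = hasExtended,
      codegen = hasCodegen,
      cost = hasCost)
}
```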


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r102887155
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -794,6 +795,7 @@ EXPLAIN: 'EXPLAIN';
     FORMAT: 'FORMAT';
     LOGICAL: 'LOGICAL';
     CODEGEN: 'CODEGEN';
    +COST: 'COST';
    --- End diff --
    
    Thanks! Updated.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    As of MySQL 5.7.3, the EXPLAIN statement is changed so that the effect of the EXTENDED keyword is always enabled. 
    ```
    mysql> EXPLAIN EXTENDED
        -> SELECT t1.a, t1.a IN (SELECT t2.a FROM t2) FROM t1\G
    *************************** 1. row ***************************
               id: 1
      select_type: PRIMARY
            table: t1
             type: index
    possible_keys: NULL
              key: PRIMARY
          key_len: 4
              ref: NULL
             rows: 4
         filtered: 100.00
            Extra: Using index
    *************************** 2. row ***************************
               id: 2
      select_type: SUBQUERY
            table: t2
             type: index
    possible_keys: a
              key: a
          key_len: 5
              ref: NULL
             rows: 3
         filtered: 100.00
            Extra: Using index
    2 rows in set, 1 warning (0.00 sec)
    ```


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r102647596
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -794,6 +795,7 @@ EXPLAIN: 'EXPLAIN';
     FORMAT: 'FORMAT';
     LOGICAL: 'LOGICAL';
     CODEGEN: 'CODEGEN';
    +COST: 'COST';
    --- End diff --
    
    Also put it in `nonReserved`.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71588/
    Test PASSed.


[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #71906 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71906/testReport)** for PR 16594 at commit [`0af8d7f`](https://github.com/apache/spark/commit/0af8d7f410b36547727cb2e6445dccf9d12f2cef).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Let us do some research on how other RDBMSs do it. For example, Oracle:
    ```
    SQL> explain plan for select * from product;
    Explained.
    
    SQL> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    -----------------------------------------------------
    Plan hash value: 3917577207
    -----------------------------------------------------
    | Id  | Operation          | Name    | Rows  | Bytes |
    -----------------------------------------------------
    |   0 | SELECT STATEMENT   |         | 15856 |  1254K|
    |   1 |  TABLE ACCESS FULL | PRODUCT | 15856 |  1254K|
    -----------------------------------------------------
    ```
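    To make the comparison concrete, here is a minimal Scala sketch of an Oracle-style plan table that prints per-operator `Rows` and `Bytes` columns. `PlanLine` and `PlanTableSketch` are hypothetical names for illustration only; this is not how Spark's explain output is produced.
    
    ```scala
    // Hypothetical plan-line model; not a Spark or Oracle API.
    final case class PlanLine(id: Int, depth: Int, operation: String, name: String, rows: BigInt, bytes: BigInt)
    
    object PlanTableSketch {
      private def padRight(s: String, width: Int): String = s + " " * (width - s.length)
      private def padLeft(s: String, width: Int): String = " " * (width - s.length) + s
    
      /** Render plan lines as a table with per-operator Rows and Bytes columns. */
      def render(lines: Seq[PlanLine]): String = {
        // Indent child operators by their depth in the plan tree.
        val ops = lines.map(l => " " * l.depth + l.operation)
        def width(header: String, cells: Seq[String]): Int = (header +: cells).map(_.length).max
        val idW = width("Id", lines.map(_.id.toString))
        val opW = width("Operation", ops)
        val nameW = width("Name", lines.map(_.name))
        val rowsW = width("Rows", lines.map(_.rows.toString))
        val bytesW = width("Bytes", lines.map(_.bytes.toString))
    
        def row(id: String, op: String, name: String, rows: String, bytes: String): String =
          s"| ${padLeft(id, idW)} | ${padRight(op, opW)} | ${padRight(name, nameW)} | " +
            s"${padLeft(rows, rowsW)} | ${padLeft(bytes, bytesW)} |"
    
        val header = row("Id", "Operation", "Name", "Rows", "Bytes")
        val rule = "-" * header.length
        val body = lines.zip(ops).map { case (l, op) =>
          row(l.id.toString, op, l.name, l.rows.toString, l.bytes.toString)
        }
        (Seq(rule, header, rule) ++ body :+ rule).mkString("\n")
      }
    }
    ```
    
    Numbers are right-aligned and names left-aligned, so every row has the same width regardless of how large the estimates get.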



[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    retest this please



[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #71921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71921/testReport)** for PR 16594 at commit [`bd45854`](https://github.com/apache/spark/commit/bd4585442209334e17b50efd2fdc88328ab78c7e).



[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #73200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73200/testReport)** for PR 16594 at commit [`491ec8f`](https://github.com/apache/spark/commit/491ec8f3529bfb552fdae9dcd9c13bc2984f91ce).



[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r102137390
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
    @@ -197,20 +197,32 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
           """.stripMargin.trim
       }
     
    -  override def toString: String = {
    +  override def toString: String = completeString(appendStats = false)
    +
    +  def toStringWithStats: String = completeString(appendStats = true)
    +
    +  def completeString(appendStats: Boolean): String = {
    --- End diff --
    
    private?
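    A minimal sketch of what the suggestion amounts to, assuming the two public entry points stay as in the diff: `completeString` becomes `private`, so callers can only go through `toString` or `toStringWithStats`. `ExplainableSketch` and its constructor parameters are hypothetical stand-ins, not Spark's `QueryExecution`.
    
    ```scala
    // Hypothetical stand-in for a plan-holding class; not Spark's QueryExecution.
    class ExplainableSketch(planText: String, statsText: String) {
      override def toString: String = completeString(appendStats = false)
    
      def toStringWithStats: String = completeString(appendStats = true)
    
      // Private helper: the flag is an implementation detail of the two public methods.
      private def completeString(appendStats: Boolean): String =
        if (appendStats) s"$planText, Statistics($statsText)" else planText
    }
    ```
    
    Keeping the boolean-flag method private avoids exposing a third, overlapping way to render the same plan.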



[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r96473148
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala ---
    @@ -27,6 +27,21 @@ import org.apache.spark.sql.test.SQLTestUtils
      */
     class HiveExplainSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     
    +  test("show stats in explain command") {
    +    withSQLConf("spark.sql.statistics.showInExplain" -> "false") {
    +      checkKeywordsNotExist(sql(" explain  select * from src "), "sizeInBytes", "rowCount")
    --- End diff --
    
    A general style suggestion: normally, SQL keywords are written in upper case in the test cases.
    
    `explain  select * from src` -> `EXPLAIN SELECT * FROM src`
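    For illustration only, a standalone sketch of what the test asserts, assuming the explain output is available as a plain `String` (Spark's own `checkKeywordsNotExist` works on a `DataFrame` instead). `ExplainKeywordSketch`, `presentKeywords` and `assertKeywordsAbsent` are hypothetical names; the query constant follows the upper-case style suggested above.
    
    ```scala
    object ExplainKeywordSketch {
      // Upper-case SQL keywords, following the style suggestion in the review.
      val ExplainQuery: String = "EXPLAIN SELECT * FROM src"
    
      /** Keywords from `keywords` that occur somewhere in the explain output. */
      def presentKeywords(explainOutput: String, keywords: String*): Seq[String] =
        keywords.filter(k => explainOutput.contains(k))
    
      /** Fails with a descriptive message if any keyword occurs in the output. */
      def assertKeywordsAbsent(explainOutput: String, keywords: String*): Unit = {
        val found = presentKeywords(explainOutput, keywords: _*)
        assert(found.isEmpty, s"Unexpected keywords in explain output: ${found.mkString(", ")}")
      }
    }
    ```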



[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r102398371
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
    @@ -54,11 +57,29 @@ case class Statistics(
     
       /** Readable string representation for the Statistics. */
       def simpleString: String = {
    -    Seq(s"sizeInBytes=$sizeInBytes",
    -      if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
    +    Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
    +      if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
           s"isBroadcastable=$isBroadcastable"
         ).filter(_.nonEmpty).mkString(", ")
       }
    +
    +  /** Show the given number in a readable format. */
    +  def format(number: BigInt, isSize: Boolean): String = {
    +    val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
    +    if (isSize) {
    +      // The largest unit in Utils.bytesToString is TB
    +      val PB = 1L << 50
    +      if (number < 2 * PB) {
    +        // The number is not very large, so we can use Utils.bytesToString to show it.
    +        Utils.bytesToString(number.toLong)
    +      } else {
    +        // The number is too large, show it in scientific notation.
    +        decimalValue.toString() + " B"
    +      }
    +    } else {
    +      decimalValue.toString()
    --- End diff --
    
    I'm not sure. Would that be more readable than scientific notation when there is no unit?
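    The formatting rule in the diff can be reproduced as a self-contained Scala sketch: three significant digits with `HALF_UP` rounding, unit-based output for sizes below 2 PB, and scientific notation with a trailing ` B` above that. Spark's `Utils.bytesToString` is not used here; `humanBytes` is a hypothetical stand-in that assumes the same KB/MB/GB/TB thresholds and one decimal place.
    
    ```scala
    import java.math.{MathContext, RoundingMode}
    import java.util.Locale
    
    object StatsFormatSketch {
      private val KB = 1L << 10
      private val MB = 1L << 20
      private val GB = 1L << 30
      private val TB = 1L << 40
      // Sizes below 2 PB go to the unit-based formatter; larger ones use scientific notation.
      private val TwoPB = BigInt(2L << 50)
    
      /** Hypothetical stand-in for a bytes-to-string helper whose largest unit is TB. */
      def humanBytes(size: Long): String = {
        val (value, unit) =
          if (size >= 2 * TB) (size.toDouble / TB, "TB")
          else if (size >= 2 * GB) (size.toDouble / GB, "GB")
          else if (size >= 2 * MB) (size.toDouble / MB, "MB")
          else if (size >= 2 * KB) (size.toDouble / KB, "KB")
          else (size.toDouble, "B")
        "%.1f %s".formatLocal(Locale.US, value, unit)
      }
    
      /** Show a size or a row count rounded to three significant digits. */
      def format(number: BigInt, isSize: Boolean): String = {
        val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
        if (isSize && number < TwoPB) humanBytes(number.toLong)
        else if (isSize) decimalValue.toString + " B"
        else decimalValue.toString
      }
    }
    ```
    
    With this sketch, a row count of 12345 is shown as `1.23E+4`, a size of 4096 bytes as `4.0 KB`, and a size of 10^18 bytes as `1.00E+18 B`. Note that a row count without a unit falls back to `BigDecimal`'s scientific notation as soon as it has more than three digits.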



[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r97216973
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -649,6 +649,14 @@ object SQLConf {
           .doubleConf
           .createWithDefault(0.05)
     
    +  val SHOW_STATS_IN_EXPLAIN =
    --- End diff --
    
    If the `sizeInBytes` affects the plan decision, I think it makes sense to let users see it. 
    
    When the plan is not what users expect and the number is super large, they might turn on CBO or run the command to re-analyze the tables. Hiding it does not look right to me, even if the number is ugly. :)
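    As a rough illustration of how `sizeInBytes` can drive the physical plan, here is a sketch of a broadcast-join size check. It assumes the rule is approximately "broadcast a side when its estimated size is non-negative and at most `spark.sql.autoBroadcastJoinThreshold`" (10 MB by default, with a negative value disabling broadcast). `BroadcastDecisionSketch` and `canBroadcast` are hypothetical names, not Spark's planner code.
    
    ```scala
    object BroadcastDecisionSketch {
      // Assumed default for spark.sql.autoBroadcastJoinThreshold: 10 MB.
      val DefaultThreshold: BigInt = BigInt(10L * 1024 * 1024)
    
      /**
       * Hypothetical size check: a side qualifies for broadcast when its estimated
       * size is known (non-negative) and does not exceed the threshold.
       * A negative threshold therefore disables broadcasting entirely.
       */
      def canBroadcast(sizeInBytes: BigInt, threshold: BigInt = DefaultThreshold): Boolean =
        sizeInBytes >= BigInt(0) && sizeInBytes <= threshold
    }
    ```
    
    Under this assumption, an inflated estimate silently turns a broadcast join into a shuffle join, which is exactly the case where seeing the number in explain output helps.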




[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #73295 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73295/testReport)** for PR 16594 at commit [`b3457a0`](https://github.com/apache/spark/commit/b3457a0ccd2453d9917c6e360bc8b80c10a70c4c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #71424 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71424/testReport)** for PR 16594 at commit [`c3489fc`](https://github.com/apache/spark/commit/c3489fcad32caa1d6a9b7182e387a46aae5710fa).



[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #71906 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71906/testReport)** for PR 16594 at commit [`0af8d7f`](https://github.com/apache/spark/commit/0af8d7f410b36547727cb2e6445dccf9d12f2cef).



[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    thanks, merging to master!



[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    **[Test build #73402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73402/testReport)** for PR 16594 at commit [`6e10f84`](https://github.com/apache/spark/commit/6e10f840fed50b7e48898e73967bc35a29a6e23b).



[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r97217477
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -649,6 +649,14 @@ object SQLConf {
           .doubleConf
           .createWithDefault(0.05)
     
    +  val SHOW_STATS_IN_EXPLAIN =
    --- End diff --
    
    OK. But since it affects the user interface, let's double-check with others. @rxin @hvanhovell @cloud-fan Shall we show the stats of the LogicalPlan directly in the explain command?



[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16594#discussion_r97216860
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -649,6 +649,14 @@ object SQLConf {
           .doubleConf
           .createWithDefault(0.05)
     
    +  val SHOW_STATS_IN_EXPLAIN =
    --- End diff --
    
    I'm not sure. For example, after joining many tables, if `sizeInBytes` is computed the simple (non-CBO) way, we just multiply the sizes of all these tables, so `sizeInBytes` becomes a ridiculously large value. I think this will harm the user experience.
    I agree that removing the flag can simplify the code a lot, but I'm hesitant to expose such information to all users.
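    A small sketch of the non-CBO estimate described above, assuming the join's `sizeInBytes` is simply the product of its children's sizes. `NaiveJoinSizeSketch` is a hypothetical name.
    
    ```scala
    object NaiveJoinSizeSketch {
      /** Non-CBO style estimate: multiply the estimated sizes of all join inputs. */
      def naiveJoinSize(childSizes: Seq[BigInt]): BigInt = childSizes.product
    
      /** True when the estimate no longer fits in a signed 64-bit Long. */
      def exceedsLong(size: BigInt): Boolean = size > BigInt(Long.MaxValue)
    }
    ```
    
    Three 1 GB inputs already give 2^90 bytes (about 1.24E+27), far beyond `Long.MaxValue`, which is why such estimates need `BigInt` and look ridiculous in explain output.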



[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    retest this please



[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16594
  
    Merged build finished. Test PASSed.

