You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2017/11/17 12:45:10 UTC

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/19774

    [SPARK-22475][SQL] show histogram in DESC COLUMN command

    ## What changes were proposed in this pull request?
    
    Added the histogram representation to the output of the `DESCRIBE EXTENDED table_name column_name` command.
    
    ## How was this patch tested?
    
    Modified SQL UT and checked output
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-22475

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19774.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19774
    
----
commit 24bfcb1132d35ffa8ba2341a7ea9057b14b5ab8a
Author: Marco Gaido <mg...@hortonworks.com>
Date:   2017-11-17T12:42:16Z

    [SPARK-22475][SQL] show histogram in DESC COLUMN command

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r151689883
  
    --- Diff: sql/core/src/test/resources/sql-tests/inputs/describe-table-column.sql ---
    @@ -24,6 +24,18 @@ DESC EXTENDED desc_col_table key;
     
     DESC FORMATTED desc_col_table key;
     
    +SET spark.sql.statistics.histogram.enabled=true;
    +SET spark.sql.statistics.histogram.numBins=2;
    +
    +INSERT INTO desc_col_table values(1);
    +INSERT INTO desc_col_table values(2);
    +INSERT INTO desc_col_table values(3);
    +INSERT INTO desc_col_table values(4);
    --- End diff --
    
    INSERT INTO desc_col_table values 1, 2, 3, 4


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83972/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84075/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r152157864
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    +      }).getOrElse(Seq(Row("histogram", "NULL")))
    --- End diff --
    
    What is the output of Hive?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r152265034
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    --- End diff --
    
    @henryr I think verbosity is a minor issue, since we are returning a `DataFrame` and the user can decide what to show.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    **[Test build #83966 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83966/testReport)** for PR 19774 at commit [`24bfcb1`](https://github.com/apache/spark/commit/24bfcb1132d35ffa8ba2341a7ea9057b14b5ab8a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r152004929
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    --- End diff --
    
    yea I think for this case the for loop is conventional


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r151741011
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    --- End diff --
    
    Thanks for your feedback. IMHO I am not sure that it would be easier to read. But if someone else agrees with you I can change it, thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    **[Test build #83972 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83972/testReport)** for PR 19774 at commit [`9bfa80c`](https://github.com/apache/spark/commit/9bfa80cca04a3b00e0fc2b02beb45c56f2058a34).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r151826928
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    --- End diff --
    
    add a comment that the empty column is for indentation and better readability?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r151892351
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    --- End diff --
    
    @mgaido91 @jaceklaskowski How about using "histogram_bin" instead of the empty string?
    ```
    histogram	height: 2.0, num_of_bins: 2
    histogram_bin	lower_bound: 1.0, upper_bound: 2.0, distinct_count: 2
    histogram_bin	lower_bound: 2.0, upper_bound: 4.0, distinct_count: 2
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by jaceklaskowski <gi...@git.apache.org>.

Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r151838625
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    --- End diff --
    
    @wzhfy I'd rather define a `val` with the comment being the name of the val. That would make it "compile-safe".


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r152005780
  
    --- Diff: sql/core/src/test/resources/sql-tests/inputs/describe-table-column.sql ---
    @@ -24,6 +24,19 @@ DESC EXTENDED desc_col_table key;
     
     DESC FORMATTED desc_col_table key;
     
    +-- Test output for histogram statistics
    +SET spark.sql.statistics.histogram.enabled=true;
    +SET spark.sql.statistics.histogram.numBins=2;
    +
    +INSERT INTO desc_col_table values 1, 2, 3, 4;
    +
    +ANALYZE TABLE desc_col_table COMPUTE STATISTICS FOR COLUMNS key;
    +
    +DESC EXTENDED desc_col_table key;
    --- End diff --
    
    shall we move these to the end of this file?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r151892566
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    --- End diff --
    
    And how about making each class member as a separate field?
    ```
    histogram	height: 2.0	num_of_bins: 2
    histogram_bin	lower_bound: 1.0	upper_bound: 2.0	distinct_count: 2
    histogram_bin	lower_bound: 2.0	upper_bound: 4.0	distinct_count: 2
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r152005513
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    +      }).getOrElse(Seq(Row("histogram", "NULL")))
    --- End diff --
    
    If no histogram info, I'd like to print nothing here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r152299177
  
    --- Diff: sql/core/src/test/resources/sql-tests/inputs/describe-table-column.sql ---
    @@ -34,6 +34,19 @@ DESC FORMATTED desc_complex_col_table col;
     -- Describe a nested column
     DESC FORMATTED desc_complex_col_table col.x;
     
    +-- Test output for histogram statistics
    +SET spark.sql.statistics.histogram.enabled=true;
    +SET spark.sql.statistics.histogram.numBins=2;
    +
    +INSERT INTO desc_col_table values 1, 2, 3, 4;
    +
    +ANALYZE TABLE desc_col_table COMPUTE STATISTICS FOR COLUMNS key;
    +
    +DESC EXTENDED desc_col_table key;
    +
    +SET spark.sql.statistics.histogram.enabled=false;
    +SET spark.sql.statistics.histogram.numBins=256;
    --- End diff --
    
    You can also argue that this increase the maintenance cost as we need to change it if we change the default value of these configs. I think the general principle is, set the config if the following queries depend on it. At the end of test, there is no following queries and we don't need to do anything to the configs.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by henryr <gi...@git.apache.org>.

Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r152096698
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    --- End diff --
    
    Do you think this (printing one line per bucket) might become too verbose if there are many buckets? In an interactive session writing 256 buckets, one-per-line is likely to push the non-histogram output off the top of the screen. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    **[Test build #83966 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83966/testReport)** for PR 19774 at commit [`24bfcb1`](https://github.com/apache/spark/commit/24bfcb1132d35ffa8ba2341a7ea9057b14b5ab8a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    **[Test build #84072 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84072/testReport)** for PR 19774 at commit [`0abc951`](https://github.com/apache/spark/commit/0abc951042246fc18c678a255e95345c4476fd79).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19774


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r152264097
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    --- End diff --
    
    @gatorsmile In Hive, there isn't yet a histogram implementation (HIVE-3526). In Oracle and MySQL, the information is stored in metadata tables which can be queried (https://docs.oracle.com/cloud/latest/db112/REFRN/statviews_2106.htm#REFRN20279) and here we have mainly two information:
     - the cumulative count so far;
     - the endpoint value for the current bin.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84072/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r152158010
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    --- End diff --
    
    What is the output contents/formats of the other vendors?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83966/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r152265613
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    +      }).getOrElse(Seq(Row("histogram", "NULL")))
    --- End diff --
    
    Hive has no histogram


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r152098557
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    --- End diff --
    
    kind of, but I don't have a better idea, adding a new flag to `DESC COLUMNS` seems an overkill.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r151693461
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin => Row("",
    --- End diff --
    
    nit: `hist.bins.map { bin =>`
    And is it possible to move `Row("",` to the next line?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r152295430
  
    --- Diff: sql/core/src/test/resources/sql-tests/inputs/describe-table-column.sql ---
    @@ -34,6 +34,19 @@ DESC FORMATTED desc_complex_col_table col;
     -- Describe a nested column
     DESC FORMATTED desc_complex_col_table col.x;
     
    +-- Test output for histogram statistics
    +SET spark.sql.statistics.histogram.enabled=true;
    +SET spark.sql.statistics.histogram.numBins=2;
    +
    +INSERT INTO desc_col_table values 1, 2, 3, 4;
    +
    +ANALYZE TABLE desc_col_table COMPUTE STATISTICS FOR COLUMNS key;
    +
    +DESC EXTENDED desc_col_table key;
    +
    +SET spark.sql.statistics.histogram.enabled=false;
    +SET spark.sql.statistics.histogram.numBins=256;
    --- End diff --
    
    isn't it better to reset them to prevent possible future tests eventually added after these lines to be influenced? If no, I can remove them.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r151855965
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    +      }).getOrElse(Seq(Row("histogram", "NULL")))
    --- End diff --
    
    Some comments or cleanup here would be nicer though


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    **[Test build #84072 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84072/testReport)** for PR 19774 at commit [`0abc951`](https://github.com/apache/spark/commit/0abc951042246fc18c678a255e95345c4476fd79).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    **[Test build #84075 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84075/testReport)** for PR 19774 at commit [`38ef638`](https://github.com/apache/spark/commit/38ef638a333e7ba9e3ab91204f072244f4816b89).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r152005342
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    --- End diff --
    
    maybe adding the bin id?
    ```
    histogram	height: 2.0, num_of_bins: 2
    bin0      lower_bound: 1.0, upper_bound: 2.0, distinct_count: 2
    bin1      lower_bound: 2.0, upper_bound: 4.0, distinct_count: 2
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    **[Test build #83972 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83972/testReport)** for PR 19774 at commit [`9bfa80c`](https://github.com/apache/spark/commit/9bfa80cca04a3b00e0fc2b02beb45c56f2058a34).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by henryr <gi...@git.apache.org>.

Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r152099227
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    --- End diff --
    
    yeah, I was wondering if we actually need to see every individual bucket instead of a summary (max bucket: 2.0->4.0 w/128 entries, min bucket: 6.0->8.0 w/0 entries, median bucket size: 32).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r152100465
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    --- End diff --
    
    for debug purpose, I'd like to have as more information as possible. In the future maybe we can simplify the `DESC COLUMN` and use `spark.catalog.getColumn` to get detailed information.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r152294075
  
    --- Diff: sql/core/src/test/resources/sql-tests/inputs/describe-table-column.sql ---
    @@ -34,6 +34,19 @@ DESC FORMATTED desc_complex_col_table col;
     -- Describe a nested column
     DESC FORMATTED desc_complex_col_table col.x;
     
    +-- Test output for histogram statistics
    +SET spark.sql.statistics.histogram.enabled=true;
    +SET spark.sql.statistics.histogram.numBins=2;
    +
    +INSERT INTO desc_col_table values 1, 2, 3, 4;
    +
    +ANALYZE TABLE desc_col_table COMPUTE STATISTICS FOR COLUMNS key;
    +
    +DESC EXTENDED desc_col_table key;
    +
    +SET spark.sql.statistics.histogram.enabled=false;
    +SET spark.sql.statistics.histogram.numBins=256;
    --- End diff --
    
    We don't need to reset the config, the test framework will reset the entire session after each test.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    **[Test build #84075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84075/testReport)** for PR 19774 at commit [`38ef638`](https://github.com/apache/spark/commit/38ef638a333e7ba9e3ab91204f072244f4816b89).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by jaceklaskowski <gi...@git.apache.org>.

Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r151737674
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    --- End diff --
    
    I'm pretty sure that for-comprehension would make the code read easier.
    
    ```scala
    for {
      c <- cs
      hist <- c.histogram
      ...
    } yield ...
    ```
    
    Let me know if you need help with that.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r151693478
  
    --- Diff: sql/core/src/test/resources/sql-tests/inputs/describe-table-column.sql ---
    @@ -24,6 +24,18 @@ DESC EXTENDED desc_col_table key;
     
     DESC FORMATTED desc_col_table key;
     
    +SET spark.sql.statistics.histogram.enabled=true;
    +SET spark.sql.statistics.histogram.numBins=2;
    +
    +INSERT INTO desc_col_table values(1);
    +INSERT INTO desc_col_table values(2);
    +INSERT INTO desc_col_table values(3);
    +INSERT INTO desc_col_table values(4);
    +
    +ANALYZE TABLE desc_col_table COMPUTE STATISTICS FOR COLUMNS key;
    --- End diff --
    
    please set the sql conf back to default value.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/19774
  
    seems good to me. ping @gatorsmile @cloud-fan @HyukjinKwon 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org