You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by AndrewKL <gi...@git.apache.org> on 2018/08/20 21:28:05 UTC

[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

GitHub user AndrewKL opened a pull request:

    https://github.com/apache/spark/pull/22162

    [spark-24442][SQL] Added parameters to control the default number of …

    …records shown and the truncate length when a user calls .show() on a dataframe
    
    ## What changes were proposed in this pull request?
    
    https://issues.apache.org/jira/browse/SPARK-24442
    
    ## How was this patch tested?
    
    Unit tests plus local testing.  This change is designed to not modify default behavior unless a user set the following configs...
    
    "spark.sql.show.defaultNumRows" ->default 20 (as it was before)
    
    "spark.sql.show.truncateMaxCharsPerColumn" -> default 30 (as it was before)
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AndrewKL/spark spark-24442

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22162.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22162
    
----
commit ed5991e53fe404b972c5a0a79febdd0398182cf6
Author: Long <lo...@...>
Date:   2018-08-20T21:19:52Z

    [spark-24442][SQL] Added parameters to control the default number of records shown and the truncate length when a user calls .show() on a dataframe

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r216119684
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1950,6 +1962,11 @@ class SQLConf extends Serializable with Logging {
       def parallelFileListingInStatsComputation: Boolean =
         getConf(SQLConf.PARALLEL_FILE_LISTING_IN_STATS_COMPUTATION)
     
    +  def sqlShowDefaultMaxRecordsToShow: Int = getConf(SQLConf.SQL_SHOW_DEFAULT_MAX_ROWS)
    +
    +  def sqlShowDefaultMaxCharsPerTruncatedRow: Int =
    --- End diff --
    
    `sqlShowTruncateMaxCharsPerColumn`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    **[Test build #95254 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95254/testReport)** for PR 22162 at commit [`ed5991e`](https://github.com/apache/spark/commit/ed5991e53fe404b972c5a0a79febdd0398182cf6).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by AndrewKL <gi...@git.apache.org>.
Github user AndrewKL commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r215617515
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
       // scalastyle:on println
     
    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    --- End diff --
    
    I'll update this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95254/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by AndrewKL <gi...@git.apache.org>.
Github user AndrewKL commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r215618109
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -969,6 +969,22 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkShowString(ds, expected)
       }
     
    +
    +  test("SPARK-2444git stat2 Show should follow spark.show.default.number.of.rows") {
    --- End diff --
    
    removed


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    Would someone please take it?
    I have less bandwidth next two days since I will be in a training session at my office.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95775/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    **[Test build #95254 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95254/testReport)** for PR 22162 at commit [`ed5991e`](https://github.com/apache/spark/commit/ed5991e53fe404b972c5a0a79febdd0398182cf6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by AndrewKL <gi...@git.apache.org>.
Github user AndrewKL commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r216045605
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -969,6 +969,22 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkShowString(ds, expected)
       }
     
    +
    +  test("SPARK-2444git stat2 Show should follow spark.show.default.number.of.rows") {
    +    withSQLConf("spark.sql.show.defaultNumRows" -> "100") {
    +      val ds = (1 to 1000).toDS().as[Int].show
    --- End diff --
    
    The way show is currently implemented makes it difficult to fully test this.  The truncate and max length parameters are passed into the showString function that would normally be used to test the out put.  unfortunately the parameters and final string is then passed into the println function without an easy way to capture the results.  This could be tested with a spying library but there isn't one in in spark (yet).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    **[Test build #95775 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95775/testReport)** for PR 22162 at commit [`f93e58e`](https://github.com/apache/spark/commit/f93e58e7471382576840ffc151d57c51b921cb47).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r213010672
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
       // scalastyle:on println
     
    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    --- End diff --
    
    I'd remove the `()`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by AndrewKL <gi...@git.apache.org>.
Github user AndrewKL commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    Hey Friends,  I have updated the commit and fixed most of the issues mentioned.  I do have one remaining issue left which is the testing of the new show functionality.  Currently the provided private function to get the show string takes in the new parameters given by the config so its impossible to test the impact of the config changes.  There's two possible ways to tackle this.  1) leave the tests as is. 2) add in a spy library that would allow me to inspect the input to the showString function.  This would require adding a library to test this.  I wanted to get feed back on the preferred route before I did this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    **[Test build #95775 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95775/testReport)** for PR 22162 at commit [`f93e58e`](https://github.com/apache/spark/commit/f93e58e7471382576840ffc151d57c51b921cb47).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    sure, no worries @kiszk, I can take it if needed. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95804/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r216119774
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -969,6 +969,20 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkShowString(ds, expected)
       }
     
    +  test("SPARK-24442 Show should follow spark.show.default.number.of.rows") {
    +    withSQLConf("spark.sql.show.defaultNumRows" -> "100") {
    +      (1 to 1000).toDS().as[Int].show
    +    }
    +  }
    +
    +  test("SPARK-24442 Show should follow spark.show.default.truncate.characters.per.column") {
    +    withSQLConf("spark.sql.show.truncateMaxCharsPerColumn" -> "30") {
    +      (1 to 1000).map(x => "123456789_123456789_123456789_")
    +        .toDS().as[String]
    +        .show
    --- End diff --
    
    Plz compare the show result?
    https://github.com/apache/spark/blob/9241e1e7e66574cfafa68791771959dfc39c9684/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala#L326


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by AndrewKL <gi...@git.apache.org>.
Github user AndrewKL commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r215617996
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
       // scalastyle:on println
     
    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt
    --- End diff --
    
    will do


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r213010807
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
       // scalastyle:on println
     
    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt
    +  }
    +
    +  /**
    +   * Returns the default max characters per column to show before truncation when
    +   * the show function is called with truncate.
    +   * @since 2.3.0
    --- End diff --
    
    ditto


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by AndrewKL <gi...@git.apache.org>.
Github user AndrewKL commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    Hey Friends,
    
    Sorry for the delayed response.  I've been in the desert 5 hours into the middle of no where.  I'll incorporate  the feed back and update this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r216119676
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1950,6 +1962,11 @@ class SQLConf extends Serializable with Logging {
       def parallelFileListingInStatsComputation: Boolean =
         getConf(SQLConf.PARALLEL_FILE_LISTING_IN_STATS_COMPUTATION)
     
    +  def sqlShowDefaultMaxRecordsToShow: Int = getConf(SQLConf.SQL_SHOW_DEFAULT_MAX_ROWS)
    --- End diff --
    
    `sqlShowDefaultMaxRows`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    sure @HyukjinKwon, thanks for pinging me anyway.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    Build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r212801959
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
       // scalastyle:on println
     
    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt
    --- End diff --
    
    I would add it to `SQLConf`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r213010879
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
       // scalastyle:on println
     
    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    --- End diff --
    
    not needed as this is private, moreover, it'd be since 2.4.0


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    ping @AndrewKL 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r213158056
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -969,6 +969,22 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkShowString(ds, expected)
       }
     
    +
    +  test("SPARK-2444git stat2 Show should follow spark.show.default.number.of.rows") {
    +    withSQLConf("spark.sql.show.defaultNumRows" -> "100") {
    +      val ds = (1 to 1000).toDS().as[Int].show
    --- End diff --
    
    I think its ok to check the output number of rows in show.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r212832850
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -969,6 +969,22 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkShowString(ds, expected)
       }
     
    +
    +  test("SPARK-2444git stat2 Show should follow spark.show.default.number.of.rows") {
    +    withSQLConf("spark.sql.show.defaultNumRows" -> "100") {
    +      val ds = (1 to 1000).toDS().as[Int].show
    +    }
    +  }
    +
    +  test("SPARK-24442 Show should follow spark.show.default.truncate.characters.per.column") {
    +    withSQLConf("spark.sql.show.truncateMaxCharsPerColumn" -> "30") {
    +      val ds = (1 to 1000)
    +        .map(x => "123456789_123456789_123456789_")
    +        .toDS().as[String]
    +        .show
    --- End diff --
    
    ditto


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r212832795
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -969,6 +969,22 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkShowString(ds, expected)
       }
     
    +
    +  test("SPARK-2444git stat2 Show should follow spark.show.default.number.of.rows") {
    --- End diff --
    
    nit: Do we remove `git stat2`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r213157406
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
       // scalastyle:on println
     
    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt
    --- End diff --
    
    +1


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r213158510
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
       // scalastyle:on println
     
    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt
    --- End diff --
    
    How about `spark.sql.defaultNumRowsInShow`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r213010706
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
       // scalastyle:on println
     
    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt
    +  }
    +
    +  /**
    +   * Returns the default max characters per column to show before truncation when
    +   * the show function is called with truncate.
    +   * @since 2.3.0
    +   */
    +  private def maxCharactersPerColumnToShow(): Int = {
    --- End diff --
    
    ditto


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r216119703
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -969,6 +969,20 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkShowString(ds, expected)
       }
     
    +  test("SPARK-24442 Show should follow spark.show.default.number.of.rows") {
    +    withSQLConf("spark.sql.show.defaultNumRows" -> "100") {
    --- End diff --
    
    `SQLConf.SQL_SHOW_DEFAULT_MAX_ROWS.key -> "100"`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r212832734
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
       // scalastyle:on println
     
    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt
    --- End diff --
    
    +1


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    We should wait @AndrewKL for few days?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    Can you fix the style errors?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    **[Test build #95804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95804/testReport)** for PR 22162 at commit [`d85d892`](https://github.com/apache/spark/commit/d85d89200792671b74b0eb71738a5eae58956737).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    ya, sure.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    @viirya, @kiszk, @mgaido91 and @maropu, would you be interested in taking this over if this gets inactive for few more days?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by AndrewKL <gi...@git.apache.org>.
Github user AndrewKL commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r215617577
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
       // scalastyle:on println
     
    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt
    +  }
    +
    +  /**
    +   * Returns the default max characters per column to show before truncation when
    +   * the show function is called with truncate.
    +   * @since 2.3.0
    --- End diff --
    
    I'll update this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r212832804
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
       // scalastyle:on println
     
    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt
    +  }
    +
    +  /**
    +   * Returns the default max characters per column to show before truncation when
    +   * the show function is called with truncate.
    +   * @since 2.3.0
    +   */
    +  private def maxCharactersPerColumnToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.truncateMaxCharsPerColumn", "20").toInt
    --- End diff --
    
    ditto


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    I have much bandwidh to take it, too. Is it ok to take it over? @mgaido91 not working on this now? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r212832846
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -969,6 +969,22 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkShowString(ds, expected)
       }
     
    +
    +  test("SPARK-2444git stat2 Show should follow spark.show.default.number.of.rows") {
    +    withSQLConf("spark.sql.show.defaultNumRows" -> "100") {
    +      val ds = (1 to 1000).toDS().as[Int].show
    --- End diff --
    
    Do we need to compare the result with a expected string using `assert`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r216119706
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -969,6 +969,20 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkShowString(ds, expected)
       }
     
    +  test("SPARK-24442 Show should follow spark.show.default.number.of.rows") {
    +    withSQLConf("spark.sql.show.defaultNumRows" -> "100") {
    +      (1 to 1000).toDS().as[Int].show
    +    }
    +  }
    +
    +  test("SPARK-24442 Show should follow spark.show.default.truncate.characters.per.column") {
    +    withSQLConf("spark.sql.show.truncateMaxCharsPerColumn" -> "30") {
    --- End diff --
    
    ditto


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r213026874
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
       // scalastyle:on println
     
    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    --- End diff --
    
    we shouldn't be adding methods here


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    @AndrewKL Thanks for your work! I think we can add enough tests for this patch without using a spy library.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22162
  
    **[Test build #95804 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95804/testReport)** for PR 22162 at commit [`d85d892`](https://github.com/apache/spark/commit/d85d89200792671b74b0eb71738a5eae58956737).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org