Posted to reviews@spark.apache.org by dilipbiswal <gi...@git.apache.org> on 2015/12/18 06:43:37 UTC

[GitHub] spark pull request: [SPARK-12398] Smart truncation of DataFrame / ...

GitHub user dilipbiswal opened a pull request:

    https://github.com/apache/spark/pull/10373

    [SPARK-12398] Smart truncation of DataFrame / Dataset toString

    When a DataFrame or Dataset has a long schema, we should truncate its toString output intelligently to avoid flooding the screen with unreadable information. For example:

    // Standard output
    [a: int, b: int]

    // Truncate many top-level fields
    [a: int, b: string ... 10 more fields]

    // Truncate long inner structs
    [a: struct<a: int ... 10 more fields>]
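The top-level rule in these examples can be sketched in Scala as follows. This is a minimal sketch under stated assumptions, not the code in this pull request. `TopLevelTruncation` and `truncatedSchema` are hypothetical names, column types are passed as plain strings, and a limit of 2 is assumed because the expected strings in the DataFrameSuite test imply that value.

```scala
// Hypothetical helper, not Spark's implementation: renders (name, type) pairs
// as "[a: int, b: string ... N more fields]", keeping the first
// `maxNumberFields` columns and summarizing the rest.
object TopLevelTruncation {
  def truncatedSchema(fields: Seq[(String, String)], maxNumberFields: Int): String = {
    require(maxNumberFields > 0, "maxNumberFields must be positive")
    val shown = fields.take(maxNumberFields).map { case (name, tpe) => s"$name: $tpe" }
    val hidden = fields.length - shown.length
    // Singular "field" when exactly one column is hidden, plural otherwise.
    val suffix =
      if (hidden == 0) ""
      else if (hidden == 1) " ... 1 more field"
      else s" ... $hidden more fields"
    shown.mkString("[", ", ", suffix + "]")
  }
}
```

For example, `truncatedSchema(Seq("c1" -> "bigint", "c2" -> "string", "c3" -> "boolean"), 2)` yields `[c1: bigint, c2: string ... 1 more field]`, the same string the test expects for `df2`.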

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dilipbiswal/spark spark-12398

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10373.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10373
    
----
commit fc1ea1f80acefd7227f16000fc449e896a26c041
Author: Dilip Biswal <db...@us.ibm.com>
Date:   2015-12-17T22:02:43Z

    [SPARK-12398] Smart truncation of DataFrame / Dataset toString

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12398] Smart truncation of DataFrame / ...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10373#discussion_r48082932
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -1155,4 +1155,54 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
         val primitiveUDF = udf((i: Int) => i * 2)
         checkAnswer(df.select(primitiveUDF($"age")), Row(44) :: Row(null) :: Nil)
       }
    +
    +  test("SPARK-12398 truncated toString") {
    +    val df1 = Seq((1: Long, "row1": String)).toDF("id", "name")
    +    assert(df1.toString() === "[id: bigint, name: string]")
    +
    +    val df2 = Seq((1: Long, "c2": String, false: Boolean)).toDF("c1", "c2", "c3")
    +    assert(df2.toString === "[c1: bigint, c2: string ... 1 more field]")
    +
    +    val df3 = Seq((1: Long, "c2": String, false: Boolean, 10: Integer)).toDF("c1", "c2", "c3", "c4")
    +    assert(df3.toString === "[c1: bigint, c2: string ... 2 more fields]")
    +
    +    val df4 = Seq((1: Long, Tuple2(1: Long, "val": String))).toDF("c1", "c2")
    +    assert(df4.toString === "[c1: bigint, c2: struct<_1: bigint,_2: string>]")
    +
    +    val df5 = Seq((1: Long, Tuple2(1: Long, "val": String), 20.0: Double)).toDF("c1", "c2", "c3")
    +    assert(df5.toString === "[c1: bigint, c2: struct<_1: bigint,_2: string> ... 1 more field]")
    +
    +    val df6 =
    +      Seq((1: Long,
    +        Tuple2(1: Long, "val": String),
    +        20.0: Double, 1: Integer)).toDF("c1", "c2", "c3", "c4")
    +    assert(df6.toString === "[c1: bigint, c2: struct<_1: bigint,_2: string> ... 2 more fields]")
    --- End diff --
    
    @marmbrus Hi Michael, I noticed that too while I was coding this up, but I wasn't sure whether I should change it. Now I am :-)


[GitHub] spark pull request: [SPARK-12398] Smart truncation of DataFrame / ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10373#issuecomment-165828424
  
    **[Test build #2234 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2234/consoleFull)** for PR 10373 at commit [`d617648`](https://github.com/apache/spark/commit/d617648a54ace4d2591cdabb8586f9c33a45c793).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12398] Smart truncation of DataFrame / ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10373


[GitHub] spark pull request: [SPARK-12398] Smart truncation of DataFrame / ...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the pull request:

    https://github.com/apache/spark/pull/10373#issuecomment-165679394
  
    @rxin Hi Reynold, could you please take a look and let me know your comments? Thanks!!


[GitHub] spark pull request: [SPARK-12398] Smart truncation of DataFrame / ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10373#discussion_r48000714
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -66,6 +66,9 @@ abstract class DataType extends AbstractDataType {
       /** Readable string representation for the type. */
       def simpleString: String = typeName
     
    +  /** Readable string representation for the type with truncation */
    +  def simpleString(maxNumberFields: Int): String = simpleString
    --- End diff --
    
    This should be private to sql.
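A self-contained sketch of the pattern under review: a truncating overload on the base type that defaults to the untruncated form and is overridden by the struct type, with both restricted in visibility. `DataTypeSketch`, `AtomicSketch`, `StructSketch` and `describe` are hypothetical stand-ins, and an enclosing object named `sql` plays the role of the `org.apache.spark.sql` package, since Scala's `private[X]` qualifier accepts any enclosing package, class or object. The struct rendering (a limit applied recursively, a comma and space between fields) is an assumption for illustration, not the merged code.

```scala
// Hypothetical stand-ins for Spark's DataType hierarchy; the enclosing object
// `sql` takes the place of the org.apache.spark.sql package.
object sql {
  abstract class DataTypeSketch {
    def typeName: String

    /** Readable string representation for the type. */
    def simpleString: String = typeName

    /** Truncating variant, visible only inside `sql`; by default nothing is truncated. */
    private[sql] def simpleString(maxNumberFields: Int): String = simpleString
  }

  final case class AtomicSketch(typeName: String) extends DataTypeSketch

  final case class StructSketch(fields: Seq[(String, DataTypeSketch)]) extends DataTypeSketch {
    def typeName: String = "struct"

    // Untruncated form, in the "struct<_1:bigint,_2:string>" style.
    override def simpleString: String =
      fields.map { case (n, t) => s"$n:${t.simpleString}" }.mkString("struct<", ",", ">")

    // The override keeps the restricted visibility asked for in review and
    // recurses into nested types with the same limit.
    override private[sql] def simpleString(maxNumberFields: Int): String = {
      val shown = fields.take(maxNumberFields).map { case (n, t) =>
        s"$n: ${t.simpleString(maxNumberFields)}"
      }
      val hidden = fields.length - shown.length
      val suffix =
        if (hidden == 0) ""
        else if (hidden == 1) " ... 1 more field"
        else s" ... $hidden more fields"
      shown.mkString("struct<", ", ", suffix + ">")
    }
  }

  // Code inside `sql` (the role Dataset.toString plays) may call the truncating variant.
  def describe(t: DataTypeSketch, maxNumberFields: Int): String = t.simpleString(maxNumberFields)
}
```

Because the truncating overload is `private[sql]`, code outside `sql` only sees the public `simpleString`; presumably that is the intent of the review comment, keeping the truncation helper an internal detail rather than new public API.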


[GitHub] spark pull request: [SPARK-12398] Smart truncation of DataFrame / ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10373#issuecomment-166413378
  
    Thanks, merging to master.


[GitHub] spark pull request: [SPARK-12398] Smart truncation of DataFrame / ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10373#discussion_r48075452
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
    @@ -278,6 +278,23 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru
         s"struct<${fieldTypes.mkString(",")}>"
       }
     
    +  override def simpleString(maxNumberFields: Int): String = {
    --- End diff --
    
    `private[sql]`


[GitHub] spark pull request: [SPARK-12398] Smart truncation of DataFrame / ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10373#issuecomment-165702633
  
    cc @marmbrus 


[GitHub] spark pull request: [SPARK-12398] Smart truncation of DataFrame / ...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10373#discussion_r48001060
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -66,6 +66,9 @@ abstract class DataType extends AbstractDataType {
       /** Readable string representation for the type. */
       def simpleString: String = typeName
     
    +  /** Readable string representation for the type with truncation */
    +  def simpleString(maxNumberFields: Int): String = simpleString
    --- End diff --
    
    @rxin I will make the change. Thanks!!


[GitHub] spark pull request: [SPARK-12398] Smart truncation of DataFrame / ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10373#issuecomment-165799559
  
    **[Test build #2234 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2234/consoleFull)** for PR 10373 at commit [`d617648`](https://github.com/apache/spark/commit/d617648a54ace4d2591cdabb8586f9c33a45c793).


[GitHub] spark pull request: [SPARK-12398] Smart truncation of DataFrame / ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10373#discussion_r48075638
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -1155,4 +1155,54 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
         val primitiveUDF = udf((i: Int) => i * 2)
         checkAnswer(df.select(primitiveUDF($"age")), Row(44) :: Row(null) :: Nil)
       }
    +
    +  test("SPARK-12398 truncated toString") {
    +    val df1 = Seq((1: Long, "row1": String)).toDF("id", "name")
    +    assert(df1.toString() === "[id: bigint, name: string]")
    +
    +    val df2 = Seq((1: Long, "c2": String, false: Boolean)).toDF("c1", "c2", "c3")
    +    assert(df2.toString === "[c1: bigint, c2: string ... 1 more field]")
    +
    +    val df3 = Seq((1: Long, "c2": String, false: Boolean, 10: Integer)).toDF("c1", "c2", "c3", "c4")
    +    assert(df3.toString === "[c1: bigint, c2: string ... 2 more fields]")
    +
    +    val df4 = Seq((1: Long, Tuple2(1: Long, "val": String))).toDF("c1", "c2")
    +    assert(df4.toString === "[c1: bigint, c2: struct<_1: bigint,_2: string>]")
    +
    +    val df5 = Seq((1: Long, Tuple2(1: Long, "val": String), 20.0: Double)).toDF("c1", "c2", "c3")
    +    assert(df5.toString === "[c1: bigint, c2: struct<_1: bigint,_2: string> ... 1 more field]")
    +
    +    val df6 =
    +      Seq((1: Long,
    +        Tuple2(1: Long, "val": String),
    +        20.0: Double, 1: Integer)).toDF("c1", "c2", "c3", "c4")
    +    assert(df6.toString === "[c1: bigint, c2: struct<_1: bigint,_2: string> ... 2 more fields]")
    --- End diff --
    
    It's a little odd that there are spaces after the top-level commas but not after the ones inside the struct.


[GitHub] spark pull request: [SPARK-12398] Smart truncation of DataFrame / ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10373#issuecomment-165679400
  
    Can one of the admins verify this patch?


[GitHub] spark pull request: [SPARK-12398] Smart truncation of DataFrame / ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10373#discussion_r48075542
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -1155,4 +1155,54 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
         val primitiveUDF = udf((i: Int) => i * 2)
         checkAnswer(df.select(primitiveUDF($"age")), Row(44) :: Row(null) :: Nil)
       }
    +
    +  test("SPARK-12398 truncated toString") {
    +    val df1 = Seq((1: Long, "row1": String)).toDF("id", "name")
    --- End diff --
    
    Nit: you don't really need the type ascriptions here.  You can just do `1L` if you want it to be a long.
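To illustrate the nit in plain Scala (no Spark needed): a `1L` literal already has type `Long` and a string literal already has type `String`, so the ascribed tuple and the literal tuple are equal and share the static type `(Long, String)`. One ascription in the test is not redundant in the same way: `10: Integer` yields a boxed `java.lang.Integer` rather than a Scala `Int`. The object and value names below are illustrative only.

```scala
object LiteralTypes {
  // Type ascriptions as written in the original test.
  val ascribed = (1: Long, "row1": String)

  // Equivalent literal form suggested in review: `1L` is already a Long.
  val literal = (1L, "row1")

  // Both tuples have static type (Long, String); this would not compile otherwise.
  val sameType: (Long, String) = literal

  // Not redundant: this ascription boxes the value as java.lang.Integer.
  val boxed = 10: Integer
}
```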

