Posted to reviews@spark.apache.org by liancheng <gi...@git.apache.org> on 2016/05/26 18:03:29 UTC

[GitHub] spark pull request: [SPARK-15550][SQL][WIP] Dataset.show() should ...

GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/13331

    [SPARK-15550][SQL][WIP] Dataset.show() should show contents nested products as rows

    ## What changes were proposed in this pull request?
    
    This PR addresses two related issues:
    
    1. `Dataset.showString()` should show case classes/Java beans at all levels as rows, while the code on master only handles top-level ones.
    
    2. `Dataset.showString()` should show the full contents produced by the underlying query plan.
    
       A Dataset is only a view of the underlying query plan. Columns not referenced by the encoder are still reachable through methods like `Dataset.col`, so it probably makes more sense to show the full contents of the query plan.
    
    (This is still in WIP status because I'd expect multiple test failures from those test cases that depend on output of `Dataset.showString()`.)
    
    ## How was this patch tested?
    
    Two new test cases are added in `DatasetSuite` to check `.showString()` output.
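
As a rough illustration of the first point in the description above (nested case classes shown as rows rather than via `toString`), here is a minimal plain-Scala sketch. It does not use Spark and is not the code from this pull request; `NestedShowSketch`, `render`, `Inner`, and `Outer` are hypothetical names chosen for the example, and the bracketed output format is an assumption about what a row-like rendering could look like.

```scala
// Illustrative sketch only: plain Scala, no Spark dependency.
// All names here are hypothetical and not part of Spark.
object NestedShowSketch {
  case class Inner(a: Int, b: String)
  case class Outer(id: Int, inner: Inner)

  // Render every Product (case class or tuple), at any nesting depth, as a
  // bracketed list of its fields instead of falling back to toString.
  // Caveat: Option and List cells are also Products in Scala, so a real
  // implementation would need to treat them separately.
  def render(value: Any): String = value match {
    case null       => "null"
    case p: Product => p.productIterator.map(render).mkString("[", ",", "]")
    case other      => other.toString
  }
}
```

With these definitions, `NestedShowSketch.render(Outer(1, Inner(2, "x")))` gives `[1,[2,x]]`, whereas `Outer(1, Inner(2, "x")).toString` gives `Outer(1,Inner(2,x))`. Handling only the top level would leave the inner value in its `toString` form.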

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark spark-15550-ds-show

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13331.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13331
    
----
commit a1baec5a3adbdfdc6fd5e414c89b05d0c97924a6
Author: Cheng Lian <li...@databricks.com>
Date:   2016-05-26T17:51:04Z

    Dataset.show() should show contents nested products as rows

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15550][SQL] Dataset.show() should show ...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13331#discussion_r64816282
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -436,20 +435,6 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         assert(ds.toString == "[_1: int, _2: int]")
       }
     
    -  test("showString: Kryo encoder") {
    -    implicit val kryoEncoder = Encoders.kryo[KryoData]
    -    val ds = Seq(KryoData(1), KryoData(2)).toDS()
    -
    -    val expectedAnswer = """+-----------+
    -                           ||      value|
    -                           |+-----------+
    -                           ||KryoData(1)|
    -                           ||KryoData(2)|
    -                           |+-----------+
    -                           |""".stripMargin
    -    assert(ds.showString(10) === expectedAnswer)
    -  }
    -
    --- End diff --
    
    Removed this test case since we no longer use `toString` to show objects serialized by Kryo. Such a value is just a normal binary column now.
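
To make the "normal binary column" remark concrete, the sketch below shows why a serialized object no longer looks like `KryoData(1)` once it is treated as raw bytes. It is a plain-Scala approximation under stated assumptions: Java serialization stands in for Kryo (the actual bytes Kryo produces differ), the hex-in-brackets rendering is an assumed display format for binary cells, and `BinaryColumnSketch`, `serialize`, and `renderBinary` are hypothetical names.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Illustrative sketch only: Java serialization is used as a stand-in for Kryo.
// All names here are hypothetical and not part of Spark.
object BinaryColumnSketch {
  // Turn an object into an opaque byte array, as a serializing encoder would.
  def serialize(value: java.io.Serializable): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    buffer.toByteArray
  }

  // Render a binary cell as space-separated uppercase hex bytes in brackets
  // (an assumed display format for binary values).
  def renderBinary(bytes: Array[Byte]): String =
    bytes.map(b => "%02X".format(b)).mkString("[", " ", "]")
}
```

For example, `BinaryColumnSketch.renderBinary(Array[Byte](1, 10, -1))` gives `[01 0A FF]`. Rendering the output of `serialize` gives a string that begins with `[AC ED 00 05`, the Java serialization stream header, so the original object's `toString` is no longer visible in the cell.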




[GitHub] spark pull request: [SPARK-15550][SQL][WIP] Dataset.show() should ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13331#issuecomment-221949136
  
    **[Test build #59396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59396/consoleFull)** for PR 13331 at commit [`a1baec5`](https://github.com/apache/spark/commit/a1baec5a3adbdfdc6fd5e414c89b05d0c97924a6).




[GitHub] spark pull request: [SPARK-15550][SQL] Dataset.show() should show ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13331




[GitHub] spark pull request: [SPARK-15550][SQL] Dataset.show() should show ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13331#issuecomment-222007548
  
    **[Test build #59414 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59414/consoleFull)** for PR 13331 at commit [`d688034`](https://github.com/apache/spark/commit/d688034b24e5250271556d7ff2ee4cdc91862740).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15550][SQL] Dataset.show() should show ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13331#discussion_r64975472
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -436,20 +435,6 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         assert(ds.toString == "[_1: int, _2: int]")
       }
     
    -  test("showString: Kryo encoder") {
    -    implicit val kryoEncoder = Encoders.kryo[KryoData]
    -    val ds = Seq(KryoData(1), KryoData(2)).toDS()
    -
    -    val expectedAnswer = """+-----------+
    -                           ||      value|
    -                           |+-----------+
    -                           ||KryoData(1)|
    -                           ||KryoData(2)|
    -                           |+-----------+
    -                           |""".stripMargin
    -    assert(ds.showString(10) === expectedAnswer)
    -  }
    -
    --- End diff --
    
    he didn't actually...
    cc @rxin 




[GitHub] spark pull request: [SPARK-15550][SQL] Dataset.show() should show ...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/13331#issuecomment-222022474
  
    Merging to master and branch-2.0. Thanks for the review!




[GitHub] spark pull request: [SPARK-15550][SQL][WIP] Dataset.show() should ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13331#issuecomment-221970260
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-15550][SQL][WIP] Dataset.show() should ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13331#issuecomment-221969986
  
    **[Test build #59396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59396/consoleFull)** for PR 13331 at commit [`a1baec5`](https://github.com/apache/spark/commit/a1baec5a3adbdfdc6fd5e414c89b05d0c97924a6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15550][SQL] Dataset.show() should show ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13331#issuecomment-222007806
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-15550][SQL] Dataset.show() should show ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13331#discussion_r64976581
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -436,20 +435,6 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         assert(ds.toString == "[_1: int, _2: int]")
       }
     
    -  test("showString: Kryo encoder") {
    -    implicit val kryoEncoder = Encoders.kryo[KryoData]
    -    val ds = Seq(KryoData(1), KryoData(2)).toDS()
    -
    -    val expectedAnswer = """+-----------+
    -                           ||      value|
    -                           |+-----------+
    -                           ||KryoData(1)|
    -                           ||KryoData(2)|
    -                           |+-----------+
    -                           |""".stripMargin
    -    assert(ds.showString(10) === expectedAnswer)
    -  }
    -
    --- End diff --
    
    thanks. lgtm




[GitHub] spark pull request: [SPARK-15550][SQL][WIP] Dataset.show() should ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13331#issuecomment-221970262
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59396/
    Test PASSed.




[GitHub] spark pull request: [SPARK-15550][SQL][WIP] Dataset.show() should ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/13331#issuecomment-221948972
  
    LGTM, pending tests




[GitHub] spark pull request: [SPARK-15550][SQL] Dataset.show() should show ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13331#issuecomment-222007809
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59414/
    Test PASSed.




[GitHub] spark pull request: [SPARK-15550][SQL] Dataset.show() should show ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13331#discussion_r64973277
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -436,20 +435,6 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         assert(ds.toString == "[_1: int, _2: int]")
       }
     
    -  test("showString: Kryo encoder") {
    -    implicit val kryoEncoder = Encoders.kryo[KryoData]
    -    val ds = Seq(KryoData(1), KryoData(2)).toDS()
    -
    -    val expectedAnswer = """+-----------+
    -                           ||      value|
    -                           |+-----------+
    -                           ||KryoData(1)|
    -                           ||KryoData(2)|
    -                           |+-----------+
    -                           |""".stripMargin
    -    assert(ds.showString(10) === expectedAnswer)
    -  }
    -
    --- End diff --
    
    Does @rxin know you changed this?




[GitHub] spark pull request: [SPARK-15550][SQL] Dataset.show() should show ...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13331#discussion_r64815745
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -440,14 +439,16 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         implicit val kryoEncoder = Encoders.kryo[KryoData]
         val ds = Seq(KryoData(1), KryoData(2)).toDS()
     
    -    val expectedAnswer = """+-----------+
    --- End diff --
    
    Yea, makes sense. I'm removing it. Thanks.




[GitHub] spark pull request: [SPARK-15550][SQL][WIP] Dataset.show() should ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13331#discussion_r64792393
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -440,14 +439,16 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         implicit val kryoEncoder = Encoders.kryo[KryoData]
         val ds = Seq(KryoData(1), KryoData(2)).toDS()
     
    -    val expectedAnswer = """+-----------+
    --- End diff --
    
    I'd like to remove this test entirely as we are not showing the real object content now.




[GitHub] spark pull request: [SPARK-15550][SQL] Dataset.show() should show ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13331#issuecomment-221987281
  
    **[Test build #59414 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59414/consoleFull)** for PR 13331 at commit [`d688034`](https://github.com/apache/spark/commit/d688034b24e5250271556d7ff2ee4cdc91862740).

