Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2015/12/06 19:02:08 UTC

[GitHub] spark pull request: [SPARK-12164] [SQL] Display the binary/encoded...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/10165

    [SPARK-12164] [SQL] Display the binary/encoded values

    When the Dataset is encoded (for example, with a Kryo encoder), the existing `show` output looks strange: each binary value is displayed as a list of signed decimal bytes, which is an uncommon format for binary data.
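    ```scala
        import org.apache.spark.sql.Encoders
        // Assumes a case class KryoClassData and the SQLContext implicits (which provide toDS()) are in scope.
        implicit val kryoEncoder = Encoders.kryo[KryoClassData]
        val ds = Seq(KryoClassData("a", 1), KryoClassData("b", 2), KryoClassData("c", 3)).toDS()
        ds.show(20, false) // truncate = false prints every cell in full
    ```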
    The current output looks like this:
    ```
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |value                                                                                                                                                                                 |
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 97, 2]|
    |[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 98, 4]|
    |[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 99, 6]|
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    ```
    After the fix, the output looks like this:
    ```
    +----------------------------------------------------------------------------------------------------------------------------+
    |value                                                                                                                       |
    +----------------------------------------------------------------------------------------------------------------------------+
    |[01 00 6F 72 67 2E 61 70 61 63 68 65 2E 73 70 61 72 6B 2E 73 71 6C 2E 4B 72 79 6F 43 6C 61 73 73 44 61 74 E1 01 01 82 61 02]|
    |[01 00 6F 72 67 2E 61 70 61 63 68 65 2E 73 70 61 72 6B 2E 73 71 6C 2E 4B 72 79 6F 43 6C 61 73 73 44 61 74 E1 01 01 82 62 04]|
    |[01 00 6F 72 67 2E 61 70 61 63 68 65 2E 73 70 61 72 6B 2E 73 71 6C 2E 4B 72 79 6F 43 6C 61 73 73 44 61 74 E1 01 01 82 63 06]|
    +----------------------------------------------------------------------------------------------------------------------------+
    ```
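    A minimal sketch of the hex rendering shown above, assuming each byte is printed as two uppercase hex digits, separated by spaces and wrapped in brackets. `BinaryDisplay` and `toHexString` are hypothetical names for illustration, not necessarily what this patch adds to Spark.

    ```scala
    object BinaryDisplay {
      // Render each byte as two uppercase hex digits, e.g. 1 -> "01", -31 -> "E1".
      // java.util.Formatter prints a negative Byte as unsigned (it adds 2^8), so no masking is needed.
      def toHexString(bytes: Array[Byte]): String =
        bytes.map(b => "%02X".format(b)).mkString("[", " ", "]")
    }
    ```

    For example, `BinaryDisplay.toHexString(Array[Byte](1, 0, 111, -31, -126))` returns `[01 00 6F E1 82]`: the negative bytes `-31` and `-126` become `E1` and `82`.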
    
    In addition, should we add a new method that decodes the values and then displays the contents?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark binaryOutput

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10165.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10165
    
----
commit f63c43519b2e8eeab9428397c519de1032e1ae45
Author: gatorsmile <ga...@gmail.com>
Date:   2015-12-05T00:50:03Z

    Merge remote-tracking branch 'upstream/master' into binaryOutput

commit 8754979da599743112f392250cee5606a3ce8864
Author: gatorsmile <ga...@gmail.com>
Date:   2015-12-06T17:44:04Z

    Displays the encoded content of the Dataset

commit 5d0d64c76772d8d8d1a164be130d61e0abb50352
Author: gatorsmile <ga...@gmail.com>
Date:   2015-12-06T17:44:56Z

    Merge remote-tracking branch 'upstream/master' into binaryOutput

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12164] [SQL] Display the binary/encoded...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10165#issuecomment-162344227
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47244/


[GitHub] spark pull request: [SPARK-12164] [SQL] Display the binary/encoded...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/10165#issuecomment-162705189
  
    @marmbrus Agreed.
    
    Long values are already truncated when `truncate` keeps its default value (`true`). For example:
    ```scala
    ds.show(20)
    ```
    I can work on showing the decoded values. Thanks!


[GitHub] spark pull request: [SPARK-12164] [SQL] Display the binary/encoded...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile closed the pull request at:

    https://github.com/apache/spark/pull/10165


[GitHub] spark pull request: [SPARK-12164] [SQL] Display the binary/encoded...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/10165#issuecomment-163093753
  
    Thank you! @cloud-fan 
    
    Will this PR be merged into 1.6, or should it wait for another PR that shows the decoded values? @marmbrus Thank you!


[GitHub] spark pull request: [SPARK-12164] [SQL] Display the binary/encoded...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10165#issuecomment-162334416
  
    **[Test build #47244 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47244/consoleFull)** for PR 10165 at commit [`5d0d64c`](https://github.com/apache/spark/commit/5d0d64c76772d8d8d1a164be130d61e0abb50352).


[GitHub] spark pull request: [SPARK-12164] [SQL] Display the binary/encoded...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10165#issuecomment-162344226
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-12164] [SQL] Display the binary/encoded...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10165#issuecomment-162343924
  
    **[Test build #47244 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47244/consoleFull)** for PR 10165 at commit [`5d0d64c`](https://github.com/apache/spark/commit/5d0d64c76772d8d8d1a164be130d61e0abb50352).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12164] [SQL] Display the binary/encoded...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10165#issuecomment-162704070
  
    Showing hex for binary columns seems reasonable, though we should probably truncate if it's long. Showing the object representation for opaquely encoded values also seems reasonable, but it is probably not super easy to implement, so I'd put it in a different PR if we are going to do it.
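    For the object-representation idea, one possible shape is a one-column table built from each object's `toString`, assuming the values have already been decoded (in Spark 1.6, `ds.take(n)` and `ds.collect()` return decoded objects of type `T`). `DecodedDisplay` and `renderDecoded` are hypothetical names; this is a sketch of the idea, not how a follow-up PR necessarily implements it.

    ```scala
    object DecodedDisplay {
      // Hypothetical helper: renders already-decoded objects (for example, the Array[T]
      // returned by ds.take(n)) as a one-column table, using each object's toString.
      def renderDecoded[T](rows: Seq[T], header: String = "value"): String = {
        val cells = rows.map(r => if (r == null) "null" else r.toString)
        // Column width is the longest of the header and all cells.
        val width = (header +: cells).map(_.length).max
        val sep = "+" + "-" * width + "+"
        // Left-align each cell, like the non-truncated show(20, false) output.
        def line(s: String): String = "|" + s.padTo(width, ' ') + "|"
        (Seq(sep, line(header), sep) ++ cells.map(line) :+ sep).mkString("\n")
      }
    }
    ```

    Because this works on decoded objects, the opaque Kryo bytes never reach the output; the hard part marmbrus mentions is doing that decoding inside `show` itself.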


[GitHub] spark pull request: [SPARK-12164] [SQL] Display the binary/encoded...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/10165#issuecomment-165272375
  
    Thank you! 


[GitHub] spark pull request: [SPARK-12164] [SQL] Display the binary/encoded...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/10165#issuecomment-162400140
  
    I had exactly the same question when calling `show`. From a user's perspective, they probably do not care about the encoded values at all when calling `show`, and I think the encoded values look strange to most users.


[GitHub] spark pull request: [SPARK-12164] [SQL] Display the binary/encoded...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10165#issuecomment-162715388
  
    The truncation logic is already in `DataFrame.showString`: `if (truncate && str.length > 20) str.substring(0, 17) + "..." else str`, so this LGTM.
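    To illustrate how the quoted rule interacts with the hex format, here is a hedged sketch that converts a cell to a string and then applies the same truncation condition. `CellDisplay` and `cellString` are hypothetical names, and the hex formatting mirrors the output in the PR description rather than a confirmed Spark implementation.

    ```scala
    object CellDisplay {
      // Hex rendering for binary values, matching the format shown in the PR description.
      private def hex(bytes: Array[Byte]): String =
        bytes.map(b => "%02X".format(b)).mkString("[", " ", "]")

      // Convert a cell to a string, then apply the rule quoted from DataFrame.showString:
      // strings longer than 20 characters keep their first 17 characters plus "...".
      def cellString(value: Any, truncate: Boolean): String = {
        val str = value match {
          case null               => "null"
          case bytes: Array[Byte] => hex(bytes)
          case other              => other.toString
        }
        if (truncate && str.length > 20) str.substring(0, 17) + "..." else str
      }
    }
    ```

    With `truncate = true`, a seven-byte value such as `Array[Byte](1, 0, 111, 114, 103, 46, 97)` renders as `[01 00 6F 72 67 2E 61]` (22 characters) and is cut to `[01 00 6F 72 67 2...`, while a two-byte value such as `[31 32]` is shown in full.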


[GitHub] spark pull request: [SPARK-12164] [SQL] Display the binary/encoded...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10165#issuecomment-162396682
  
    Should we print the decoded values (user objects) in `Dataset.show`? cc @marmbrus @rxin


[GitHub] spark pull request: [SPARK-12164] [SQL] Display the binary/encoded...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/10165#issuecomment-164626233
  
    Sure, will do it. 


[GitHub] spark pull request: [SPARK-12164] [SQL] Display the binary/encoded...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10165#issuecomment-164624910
  
    Can you add a test?

