Posted to reviews@spark.apache.org by yhuai <gi...@git.apache.org> on 2015/04/19 06:02:46 UTC

[GitHub] spark pull request: [WIP][SPARK-6986][CORE]Make SerializationStrea...

GitHub user yhuai opened a pull request:

    https://github.com/apache/spark/pull/5577

    [WIP][SPARK-6986][CORE]Make SerializationStream/DeserializationStream understand key/value semantic

    https://issues.apache.org/jira/browse/SPARK-6986

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yhuai/spark SPARK-6986

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5577.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5577
    
----
commit 6a7432440cabb5b149e1f0ec2a7985fc7c5f1ef4
Author: Yin Huai <yh...@databricks.com>
Date:   2015-04-19T03:59:34Z

    Add writeKey/writeValue and readKey/readValue.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-6986][CORE]Make SerializationStrea...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5577#issuecomment-94241921
  
      [Test build #30544 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30544/consoleFull) for PR 5577 at commit [`6a74324`](https://github.com/apache/spark/commit/6a7432440cabb5b149e1f0ec2a7985fc7c5f1ef4).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.



[GitHub] spark pull request: [WIP][SPARK-6986][CORE]Make SerializationStrea...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/5577#issuecomment-97136092
  
    @sryza Without this change, we still have this issue (when we call `write(key)` and then `write(value)` separately), right? Do you have any suggestions on how to address it?



[GitHub] spark pull request: [WIP][SPARK-6986][CORE]Make SerializationStrea...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/5577#issuecomment-96422091
  
    Seems good to me - @rxin any comments?



[GitHub] spark pull request: [WIP][SPARK-6986][CORE]Make SerializationStrea...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/5577#issuecomment-94563008
  
    My patch for SPARK-4550 ( #4450 ) makes a similar change.



[GitHub] spark pull request: [WIP][SPARK-6986][CORE]Make SerializationStrea...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5577#issuecomment-94241937
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30544/



[GitHub] spark pull request: [WIP][SPARK-6986][CORE]Make SerializationStrea...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5577#issuecomment-94233142
  
      [Test build #30544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30544/consoleFull) for PR 5577 at commit [`6a74324`](https://github.com/apache/spark/commit/6a7432440cabb5b149e1f0ec2a7985fc7c5f1ef4).



[GitHub] spark pull request: [WIP][SPARK-6986][CORE]Make SerializationStrea...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/5577#issuecomment-97134710
  
    As it bumps numRecordsWritten both on key and on value, it looks like this change would make us double count the number of records written.
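
Editorial sketch of the concern above, assuming a writer whose record counter is bumped inside each write call. `CountingWriter` and all of its method names are hypothetical and are not Spark's actual classes. It contrasts counting on every call with counting only when the value is written, which is one possible way to keep a key/value pair at a single record, not necessarily what this PR ended up doing.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical writer, used only to illustrate the counting concern.
final class CountingWriter {
  private val written = ArrayBuffer.empty[Any]
  private var records = 0L

  // Variant A: every call counts as one record, so a pair counts twice.
  def writeKeyCounting(key: Any): Unit = { written += key; records += 1 }
  def writeValueCounting(value: Any): Unit = { written += value; records += 1 }

  // Variant B: only the value write completes (and counts) a record.
  def writeKey(key: Any): Unit = { written += key }
  def writeValue(value: Any): Unit = { written += value; records += 1 }

  def numRecordsWritten: Long = records
}
```

Writing one pair through variant A leaves `numRecordsWritten` at 2, while the same pair through variant B leaves it at 1.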



[GitHub] spark pull request: [WIP][SPARK-6986][CORE]Make SerializationStrea...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5577#issuecomment-96767455
  
      [Test build #31001 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31001/consoleFull) for PR 5577 at commit [`6a74324`](https://github.com/apache/spark/commit/6a7432440cabb5b149e1f0ec2a7985fc7c5f1ef4).



[GitHub] spark pull request: [WIP][SPARK-6986][CORE]Make SerializationStrea...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5577#discussion_r28649628
  
    --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
    @@ -300,8 +300,8 @@ private[spark] class ExternalSorter[K, V, C](
             val partitionId = elem._1._1
             val key = elem._1._2
             val value = elem._2
    -        writer.write(key)
    -        writer.write(value)
    +        writer.writeKey(key)
    +        writer.writeValue(value)
    --- End diff --
    
    @rxin An alternative way to make the specialized SQL serializer work in most cases is to change only
    ```
    writer.write(key)
    writer.write(value)
    ```
    to 
    ```
    writer.write((key, value))
    ```
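
Editorial sketch of why writing a single tuple can help a specialized serializer: the stream sees the key and the value in one call and can pick a compact encoding for both. `PairStream` is a hypothetical stand-in, and the fixed `(Int, String)` layout is an assumption made for illustration; it is not the SQL serializer discussed in this thread.

```scala
import java.io.DataOutputStream

// Hypothetical specialized stream for (Int, String) pairs; illustrative only.
final class PairStream(out: DataOutputStream) {
  // Receives the whole pair at once, so key and value can each use
  // a type-specific encoding without any per-object type tag.
  def writeObject(obj: Any): Unit = obj match {
    case (k: Int, v: String) =>
      out.writeInt(k) // 4 bytes for the key
      out.writeUTF(v) // 2-byte length prefix plus the encoded string
    case other =>
      throw new IllegalArgumentException(s"expected an (Int, String) pair, got: $other")
  }
}
```

With separate `write(key)` and `write(value)` calls, a stream like this cannot tell from the call alone whether an object is a key or a value, which is the gap that either the tuple form or explicit `writeKey`/`writeValue` methods would close.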

