Posted to reviews@spark.apache.org by kanzhang <gi...@git.apache.org> on 2014/05/13 07:37:23 UTC

[GitHub] spark pull request: [SPARK-1161] Add saveAsObjectFile and SparkCon...

GitHub user kanzhang opened a pull request:

    https://github.com/apache/spark/pull/755

    [SPARK-1161] Add saveAsObjectFile and SparkContext.objectFile in Python

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kanzhang/spark SPARK-1161

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/755.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #755
    
----
commit 5a6e9da3ec0a667f1d9fbb7fb0dc3f9b93132329
Author: Kan Zhang <kz...@apache.org>
Date:   2014-05-13T05:33:11Z

    [SPARK-1161] Add saveAsObjectFile and SparkContext.objectFile in Python

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-43373144
  
    Merged build finished. All automated tests passed.


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by kanzhang <gi...@git.apache.org>.
Github user kanzhang commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-43403735
  
    Re-submitted the patch to fix a typo in the commit log (block size -> batch size).


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by kanzhang <gi...@git.apache.org>.
Github user kanzhang commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-45015378
  
    Yes, I was about to update it.


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-45027222
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15406/


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsObjectFile and SparkCon...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-42920541
  
    Also we might want to call this saveAsPickleFile and pickleFile instead to make clear that it's not compatible with the Java/Scala objectFile.


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsObjectFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-43369709
  
     Merged build triggered. 


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-45027221
  
    Merged build finished. 


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by kanzhang <gi...@git.apache.org>.
Github user kanzhang commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-45105819
  
    @mateiz Sure, please assign it to me.


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-44484366
  
    Hey @kanzhang, can you also add some tests for this? The easiest way is to add doctests in `context.py`. Look at how we create temp files in the tests for `SparkContext.union`.
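
A pure-Python sketch (no PySpark involved) of the temp-file pattern such a doctest could follow: create a `NamedTemporaryFile` with `delete=True`, close it so only the unused path remains, then write to and read from that path. The helper `roundtrip_pickle` and the one-`pickle.dump`-per-batch layout are hypothetical and are not PySpark's on-disk format.

```python
import os
import pickle
import tempfile


def roundtrip_pickle(values, batch_size):
    """Write `values` as pickled batches to a fresh temp path, then read them back.

    >>> roundtrip_pickle(range(10), 3)
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    """
    tmp = tempfile.NamedTemporaryFile(delete=True)
    tmp.close()  # with delete=True the file is removed here; only the name is kept
    try:
        vals = list(values)
        with open(tmp.name, "wb") as fh:
            # One pickle.dump call per batch of at most `batch_size` items.
            for i in range(0, len(vals), batch_size):
                pickle.dump(vals[i:i + batch_size], fh, protocol=2)
        result = []
        with open(tmp.name, "rb") as fh:
            while True:
                try:
                    result.extend(pickle.load(fh))
                except EOFError:  # raised once the stream is exhausted
                    break
        return result
    finally:
        if os.path.exists(tmp.name):
            os.remove(tmp.name)
```

Running `python -m doctest yourfile.py` checks the docstring example. There is a small race (another process could claim the path between `close()` and `open()`), which is usually acceptable in tests.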


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by kanzhang <gi...@git.apache.org>.
Github user kanzhang commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-45023323
  
    @mateiz made suggested changes and added doc test.


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-44052481
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15171/


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by kanzhang <gi...@git.apache.org>.
Github user kanzhang commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-45032688
  
    @pwendell Thanks. I dropped the sort step in the most recent update after I saw that the tests on ```keys()``` and ```values()``` don't sort (they call ```collect()``` too), and I was wondering how the sequential ordering was guaranteed.


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-45023558
  
    Merged build started. 


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsObjectFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-42920487
  
    Merged build started. 


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by kanzhang <gi...@git.apache.org>.
Github user kanzhang commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-44049046
  
    Rebased to the latest master.


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsObjectFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-42922492
  
    Merged build finished. All automated tests passed.


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-45034383
  
    Merged build started. 


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsObjectFile and SparkCon...

Posted by kanzhang <gi...@git.apache.org>.
Github user kanzhang commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-43039785
  
    > How can you make sure that in future versions of PySpark, the same deserializer will be used? 
    
    @mateiz Obviously we can't. I started with forcing the pickle serializer but dismissed it because 1) we can't promise we won't change it down the road and 2) users may want to use a different serializer for object files. So I felt saveAsObjectFile was only suitable for temporary persistence, and settled on using the default serializer. Now that you have suggested naming the methods saveAsPickleFile and pickleFile, forcing the pickle serializer feels much more natural. The user still has to remember which method was used to save the file, though. Will update.


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-45041788
  
    Looks good, thanks -- going to merge it.


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-45034358
  
     Merged build triggered. 


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/755#discussion_r13165029
  
    --- Diff: python/pyspark/context.py ---
    @@ -51,6 +51,7 @@ class SparkContext(object):
         _active_spark_context = None
         _lock = Lock()
         _python_includes = None # zip and egg files that need to be added to PYTHONPATH
    +    _pickle_file_serializer = BatchedSerializer(PickleSerializer(), 1024)
    --- End diff --
    
    You should make batches smaller than 1024 by default, because some objects users work with might be very large. I'd set it to only 10. If you'd like, you can add an optional batchSize parameter to RDD.saveAsPickleFile.
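
Why small batches matter: a batching serializer pickles a whole batch into one blob, so the largest blob (and the memory needed to build it) grows with batch size times object size. A minimal pure-Python sketch, assuming batches are pickled as plain lists; `batched` and `pickle_batches` are hypothetical helpers, not PySpark's `BatchedSerializer`.

```python
import pickle
from itertools import islice
from typing import Any, Iterable, Iterator, List


def batched(items: Iterable[Any], batch_size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most `batch_size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def pickle_batches(items: Iterable[Any], batch_size: int) -> List[bytes]:
    """Pickle each batch as one blob, so a whole batch is in memory at once."""
    return [pickle.dumps(b, protocol=2) for b in batched(items, batch_size)]


# Ten distinct ~1 MB strings standing in for "very large" user objects.
records = [chr(ord("a") + i) * 1_000_000 for i in range(10)]

small = pickle_batches(records, 2)     # 5 blobs of 2 records each
large = pickle_batches(records, 1024)  # all 10 records land in a single blob
print(len(small), max(len(b) for b in small))
print(len(large), len(large[0]))
```

With a batch size of 1024, a single blob here holds every record, whereas a small batch size bounds each blob to a couple of objects.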


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-43373145
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15052/


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-44049093
  
    Merged build started. 


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsObjectFile and SparkCon...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-43131213
  
    I see. With this change, I think we'd promise to keep the pickle serializer's output the same in future versions, which should be easy. We can always add a new "pickle2" serializer if we find major problems with the current one. So I'd suggest renaming this to pickleFile and stating that promise in the docs.
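
One concrete piece of such a promise, sketched in plain Python under the assumption that records are written with the standard `pickle` module: pin the protocol explicitly rather than relying on `pickle.DEFAULT_PROTOCOL`, which has changed between Python releases. `PICKLE_FILE_PROTOCOL` and `dump_record` are hypothetical names.

```python
import pickle

# Hypothetical constant: a file format that promises stable output should
# not follow pickle.DEFAULT_PROTOCOL, which differs across Python versions.
PICKLE_FILE_PROTOCOL = 2


def dump_record(obj) -> bytes:
    """Serialize one record with the pinned protocol."""
    return pickle.dumps(obj, protocol=PICKLE_FILE_PROTOCOL)


blob = dump_record(("a", 1))
# Protocol 2+ streams begin with the PROTO opcode (0x80) and the version byte.
print(blob[:2])
# The reader detects the protocol from the stream, so no version argument.
print(pickle.loads(blob))
```

Pinning the protocol fixes the stream format but not how user classes reduce themselves, so it is necessary rather than sufficient for stable output.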


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-43404332
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15067/


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsObjectFile and SparkCon...

Posted by kanzhang <gi...@git.apache.org>.
Github user kanzhang commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-43369939
  
    @mateiz Changed it to force the PickleSerializer.


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsObjectFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-42918976
  
    Can one of the admins verify this patch?


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsObjectFile and SparkCon...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-42920512
  
    How can you make sure that in future versions of PySpark, the same deserializer will be used? You should force this to save pickled objects somehow instead of relying on the RDD's serializer.
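
A minimal sketch of what "forcing" the format could look like: the writer always uses `pickle`, whatever serializer is in use in memory, so a reader that knows which method wrote the file knows how to decode it. The 4-byte big-endian length prefix and the names `save_as_pickle` / `load_pickle` are assumptions for illustration, not PySpark's actual file layout.

```python
import io
import pickle
from typing import Any, BinaryIO, Iterable, List

PINNED_PROTOCOL = 2  # hypothetical; chosen by the method, not by runtime config


def save_as_pickle(records: Iterable[Any], fh: BinaryIO) -> None:
    """Write each record as a 4-byte big-endian length followed by its pickle."""
    for rec in records:
        data = pickle.dumps(rec, protocol=PINNED_PROTOCOL)
        fh.write(len(data).to_bytes(4, "big"))
        fh.write(data)


def load_pickle(fh: BinaryIO) -> List[Any]:
    """Inverse of save_as_pickle; stops at end of stream."""
    out = []
    while True:
        header = fh.read(4)
        if len(header) < 4:
            return out
        size = int.from_bytes(header, "big")
        out.append(pickle.loads(fh.read(size)))


buf = io.BytesIO()
save_as_pickle([1, "two", (3, 4.0)], buf)
buf.seek(0)
print(load_pickle(buf))  # [1, 'two', (3, 4.0)]
```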


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-44049086
  
     Merged build triggered. 


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-45054683
  
    BTW, I forgot to mention that we should also add this to the PySpark programming guide. Opened https://issues.apache.org/jira/browse/SPARK-2013 to track it.


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-44788888
  
    Ah sorry, I missed those. Then it's probably fine, though it would be cool to show one on SparkContext.pickleFile too just as documentation.


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsObjectFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-42920479
  
     Merged build triggered. 


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/755


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-44052480
  
    Merged build finished. All automated tests passed.


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-45038745
  
    Merged build finished. All automated tests passed.


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by kanzhang <gi...@git.apache.org>.
Github user kanzhang commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-45974068
  
    I dug a little deeper and found the following.
    
    1) ```saveAsPickleFile``` calls ```saveAsObjectFile```, which does its own grouping by a factor of 10. No serialization is involved in that step, so this is probably fine.
    
    2) ```RDD._reserialize``` currently does nothing if the target serializer differs from the current one only in batch size. This is due to our notion of serializer equality: *"output generated by equal serializers can be deserialized using the same serializer."* That is probably fine for operations like ```union```, since it avoids unnecessary re-serialization. However, for ```saveAsPickleFile```, it means the batch size actually used may differ from what the user specified (very likely so, since our current default batch size for SparkContext is 1024), which can be confusing. @mateiz, what are your thoughts on this?
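
The equality semantics in point 2 can be reproduced with a toy model. In this sketch `BatchedSerializer` and `FakeRDD` are hypothetical stand-ins, not PySpark's classes: equality ignores batch size, so a reserialize step that skips work on equal serializers silently keeps the old batch size.

```python
class BatchedSerializer:
    """Toy serializer: `inner` names the record format, `batch_size` the grouping."""

    def __init__(self, inner: str, batch_size: int):
        self.inner = inner
        self.batch_size = batch_size

    def __eq__(self, other):
        # "Output from equal serializers can be read by the same deserializer",
        # so batch size is deliberately not part of equality.
        return isinstance(other, BatchedSerializer) and self.inner == other.inner


class FakeRDD:
    """Toy RDD that only tracks which serializer its data uses."""

    def __init__(self, serializer: BatchedSerializer):
        self.serializer = serializer

    def reserialize(self, target: BatchedSerializer) -> "FakeRDD":
        # Skip work whenever the serializers compare equal.
        if self.serializer == target:
            return self
        return FakeRDD(target)


def reserialize_exact(rdd: FakeRDD, target: BatchedSerializer) -> FakeRDD:
    """Hypothetical variant that also honors an explicitly requested batch size."""
    if rdd.serializer == target and rdd.serializer.batch_size == target.batch_size:
        return rdd
    return FakeRDD(target)


rdd = FakeRDD(BatchedSerializer("pickle", 1024))
saved = rdd.reserialize(BatchedSerializer("pickle", 10))
print(saved.serializer.batch_size)  # 1024, not the requested 10
```

`reserialize_exact` is just one possible option (compare batch sizes when the caller asked for one explicitly); it is not necessarily what PySpark ended up doing.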


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-45038749
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15417/


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by kanzhang <gi...@git.apache.org>.
Github user kanzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/755#discussion_r13241708
  
    --- Diff: python/pyspark/context.py ---
    @@ -51,6 +51,7 @@ class SparkContext(object):
         _active_spark_context = None
         _lock = Lock()
         _python_includes = None # zip and egg files that need to be added to PYTHONPATH
    +    _pickle_file_serializer = BatchedSerializer(PickleSerializer(), 1024)
    --- End diff --
    
    Ok, will update.


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-43403660
  
     Merged build triggered. 


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-45023544
  
     Merged build triggered. 


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-45031538
  
    The tests here are failing because the output of `collect` is not guaranteed to be in sorted order. Other PySpark tests just sort the output.
    
    e.g. https://github.com/apache/spark/blob/master/python/pyspark/rdd.py#L254
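
Why sorting is the robust fix, shown without Spark: when output is split across part files and read back, the concatenation order depends on how records were split and on directory listing order (`os.listdir` returns entries in arbitrary order). The `part-NNNNN` naming and the round-robin split below are assumptions for illustration.

```python
import os
import pickle
import tempfile


def write_parts(values, num_parts, out_dir):
    """Hypothetical layout: split values round-robin into part-NNNNN files."""
    for p in range(num_parts):
        with open(os.path.join(out_dir, "part-%05d" % p), "wb") as fh:
            pickle.dump(values[p::num_parts], fh, protocol=2)


def read_parts(out_dir):
    """Concatenate records in os.listdir order, which is arbitrary."""
    result = []
    for name in os.listdir(out_dir):
        with open(os.path.join(out_dir, name), "rb") as fh:
            result.extend(pickle.load(fh))
    return result


with tempfile.TemporaryDirectory() as d:
    data = list(range(10))
    write_parts(data, 3, d)
    got = read_parts(d)

# Element order differs from `data`, so compare sorted output instead.
print(sorted(got) == data)
```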


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by kanzhang <gi...@git.apache.org>.
Github user kanzhang commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-44685162
  
    @mateiz I already have some tests on the saveAsPickleFile method, which use both pickleFile and saveAsPickleFile. What other test cases do you have in mind? I could add one that reads a text file, saves it as a pickle file, and then reads it back.


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-45042088
  
    Regarding the tests on keys() and values(), it might be because the SparkContext was created with batchSize=2, so only one partition there has data. Not 100% sure, though. It might be necessary to add sorted() to those too.


---

[GitHub] spark pull request: [SPARK-1161] Add saveAsObjectFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-43369726
  
    Merged build started. 



[GitHub] spark pull request: [SPARK-1161] Add saveAsObjectFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-42922493
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14935/



[GitHub] spark pull request: [SPARK-1161] Add saveAsObjectFile and SparkCon...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-42920463
  
    Jenkins, this is ok to test



[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-43404331
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-45015147
  
    Hey, sorry @kanzhang, one thing is still missing: make the default batch size 10 instead of 1024, and add an optional batchSize parameter to saveAsPickleFile.
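A minimal sketch of what "an optional batchSize parameter with default 10" means for serialization: elements are grouped into lists of at most `batchSize` items and each list is pickled as one record. `batched_pickles` and `unbatch` are hypothetical helpers, not PySpark's implementation. As a general trade-off, larger batches amortize per-record overhead while smaller ones keep each record small.

```python
import itertools
import pickle

def batched_pickles(iterator, batchSize=10):
    # Hypothetical helper: pickle elements in lists of at most batchSize
    # items. The default of 10 mirrors the value requested in the review.
    if batchSize < 1:
        raise ValueError("batchSize must be >= 1")
    it = iter(iterator)
    while True:
        batch = list(itertools.islice(it, batchSize))
        if not batch:
            return
        yield pickle.dumps(batch)

def unbatch(records):
    # Inverse of batched_pickles: flatten each pickled batch back out.
    for rec in records:
        for obj in pickle.loads(rec):
            yield obj

records = list(batched_pickles(range(25)))
assert len(records) == 3  # batches of 10, 10 and 5 elements
assert list(unbatch(records)) == list(range(25))
```

Passing `batchSize=1024` to the same helper would put all 25 elements into a single record.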



[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/755#issuecomment-43403661
  
    Merged build started. 

