You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by nchammas <gi...@git.apache.org> on 2014/07/21 05:36:09 UTC

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

GitHub user nchammas opened a pull request:

    https://github.com/apache/spark/pull/1505

    [SPARK-2470] PEP8 fixes to PySpark

    This pull request aims to resolve all outstanding PEP8 violations in PySpark.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nchammas/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1505.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1505
    
----
commit 69da6cff4e54841c884bbb86b84aade1f2d19c67
Author: nchammas <ni...@gmail.com>
Date:   2014-06-10T21:46:44Z

    Merge pull request #1 from apache/master
    
    merging upstream updates

commit 6544b7eb493a5b150692ef01325752145845267e
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-06-10T23:14:32Z

    [SPARK-2065] give launched instances names
    
    This update gives launched EC2 instances descriptive names by using
    instance tags. Launched instances now show up in the EC2 console with
    these names.
    
    I used `format()` with named parameters, which I believe is the
    recommended practice for string formatting in Python, but which doesn’t
    seem to be used elsewhere in the script.

commit 2627247506bb3b5c3d7509081825a0d6c718895e
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-06-11T02:25:28Z

    broke up lines before they hit 100 chars

commit 69f6e222ad4c02ea0fb117d6a70a720ef2d4fa59
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-06-11T03:03:03Z

    PEP8 fixes

commit 89fde082b57db7550fa38a7e5de5c1aa913077d3
Author: nchammas <ni...@gmail.com>
Date:   2014-06-13T21:36:49Z

    Merge pull request #2 from apache/master
    
    pulling upstream updates

commit 2e4fe0074891c10f1f62bd14db6d914181e39ce0
Author: nchammas <ni...@gmail.com>
Date:   2014-06-20T21:54:45Z

    Merge pull request #3 from apache/master
    
    Merge upstream updates

commit de7292a4aa38ec4debf78609977e9c46ee4009c8
Author: nchammas <ni...@gmail.com>
Date:   2014-07-09T14:27:41Z

    Merge pull request #4 from apache/master
    
    merging upstream updates

commit a36eed0b5772f90be9f11fceddf1c6a5618450ff
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-09T14:50:07Z

    name ec2 instances and security groups consistently
    
    Security groups created by spark-ec2 do not prepend “spark-“ to the
    name.
    
    Since naming the instances themselves is new to spark-ec2, it’s better
    to change that pattern to match the existing naming pattern for the
    security groups, rather than the other way around.

commit f7e45813a3a58ba369d5f21595f671c84f5b91ff
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-09T18:26:17Z

    unrelated pep8 fix
    
    Functions in Python should be preceded by 2 blank lines, not 1.

commit 4dd148f76bcc4c6093cc8a075e8b52d3870ebf62
Author: nchammas <ni...@gmail.com>
Date:   2014-07-20T17:53:33Z

    Merge pull request #5 from apache/master
    
    merge upstream updates

commit f0a7ebf8700ca13a43edda2a41921e317ebd8dfc
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-20T18:43:08Z

    [SPARK-2470] PEP8 fixes to rddsampler.py

commit a6d5e4b049a5bbc3dd1eb46611c535f5cb9dbb8c
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-20T19:53:48Z

    [SPARK-2470] PEP8 fixes to cloudpickle.py

commit f4e00399663b5e7e641fcfc42747dde38dcf4e76
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-20T19:53:59Z

    [SPARK-2470] PEP8 fixes to conf.py

commit ca2d28b63776b1901afc255f81b59cd532f39fcb
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-20T19:54:16Z

    [SPARK-2470] PEP8 fixes to context.py

commit 7fc849cd5bd01d2141b786e362419f59bc039b5a
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-20T19:54:28Z

    [SPARK-2470] PEP8 fixes to daemon.py

commit 1bde2658a0aea55dad57b3ad2864ddd6fa363f78
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-20T19:54:42Z

    [SPARK-2470] PEP8 fixes to java_gateway.py

commit 81fcb2016367e4a0a00911ae71bbeede05f49e4e
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-20T20:15:53Z

    [SPARK-2470] PEP8 fixes to resultiterable.py

commit d14f2f18587ecb53a8c769a6ab36fa132e8d7361
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-20T20:16:04Z

    [SPARK-2470] PEP8 fixes to __init__.py

commit c85e1e506d291bbba4336419db1d9c1c9b57bb17
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-20T20:16:12Z

    [SPARK-2470] PEP8 fixes to join.py

commit a0fec2e00723e4b79914ef106c4e9745fca88bbc
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-20T20:16:22Z

    [SPARK-2470] PEP8 fixes to mllib

commit 95d1d95b29f2d7a525bac67ffdd1efe671216c62
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-20T20:16:30Z

    [SPARK-2470] PEP8 fixes to serializers.py

commit 19168591a6cd88e0fdaa5bf11c2849bc104a7681
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-20T20:16:38Z

    [SPARK-2470] PEP8 fixes to shell.py

commit aa3a7b64a8bbc23e077378bb1bd8c8bb89b7fae0
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-20T20:16:44Z

    [SPARK-2470] PEP8 fixes to sql.py

commit d644477359cdf6bc1c77343dcfd4a3b818180122
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-20T20:16:51Z

    [SPARK-2470] PEP8 fixes to worker.py

commit b3b96cfcfc98013cc5d95886a8f9664f040443d6
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-20T20:16:59Z

    [SPARK-2470] PEP8 fixes to statcounter.py

commit 8f8e4c0ecc9a2b5021a32f867475653b22505bc4
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-20T20:17:06Z

    [SPARK-2470] PEP8 fixes to storagelevel.py

commit 7d557b7385fcfcd763209ce4345aa76f02c6f845
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2014-07-20T20:17:14Z

    [SPARK-2470] PEP8 fixes to tests.py

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49694764
  
    Jenkins, add to whitelist.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by nchammas <gi...@git.apache.org>.

Github user nchammas commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49671550
  
    @davies PEP 8 recommends using [implied line continuation over backslashes](http://legacy.python.org/dev/peps/pep-0008/#maximum-line-length) where possible. It appears I forgot to add in the required parentheses to make this work in the several places. Apologies! I'll fix that. Or would you rather I used backslashes? 
    
    Anyway, I'm surprised that `sbt/sbt test` didn't catch these problems. Is there some other way to run tests specific to PySpark?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by davies <gi...@git.apache.org>.

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49674558
  
    Parentheses or backslashes are welcome. We should just follow the hard rules in PEP8, leave others open to contributors. Too strict rules may make some people feel unhappy :-).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1505#discussion_r15197904
  
    --- Diff: python/pyspark/cloudpickle.py ---
    @@ -55,7 +55,7 @@
     import dis
     import traceback
     
    -#relevant opcodes
    +# relevant opcodes
    --- End diff --
    
    cloudpickle is an third-party module, should we leave it alone?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1505#discussion_r15200865
  
    --- Diff: python/pyspark/cloudpickle.py ---
    @@ -55,7 +55,7 @@
     import dis
     import traceback
     
    -#relevant opcodes
    +# relevant opcodes
    --- End diff --
    
    It will be awesome if you could do that in this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1505


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by nchammas <gi...@git.apache.org>.

Github user nchammas commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49689180
  
    Reynold, is there anything else we need to clean up before we can have `pep8` checks become part of the CI cycle?
    
    Also, it sounds like we want to make an exception for cloudpickle. So that needs to be made somewhere when we turn this on.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1505#discussion_r15197618
  
    --- Diff: python/pyspark/serializers.py ---
    @@ -252,18 +251,20 @@ def load_stream(self, stream):
                     yield pair
     
         def __eq__(self, other):
    -        return isinstance(other, PairDeserializer) and \
    -               self.key_ser == other.key_ser and self.val_ser == other.val_ser
    +        return isinstance(other, PairDeserializer) and
    +        self.key_ser == other.key_ser and self.val_ser == other.val_ser
    --- End diff --
    
    ??


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by davies <gi...@git.apache.org>.

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49685605
  
    Thanks, LGTM.
    cc @rxin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1505#discussion_r15197643
  
    --- Diff: python/pyspark/serializers.py ---
    @@ -197,8 +196,8 @@ def _load_stream_without_unbatching(self, stream):
                 return self.serializer.load_stream(stream)
     
         def __eq__(self, other):
    -        return isinstance(other, BatchedSerializer) and \
    -               other.serializer == self.serializer
    +        return isinstance(other, BatchedSerializer) and
    +        other.serializer == self.serializer
    --- End diff --
    
    ??


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49694965
  
    QA tests have started for PR 1505. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16941/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49699903
  
    Merging this in master!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49569809
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by nchammas <gi...@git.apache.org>.

Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1505#discussion_r15198830
  
    --- Diff: python/pyspark/cloudpickle.py ---
    @@ -55,7 +55,7 @@
     import dis
     import traceback
     
    -#relevant opcodes
    +# relevant opcodes
    --- End diff --
    
    I'm all for that, but I understand that the goal is to bake an automated `pep8` check into the build cycle. If we're OK with excluding this file from that check, then we can absolutely leave this file alone.
    
    But that raises another question: Will we accept any contributions at all directly to cloudpickle, or is it basically frozen as is?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1505#discussion_r15197383
  
    --- Diff: python/pyspark/statcounter.py ---
    @@ -124,5 +125,5 @@ def sampleStdev(self):
             return math.sqrt(self.sampleVariance())
     
         def __repr__(self):
    -        return "(count: %s, mean: %s, stdev: %s, max: %s, min: %s)" % (self.count(), self.mean(), self.stdev(), self.max(), self.min())
    -
    +        return "(count: %s, mean: %s, stdev: %s, max: %s, min: %s)" %
    --- End diff --
    
    Need "\"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49699504
  
    QA results for PR 1505:<br>- This patch PASSES unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16941/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1505#discussion_r15197389
  
    --- Diff: python/pyspark/statcounter.py ---
    @@ -124,5 +125,5 @@ def sampleStdev(self):
             return math.sqrt(self.sampleVariance())
     
         def __repr__(self):
    -        return "(count: %s, mean: %s, stdev: %s, max: %s, min: %s)" % (self.count(), self.mean(), self.stdev(), self.max(), self.min())
    -
    +        return "(count: %s, mean: %s, stdev: %s, max: %s, min: %s)" %
    +        (self.count(), self.mean(), self.stdev(), self.max(), self.min())
    --- End diff --
    
    indent


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49699932
  
    Now I've merged this, do you mind submitting a pep8 checker? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by nchammas <gi...@git.apache.org>.

Github user nchammas commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49698547
  
    If we have to, we could probably somehow package `pep8` and its dependencies as a standalone. It's doable but I think also a bit ugly and harder to update.
    
    As for adding to the Jenkins scripts, shall I take a crack at that in a separate PR? 
    
    How are you thinking we approach that? There are at least a couple of things we could do, like:
    * update `/dev/run-tests` to include `pep8` checks directly
    * update `/dev/run-tests` to call a `pythonstyle` script under `/dev/`
    
    We should probably update step 4 under ["Contributing Code"](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCode) in the "Contributing to Spark Guide" to cover Python style as well.
    
    Also, would it make sense to replace `sbt/sbt scalastyle` with a new `check-style` script in `/dev/` that checks style for both Scala and Python code?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1505#discussion_r15197548
  
    --- Diff: python/pyspark/shell.py ---
    @@ -35,7 +35,8 @@
     from pyspark.storagelevel import StorageLevel
     
     # this is the equivalent of ADD_JARS
    -add_files = os.environ.get("ADD_FILES").split(',') if os.environ.get("ADD_FILES") is not None else None
    +add_files = (
    +    os.environ.get("ADD_FILES").split(',') if os.environ.get("ADD_FILES") is not None else None)
    --- End diff --
    
    It's better break this line before "if"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49778872
  
    A separate PR and JIRA would be better (since they are for different things). Thanks!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49694759
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1505#discussion_r15200578
  
    --- Diff: python/pyspark/cloudpickle.py ---
    @@ -55,7 +55,7 @@
     import dis
     import traceback
     
    -#relevant opcodes
    +# relevant opcodes
    --- End diff --
    
    We could accept contribution to cloudpickle. If we could reduce the changes in cloudpickle, then we can easily merge updates from upstream.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49572014
  
    @davies can you take a look? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1505#discussion_r15197312
  
    --- Diff: python/pyspark/statcounter.py ---
    @@ -124,5 +125,5 @@ def sampleStdev(self):
             return math.sqrt(self.sampleVariance())
     
         def __repr__(self):
    -        return "(count: %s, mean: %s, stdev: %s, max: %s, min: %s)" % (self.count(), self.mean(), self.stdev(), self.max(), self.min())
    -
    +        return "(count: %s, mean: %s, stdev: %s, max: %s, min: %s)" %
    --- End diff --
    
    Here Need "/" and tab


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49698681
  
    Let's definitely add the pep8 check in a separate PR. I'm waiting for Jenkins to come back positive before merging this pull request.
    
    I think it'd make sense to have a linter script for scala (dev/lint-scala), and a linter script (dev/lint-python) for python so it is easier for devs to run. lint-scala can just call scalastyle, while lint-python calls pep8.
    
    And then run-tests should call both. And yes, let's update the wiki once we have this checked in.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1505#discussion_r15197632
  
    --- Diff: python/pyspark/serializers.py ---
    @@ -229,8 +228,8 @@ def load_stream(self, stream):
                     yield pair
     
         def __eq__(self, other):
    -        return isinstance(other, CartesianDeserializer) and \
    -               self.key_ser == other.key_ser and self.val_ser == other.val_ser
    +        return isinstance(other, CartesianDeserializer) and
    +        self.key_ser == other.key_ser and self.val_ser == other.val_ser
    --- End diff --
    
    ??


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1505#discussion_r15197750
  
    --- Diff: python/pyspark/context.py ---
    @@ -192,15 +191,19 @@ def _ensure_initialized(cls, instance=None, gateway=None):
                     SparkContext._writeToFile = SparkContext._jvm.PythonRDD.writeToFile
     
                 if instance:
    -                if SparkContext._active_spark_context and SparkContext._active_spark_context != instance:
    +                if SparkContext._active_spark_context and
    +                SparkContext._active_spark_context != instance:
    --- End diff --
    
    ??


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by nchammas <gi...@git.apache.org>.

Github user nchammas commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49760182
  
    > do you mind submitting a pep8 checker as part of Jenkins?
    
    Will do. I won't be able to work on this today, but I will open a separate PR for this this week hopefully. Do we need a separate JIRA issue for that btw?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by nchammas <gi...@git.apache.org>.

Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1505#discussion_r15200778
  
    --- Diff: python/pyspark/cloudpickle.py ---
    @@ -55,7 +55,7 @@
     import dis
     import traceback
     
    -#relevant opcodes
    +# relevant opcodes
    --- End diff --
    
    OK, sounds good to me. Shall I revert all changes to cloudpickle then?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by nchammas <gi...@git.apache.org>.

Github user nchammas commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49791186
  
    I've created [SPARK-2627](https://issues.apache.org/jira/browse/SPARK-2627) to track the Jenkins/CI part of this work. I'll post future updates there.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by nchammas <gi...@git.apache.org>.

Github user nchammas commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49678538
  
    OK, I've fixed the problems you pointed out, reverted the changes to cloudpickle, and confirmed that `python/run-tests` passes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/1505#issuecomment-49694729
  
    If we have fixed all the problems, let's do it. We should add pep8 check to the Jenkins scripts (in /dev/).
    
    I'm not 100% positive whether our Jenkins instances have pep8 installed. Is there a way to run pep8 if it is not installed in the system by default? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---