You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by aviv-ebates <gi...@git.apache.org> on 2018/04/12 17:56:32 UTC

[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs

GitHub user aviv-ebates opened a pull request:

    https://github.com/apache/spark/pull/21057

    2 Improvements to Pyspark docs

    Each of these 2 items has cost me a few hours of debugging. Hopefully, this will stop others from having to debug the same thing.
    
    1. Describe the expected type of `fromOffsets` param.
    2. Add missing method to `StreamingListener`, which is invoked by proxy from the Java/Scala side, and throws a strange exception when not found.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aviv-ebates/spark improve-docs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21057.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21057
    
----
commit 4b67ecd0a5e835c35083fe9a1b069f240e8062e1
Author: Aviv <37...@...>
Date:   2018-04-10T17:53:42Z

    improve doc1

commit e040a68151c308a1e46cda6723fed2b7023a7331
Author: Aviv <37...@...>
Date:   2018-04-10T17:55:02Z

    missing, undocumented, method.
    
    Since this method is missing here, trying to use a Listener throws an expection. Since it's not documented, it's hard to handle.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    It would be helpful if you open a JIRA and describe the issue. It could help other guys think a better way to test or would give clearer ideas to see if it's really difficult to add a test. Usually, JIRA is made first. See also https://spark.apache.org/contributing.html.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    Right, let me try to cherry-pick and see if I can write a test. Will try to have some time and open a PR after cherry-picking your commit. I think you can close this PR. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21057#discussion_r181630426
  
    --- Diff: python/pyspark/streaming/listener.py ---
    @@ -22,6 +22,10 @@ class StreamingListener(object):
     
         def __init__(self):
             pass
    +    
    +    def onStreamingStarted(self, streamingStarted):
    --- End diff --
    
    this isnt doc only change then, I think?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: 2 Improvements to Pyspark docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: 2 Improvements to Pyspark docs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    **[Test build #89368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89368/testReport)** for PR 21057 at commit [`e040a68`](https://github.com/apache/spark/commit/e040a68151c308a1e46cda6723fed2b7023a7331).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: 2 Improvements to Pyspark docs

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    +1 for ^.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: 2 Improvements to Pyspark docs

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    I think you may create a minor JIRA ticket for this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

Posted by aviv-ebates <gi...@git.apache.org>.
Github user aviv-ebates commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    If you're not clear about what I've done here, ask away. I don't wish to create a jira account in addition; Feel free to create whatever tickets you think are required.
    I've already paid my price for this particular deficiency in Spark documentation, so I don't need this docfix for myself. I was hoping you'd like to improve the docs, so that the next guy would have an easier experience using Spark/Pyspark, but that's totally up to you.
    I think I've done all that is reasonable to do for this documents improvements for your project. If you want this change, merge it in; If you don't, don't. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21057#discussion_r181299329
  
    --- Diff: python/pyspark/streaming/kafka.py ---
    @@ -104,7 +104,7 @@ def createDirectStream(ssc, topics, kafkaParams, fromOffsets=None,
             :param topics:  list of topic_name to consume.
             :param kafkaParams: Additional params for Kafka.
             :param fromOffsets: Per-topic/partition Kafka offsets defining the (inclusive) starting
    -                            point of the stream.
    +                            point of the stream (Dict with keys of type TopicAndPartition and int values).
    --- End diff --
    
    I would say sth like ``a dictionary containing `TopicAndPartition` to integers.``.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    I will flight to Korea for a company workshop today. I can do this maybe
    only in at tonight. If this isn't urgent, then it is okay.
    
    On Wed, Apr 18, 2018, 10:03 AM Hyukjin Kwon <no...@github.com>
    wrote:
    
    > Actually @viirya <https://github.com/viirya>, would you be interested in
    > this if you are available? I will do this by myself but I am currently not
    > quite available. If you are busy too, let me try it anyway.
    >
    > —
    > You are receiving this because you were mentioned.
    > Reply to this email directly, view it on GitHub
    > <https://github.com/apache/spark/pull/21057#issuecomment-382208926>, or mute
    > the thread
    > <https://github.com/notifications/unsubscribe-auth/AAEM914hLAgFHKhwMS9OU_PVN7G-WvQ9ks5tppDmgaJpZM4TSO_J>
    > .
    >



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21057#discussion_r181553378
  
    --- Diff: python/pyspark/streaming/kafka.py ---
    @@ -104,7 +104,7 @@ def createDirectStream(ssc, topics, kafkaParams, fromOffsets=None,
             :param topics:  list of topic_name to consume.
             :param kafkaParams: Additional params for Kafka.
             :param fromOffsets: Per-topic/partition Kafka offsets defining the (inclusive) starting
    -                            point of the stream.
    +                            point of the stream (Dict with keys of type TopicAndPartition and int values).
    --- End diff --
    
    Shall we fix as so?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21057#discussion_r181632575
  
    --- Diff: python/pyspark/streaming/listener.py ---
    @@ -22,6 +22,10 @@ class StreamingListener(object):
     
         def __init__(self):
             pass
    +    
    +    def onStreamingStarted(self, streamingStarted):
    --- End diff --
    
    I first thought it's doc only change but I realised it's actually not after taking a close look. 
    
    Implementing `onStreamingStarted` looks actually required:
    
    > Add missing method to StreamingListener, which is invoked by proxy from the Java/Scala side, and throws a strange exception when not found.
    
    This wasn't there at the first - https://github.com/apache/spark/blob/ace0db47141ffd457c2091751038fc291f6d5a8b/python/pyspark/streaming/listener.py / https://github.com/apache/spark/blob/ace0db47141ffd457c2091751038fc291f6d5a8b/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingListener.scala ; however, this method was added from https://github.com/apache/spark/commit/ce99f51d2e8cbcb1565c9e7a729183bd74a0c2bc but seems Python's implementation is missed.
    
    Strictly it sounds better to have an explicit test since @aviv-ebates has a reproducer (assuming from the description) and should be easy to add.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: 2 Improvements to Pyspark docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89368/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    **[Test build #89399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89399/testReport)** for PR 21057 at commit [`010de10`](https://github.com/apache/spark/commit/010de10544ca97ff57032d25961ff340aa1384e1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

Posted by aviv-ebates <gi...@git.apache.org>.
Github user aviv-ebates commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    1. Updated title
    2. Updated text to one suggested by @HyukjinKwon 
    3. I don't have a reasonable way to make a test. Test scenario is "Configure pythong listener, run actual spark streaming". If you have a way to test that,  that would be great; I don't know how to set up this test.
    4. Feel free to create a ticket and update commit messages when you merge this.
    5. I consider this a documentation change, because the method is already **available and required**; All I did is make sure it shows up in documentation (and incidentally, make it not-required). Feel free to change commit messages and pr text to reflect your understanding of this change.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: 2 Improvements to Pyspark docs

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    ( I regret I happened to come over form Korea to Singapore too fast before your flight :-) )


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: 2 Improvements to Pyspark docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: 2 Improvements to Pyspark docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

Posted by aviv-ebates <gi...@git.apache.org>.
Github user aviv-ebates closed the pull request at:

    https://github.com/apache/spark/pull/21057


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs

Posted by aviv-ebates <gi...@git.apache.org>.
Github user aviv-ebates commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21057#discussion_r181460539
  
    --- Diff: python/pyspark/streaming/kafka.py ---
    @@ -104,7 +104,7 @@ def createDirectStream(ssc, topics, kafkaParams, fromOffsets=None,
             :param topics:  list of topic_name to consume.
             :param kafkaParams: Additional params for Kafka.
             :param fromOffsets: Per-topic/partition Kafka offsets defining the (inclusive) starting
    -                            point of the stream.
    +                            point of the stream (Dict with keys of type TopicAndPartition and int values).
    --- End diff --
    
    Yeah, that works too.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    It's not urgent at all. I would appreciate it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    Actually @viirya, would you be interested in this if you are available? I will do this by myself but I am currently not quite available. If you are busy too, let me try it anyway.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21057#discussion_r181568263
  
    --- Diff: python/pyspark/streaming/listener.py ---
    @@ -22,6 +22,10 @@ class StreamingListener(object):
     
         def __init__(self):
             pass
    +    
    +    def onStreamingStarted(self, streamingStarted):
    --- End diff --
    
    Can you add a test to `pyspark.streaming.tests.StreamingListenerTests`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    :-)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    **[Test build #89399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89399/testReport)** for PR 21057 at commit [`010de10`](https://github.com/apache/spark/commit/010de10544ca97ff57032d25961ff340aa1384e1).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89399/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: 2 Improvements to Pyspark docs

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    BTW, mind fixing the PR title to `[MINOR][PYTHON] ... ` and make the title more descriptive? not a big deal but good to match it with other PRs.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21057: 2 Improvements to Pyspark docs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21057
  
    **[Test build #89368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89368/testReport)** for PR 21057 at commit [`e040a68`](https://github.com/apache/spark/commit/e040a68151c308a1e46cda6723fed2b7023a7331).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org