You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by aviv-ebates <gi...@git.apache.org> on 2018/04/12 17:56:32 UTC
[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs
GitHub user aviv-ebates opened a pull request:
https://github.com/apache/spark/pull/21057
2 Improvements to Pyspark docs
Each of these 2 items has cost me a few hours of debugging. Hopefully, this will stop others from having to debug the same thing.
1. Describe the expected type of `fromOffsets` param.
2. Add missing method to `StreamingListener`, which is invoked by proxy from the Java/Scala side, and throws a strange exception when not found.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/aviv-ebates/spark improve-docs
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21057.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21057
----
commit 4b67ecd0a5e835c35083fe9a1b069f240e8062e1
Author: Aviv <37...@...>
Date: 2018-04-10T17:53:42Z
improve doc1
commit e040a68151c308a1e46cda6723fed2b7023a7331
Author: Aviv <37...@...>
Date: 2018-04-10T17:55:02Z
missing, undocumented, method.
Since this method is missing here, trying to use a Listener throws an expection. Since it's not documented, it's hard to handle.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21057
It would be helpful if you open a JIRA and describe the issue. It could help other guys think a better way to test or would give clearer ideas to see if it's really difficult to add a test. Usually, JIRA is made first. See also https://spark.apache.org/contributing.html.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21057
Right, let me try to cherry-pick and see if I can write a test. Will try to have some time and open a PR after cherry-picking your commit. I think you can close this PR.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/21057#discussion_r181630426
--- Diff: python/pyspark/streaming/listener.py ---
@@ -22,6 +22,10 @@ class StreamingListener(object):
def __init__(self):
pass
+
+ def onStreamingStarted(self, streamingStarted):
--- End diff --
this isnt doc only change then, I think?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: 2 Improvements to Pyspark docs
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21057
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: 2 Improvements to Pyspark docs
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21057
**[Test build #89368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89368/testReport)** for PR 21057 at commit [`e040a68`](https://github.com/apache/spark/commit/e040a68151c308a1e46cda6723fed2b7023a7331).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: 2 Improvements to Pyspark docs
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21057
+1 for ^.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: 2 Improvements to Pyspark docs
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/21057
I think you may create a minor JIRA ticket for this.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs
Posted by aviv-ebates <gi...@git.apache.org>.
Github user aviv-ebates commented on the issue:
https://github.com/apache/spark/pull/21057
If you're not clear about what I've done here, ask away. I don't wish to create a jira account in addition; Feel free to create whatever tickets you think are required.
I've already paid my price for this particular deficiency in Spark documentation, so I don't need this docfix for myself. I was hoping you'd like to improve the docs, so that the next guy would have an easier experience using Spark/Pyspark, but that's totally up to you.
I think I've done all that is reasonable to do for this documents improvements for your project. If you want this change, merge it in; If you don't, don't.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21057#discussion_r181299329
--- Diff: python/pyspark/streaming/kafka.py ---
@@ -104,7 +104,7 @@ def createDirectStream(ssc, topics, kafkaParams, fromOffsets=None,
:param topics: list of topic_name to consume.
:param kafkaParams: Additional params for Kafka.
:param fromOffsets: Per-topic/partition Kafka offsets defining the (inclusive) starting
- point of the stream.
+ point of the stream (Dict with keys of type TopicAndPartition and int values).
--- End diff --
I would say sth like ``a dictionary containing `TopicAndPartition` to integers.``.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/21057
I will flight to Korea for a company workshop today. I can do this maybe
only in at tonight. If this isn't urgent, then it is okay.
On Wed, Apr 18, 2018, 10:03 AM Hyukjin Kwon <no...@github.com>
wrote:
> Actually @viirya <https://github.com/viirya>, would you be interested in
> this if you are available? I will do this by myself but I am currently not
> quite available. If you are busy too, let me try it anyway.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/apache/spark/pull/21057#issuecomment-382208926>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AAEM914hLAgFHKhwMS9OU_PVN7G-WvQ9ks5tppDmgaJpZM4TSO_J>
> .
>
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21057#discussion_r181553378
--- Diff: python/pyspark/streaming/kafka.py ---
@@ -104,7 +104,7 @@ def createDirectStream(ssc, topics, kafkaParams, fromOffsets=None,
:param topics: list of topic_name to consume.
:param kafkaParams: Additional params for Kafka.
:param fromOffsets: Per-topic/partition Kafka offsets defining the (inclusive) starting
- point of the stream.
+ point of the stream (Dict with keys of type TopicAndPartition and int values).
--- End diff --
Shall we fix as so?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21057#discussion_r181632575
--- Diff: python/pyspark/streaming/listener.py ---
@@ -22,6 +22,10 @@ class StreamingListener(object):
def __init__(self):
pass
+
+ def onStreamingStarted(self, streamingStarted):
--- End diff --
I first thought it's doc only change but I realised it's actually not after taking a close look.
Implementing `onStreamingStarted` looks actually required:
> Add missing method to StreamingListener, which is invoked by proxy from the Java/Scala side, and throws a strange exception when not found.
This wasn't there at the first - https://github.com/apache/spark/blob/ace0db47141ffd457c2091751038fc291f6d5a8b/python/pyspark/streaming/listener.py / https://github.com/apache/spark/blob/ace0db47141ffd457c2091751038fc291f6d5a8b/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingListener.scala ; however, this method was added from https://github.com/apache/spark/commit/ce99f51d2e8cbcb1565c9e7a729183bd74a0c2bc but seems Python's implementation is missed.
Strictly it sounds better to have an explicit test since @aviv-ebates has a reproducer (assuming from the description) and should be easy to add.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: 2 Improvements to Pyspark docs
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21057
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89368/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21057
**[Test build #89399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89399/testReport)** for PR 21057 at commit [`010de10`](https://github.com/apache/spark/commit/010de10544ca97ff57032d25961ff340aa1384e1).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21057
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs
Posted by aviv-ebates <gi...@git.apache.org>.
Github user aviv-ebates commented on the issue:
https://github.com/apache/spark/pull/21057
1. Updated title
2. Updated text to one suggested by @HyukjinKwon
3. I don't have a reasonable way to make a test. Test scenario is "Configure pythong listener, run actual spark streaming". If you have a way to test that, that would be great; I don't know how to set up this test.
4. Feel free to create a ticket and update commit messages when you merge this.
5. I consider this a documentation change, because the method is already **available and required**; All I did is make sure it shows up in documentation (and incidentally, make it not-required). Feel free to change commit messages and pr text to reflect your understanding of this change.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: 2 Improvements to Pyspark docs
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21057
ok to test
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21057
( I regret I happened to come over form Korea to Singapore too fast before your flight :-) )
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: 2 Improvements to Pyspark docs
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21057
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: 2 Improvements to Pyspark docs
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21057
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs
Posted by aviv-ebates <gi...@git.apache.org>.
Github user aviv-ebates closed the pull request at:
https://github.com/apache/spark/pull/21057
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs
Posted by aviv-ebates <gi...@git.apache.org>.
Github user aviv-ebates commented on a diff in the pull request:
https://github.com/apache/spark/pull/21057#discussion_r181460539
--- Diff: python/pyspark/streaming/kafka.py ---
@@ -104,7 +104,7 @@ def createDirectStream(ssc, topics, kafkaParams, fromOffsets=None,
:param topics: list of topic_name to consume.
:param kafkaParams: Additional params for Kafka.
:param fromOffsets: Per-topic/partition Kafka offsets defining the (inclusive) starting
- point of the stream.
+ point of the stream (Dict with keys of type TopicAndPartition and int values).
--- End diff --
Yeah, that works too.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21057
It's not urgent at all. I would appreciate it.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21057
Actually @viirya, would you be interested in this if you are available? I will do this by myself but I am currently not quite available. If you are busy too, let me try it anyway.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/21057#discussion_r181568263
--- Diff: python/pyspark/streaming/listener.py ---
@@ -22,6 +22,10 @@ class StreamingListener(object):
def __init__(self):
pass
+
+ def onStreamingStarted(self, streamingStarted):
--- End diff --
Can you add a test to `pyspark.streaming.tests.StreamingListenerTests`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/21057
:-)
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21057
**[Test build #89399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89399/testReport)** for PR 21057 at commit [`010de10`](https://github.com/apache/spark/commit/010de10544ca97ff57032d25961ff340aa1384e1).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21057
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89399/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: 2 Improvements to Pyspark docs
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21057
BTW, mind fixing the PR title to `[MINOR][PYTHON] ... ` and make the title more descriptive? not a big deal but good to match it with other PRs.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21057: 2 Improvements to Pyspark docs
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21057
**[Test build #89368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89368/testReport)** for PR 21057 at commit [`e040a68`](https://github.com/apache/spark/commit/e040a68151c308a1e46cda6723fed2b7023a7331).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org