Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/11/12 05:56:46 UTC

[GitHub] [spark] itholic opened a new pull request #30346: [SPARK-32085][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

itholic opened a new pull request #30346:
URL: https://github.com/apache/spark/pull/30346


   ### What changes were proposed in this pull request?
   
   This PR proposes to migrate to [NumPy documentation style](https://numpydoc.readthedocs.io/en/latest/format.html), see also [SPARK-33243](https://issues.apache.org/jira/browse/SPARK-33243).
   
   ### Why are the changes needed?
   
   For better documentation, both as plain text and as generated HTML.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, users will see better-formatted HTML pages and more readable plain-text docstrings. See [SPARK-33243](https://issues.apache.org/jira/browse/SPARK-33243).
   
   
   ### How was this patch tested?
   
   Manually tested by running `./dev/lint-python`.
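
As an illustration of the target layout, here is a minimal sketch of a NumPy-style docstring. The function `get_or_create`, its parameter names, and its body are hypothetical stand-ins and not the actual `StreamingContext.getOrCreate` implementation; only the `Parameters` / `Returns` section structure is the point.

```python
def get_or_create(checkpoint_path, setup_func):
    """
    Return a context restored from checkpoint data, or create a new one.

    Parameters
    ----------
    checkpoint_path : str
        Checkpoint directory used in an earlier streaming program
    setup_func : function
        Function to create a new context and set up DStreams

    Returns
    -------
    object
        Whatever ``setup_func`` returns (hypothetical; no checkpoint is read here)
    """
    # Hypothetical body: this sketch never reads a checkpoint and
    # always falls back to the setup function.
    return setup_func()
```

Each parameter gets a `name : type` line followed by an indented description, replacing the older `:param name:` fields.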


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726674503


   **[Test build #131058 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131058/testReport)** for PR 30346 at commit [`79f98b7`](https://github.com/apache/spark/commit/79f98b75dd30d67f675aae583380b6da37d75bf8).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726613586








[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727201643


   **[Test build #131091 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131091/testReport)** for PR 30346 at commit [`d9cd6ab`](https://github.com/apache/spark/commit/d9cd6ab1b64189a2ad263b6b4842a3bb264ce37f).




[GitHub] [spark] HyukjinKwon commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r522708852



##########
File path: python/pyspark/streaming/context.py
##########
@@ -90,8 +94,12 @@ def getOrCreate(cls, checkpointPath, setupFunc):
         recreated from the checkpoint data. If the data does not exist, then the provided setupFunc
         will be used to create a new context.
 
-        :param checkpointPath: Checkpoint directory used in an earlier streaming program
-        :param setupFunc:      Function to create a new context and setup DStreams
+        Parameters
+        ----------
+        checkpointPath : str
+            Checkpoint directory used in an earlier streaming program
+        setupFunc : funcion

Review comment:
       ```suggestion
           setupFunc : function
   ```






[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727308891


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35706/
   




[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727582707


   **[Test build #131108 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131108/testReport)** for PR 30346 at commit [`213d21f`](https://github.com/apache/spark/commit/213d21f44a40037fd2bf9acef33d717d6f013540).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726613586








[GitHub] [spark] HyukjinKwon commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r522709942



##########
File path: python/pyspark/streaming/context.py
##########
@@ -242,9 +268,14 @@ def socketTextStream(self, hostname, port, storageLevel=StorageLevel.MEMORY_AND_
         a TCP socket and receive byte is interpreted as UTF8 encoded ``\\n`` delimited
         lines.
 
-        :param hostname:      Hostname to connect to for receiving data
-        :param port:          Port to connect to for receiving data
-        :param storageLevel:  Storage level to use for storing the received objects
+        Parameters
+        ----------
+        hostname : str
+            Hostname to connect to for receiving data
+        port : str
+            Port to connect to for receiving data
+        storageLevel : :class:`pyspark.StorageLevel`

Review comment:
       ```suggestion
           storageLevel : :class:`pyspark.StorageLevel`, optional
   ```






[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726535326


   **[Test build #131037 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131037/testReport)** for PR 30346 at commit [`1945dcf`](https://github.com/apache/spark/commit/1945dcfb071c06933694da881de15e54d272c8a3).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] zero323 commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r523750158



##########
File path: python/pyspark/streaming/kinesis.pyi
##########
@@ -32,15 +35,15 @@ class KinesisUtils:
         regionName,
         initialPositionInStream,
         checkpointInterval,
-        storageLevel: Any = ...,
-        awsAccessKeyId: Optional[Any] = ...,
-        awsSecretKey: Optional[Any] = ...,
-        decoder: Any = ...,
-        stsAssumeRoleArn: Optional[Any] = ...,
-        stsSessionName: Optional[Any] = ...,
-        stsExternalId: Optional[Any] = ...,
+        storageLevel: StorageLevel = ...,
+        awsAccessKeyId: Optional[str] = ...,
+        awsSecretKey: Optional[str] = ...,
+        decoder: Callable[[T], T] = ...,

Review comment:
       This should be
   
   ```python
   Callable[[Optional[bytes]], T] = ...,
   ```
   
   shouldn't it?
   
   And since we already modify the file, maybe
   
   ```python
   def utf8_decoder(s: Optional[bytes]) -> str: ...
   ```
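
To make the distinction concrete, here is a small sketch of what the proposed annotation expresses: the decoder always receives `Optional[bytes]`, while its return type `T` is free. The helper `apply_decoder` and the two decoders below are hypothetical names invented for this illustration; they are not part of `pyspark.streaming.kinesis`.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def apply_decoder(
    payload: Optional[bytes], decoder: Callable[[Optional[bytes]], T]
) -> T:
    """Hypothetical helper: hand a raw record (possibly None) to the decoder."""
    return decoder(payload)


def text_or_empty(s: Optional[bytes]) -> str:
    """Hypothetical decoder: bytes in, str out; None becomes an empty string."""
    return "" if s is None else s.decode("utf-8")


def byte_length(s: Optional[bytes]) -> int:
    """Hypothetical decoder with a different return type, so T binds to int."""
    return 0 if s is None else len(s)
```

With `Callable[[T], T]`, a type checker would require the decoder's input and output types to match, which rules out the usual bytes-to-text case; `Callable[[Optional[bytes]], T]` fixes the input type and leaves only the result generic.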






[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726613463


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35647/
   




[GitHub] [spark] SparkQA removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727295568


   **[Test build #131103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131103/testReport)** for PR 30346 at commit [`b36cfd4`](https://github.com/apache/spark/commit/b36cfd410e444501614f505380daf3d42926dbce).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727209791








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-725887862


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35576/
   Test FAILed.




[GitHub] [spark] itholic commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r523769273



##########
File path: python/pyspark/streaming/kinesis.pyi
##########
@@ -18,7 +18,7 @@
 
 # NOTE: This dynamically typed stub was automatically generated by stubgen.

Review comment:
       dropped it, thanks :)






[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727585546


   **[Test build #131108 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131108/testReport)** for PR 30346 at commit [`213d21f`](https://github.com/apache/spark/commit/213d21f44a40037fd2bf9acef33d717d6f013540).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726542163


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35643/
   




[GitHub] [spark] SparkQA removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727582707


   **[Test build #131108 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131108/testReport)** for PR 30346 at commit [`213d21f`](https://github.com/apache/spark/commit/213d21f44a40037fd2bf9acef33d717d6f013540).




[GitHub] [spark] AmplabJenkins commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726683150








[GitHub] [spark] AmplabJenkins commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727348202








[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726659605


   **[Test build #131057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131057/testReport)** for PR 30346 at commit [`5a542b5`](https://github.com/apache/spark/commit/5a542b5b77b6754bac317c097dd0484c1cb86570).




[GitHub] [spark] itholic commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r523768800



##########
File path: python/pyspark/streaming/kinesis.pyi
##########
@@ -32,15 +35,15 @@ class KinesisUtils:
         regionName,
         initialPositionInStream,
         checkpointInterval,
-        storageLevel: Any = ...,
-        awsAccessKeyId: Optional[Any] = ...,
-        awsSecretKey: Optional[Any] = ...,
-        decoder: Any = ...,
-        stsAssumeRoleArn: Optional[Any] = ...,
-        stsSessionName: Optional[Any] = ...,
-        stsExternalId: Optional[Any] = ...,
+        storageLevel: StorageLevel = ...,
+        awsAccessKeyId: Optional[str] = ...,
+        awsSecretKey: Optional[str] = ...,
+        decoder: Callable[[T], T] = ...,
+        stsAssumeRoleArn: Optional[str] = ...,
+        stsSessionName: Optional[str] = ...,
+        stsExternalId: Optional[str] = ...,
     ): ...

Review comment:
       Added it, thanks!






[GitHub] [spark] HyukjinKwon commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r522713401



##########
File path: python/pyspark/streaming/context.py
##########
@@ -242,9 +268,14 @@ def socketTextStream(self, hostname, port, storageLevel=StorageLevel.MEMORY_AND_
         a TCP socket and receive byte is interpreted as UTF8 encoded ``\\n`` delimited
         lines.
 
-        :param hostname:      Hostname to connect to for receiving data
-        :param port:          Port to connect to for receiving data
-        :param storageLevel:  Storage level to use for storing the received objects
+        Parameters
+        ----------
+        hostname : str
+            Hostname to connect to for receiving data
+        port : str

Review comment:
       ```suggestion
           port : int
   ```
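
For reference, a minimal sketch of how a numpydoc `Parameters` section reads when a port is typed as `int` and a parameter with a default value is marked `optional`. The function `socket_text_stream`, its default value `"HYPOTHETICAL_DEFAULT"`, and its return value are assumptions made up for this example; they do not reproduce the real `socketTextStream` signature or behaviour.

```python
def socket_text_stream(hostname, port, storage_level="HYPOTHETICAL_DEFAULT"):
    """
    Describe a text stream read from a TCP socket (illustrative only).

    Parameters
    ----------
    hostname : str
        Hostname to connect to for receiving data
    port : int
        Port to connect to for receiving data
    storage_level : str, optional
        Storage level to use for storing the received objects
        (a plain string here; a stand-in for a real storage-level object)

    Returns
    -------
    dict
        The arguments, echoed back so the sketch has something to return
    """
    # Hypothetical body: no socket is opened; the arguments are just collected.
    return {"hostname": hostname, "port": port, "storage_level": storage_level}
```

The `optional` marker tells readers the argument can be omitted, which matches a keyword parameter that has a default.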






[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727348157


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35706/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726683150








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726727547








[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727594931


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35711/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726727547








[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-725877267


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35576/
   




[GitHub] [spark] zero323 commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r521984807



##########
File path: python/pyspark/streaming/context.py
##########
@@ -242,9 +268,14 @@ def socketTextStream(self, hostname, port, storageLevel=StorageLevel.MEMORY_AND_
         a TCP socket and receive byte is interpreted as UTF8 encoded ``\\n`` delimited
         lines.
 
-        :param hostname:      Hostname to connect to for receiving data
-        :param port:          Port to connect to for receiving data
-        :param storageLevel:  Storage level to use for storing the received objects
+        Parameters
+        ----------
+        hostname : str
+            Hostname to connect to for receiving data
+        port : str
+            Port to connect to for receiving data
+        storageLevel : :class:`StorageLevel`

Review comment:
       How about
   
       :class:`pyspark.StorageLevel`
   
   so it is properly linked?
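
   As an illustration of the suggestion above, here is a minimal sketch of a NumPy-style docstring that uses the qualified role. The function name `socket_text_stream_example` is hypothetical and only mirrors the shape of `socketTextStream`; the `port : int` type is an assumption made for this sketch, not something taken from the diff.

```python
def socket_text_stream_example(hostname, port, storageLevel=None):
    """
    Create an input from TCP source hostname:port (illustrative stub only).

    Parameters
    ----------
    hostname : str
        Hostname to connect to for receiving data
    port : int
        Port to connect to for receiving data (type assumed for this sketch)
    storageLevel : :class:`pyspark.StorageLevel`
        Storage level to use for storing the received objects
    """
    # Hypothetical stand-in: the real method builds a DStream. Here the
    # arguments are simply returned so the docstring can be inspected
    # without a running Spark installation.
    return hostname, port, storageLevel
```

   With the package-qualified name, Sphinx has a better chance of resolving the role to the `StorageLevel` page instead of rendering plain text.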






[GitHub] [spark] HyukjinKwon commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r522711041



##########
File path: python/pyspark/streaming/context.py
##########
@@ -268,8 +299,12 @@ def binaryRecordsStream(self, directory, recordLength):
         them from another location within the same file system.
         File names starting with . are ignored.
 
-        :param directory:       Directory to load data from
-        :param recordLength:    Length of each record in bytes
+        Parameters
+        ----------
+        directory : str
+            Directory to load data from
+        recordLength : bytes

Review comment:
       ```suggestion
           recordLength : int
   ```
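
   The reasoning behind `int`: `recordLength` is a count of bytes per record, not a `bytes` object. A small sketch of how a fixed record length slices a flat binary buffer, using a hypothetical helper `split_records` that is not part of PySpark:

```python
def split_records(data: bytes, record_length: int) -> list:
    """Split a flat binary buffer into fixed-length records."""
    if record_length <= 0:
        raise ValueError("record_length must be a positive integer")
    if len(data) % record_length != 0:
        raise ValueError("buffer length is not a multiple of record_length")
    # Step through the buffer in strides of record_length bytes.
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]
```

   For example, `split_records(b"abcdef", 2)` yields three two-byte records.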






[GitHub] [spark] SparkQA removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726521325


   **[Test build #131037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131037/testReport)** for PR 30346 at commit [`1945dcf`](https://github.com/apache/spark/commit/1945dcfb071c06933694da881de15e54d272c8a3).




[GitHub] [spark] zero323 commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r523751575



##########
File path: python/pyspark/streaming/kinesis.py
##########
@@ -43,38 +43,59 @@ def createStream(ssc, kinesisAppName, streamName, endpointUrl, regionName,
         Create an input stream that pulls messages from a Kinesis stream. This uses the
         Kinesis Client Library (KCL) to pull messages from Kinesis.
 
-        .. note:: The given AWS credentials will get saved in DStream checkpoints if checkpointing
-            is enabled. Make sure that your checkpoint directory is secure.
+        Parameters
+        ----------
+        ssc : :class:`StreamingContext`
+            StreamingContext object
+        kinesisAppName : str
+            Kinesis application name used by the Kinesis Client Library (KCL) to
+            update DynamoDB
+        streamName : str
+            Kinesis stream name
+        endpointUrl : str
+            Url of Kinesis service (e.g., https://kinesis.us-east-1.amazonaws.com)
+        regionName : str
+            Name of region used by the Kinesis Client Library (KCL) to update
+            DynamoDB (lease coordination and checkpointing) and CloudWatch (metrics)
+        initialPositionInStream : int
+            In the absence of Kinesis checkpoint info, this is the
+            worker's initial starting position in the stream. The
+            values are either the beginning of the stream per Kinesis'
+            limit of 24 hours (InitialPositionInStream.TRIM_HORIZON) or
+            the tip of the stream (InitialPositionInStream.LATEST).
+        checkpointInterval : int
+            Checkpoint interval for Kinesis checkpointing. See the Kinesis

Review comment:
       Probably not here, but we should note that the value is in seconds.






[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726663001


   **[Test build #131058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131058/testReport)** for PR 30346 at commit [`79f98b7`](https://github.com/apache/spark/commit/79f98b75dd30d67f675aae583380b6da37d75bf8).




[GitHub] [spark] SparkQA removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727201643


   **[Test build #131091 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131091/testReport)** for PR 30346 at commit [`d9cd6ab`](https://github.com/apache/spark/commit/d9cd6ab1b64189a2ad263b6b4842a3bb264ce37f).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726555340


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35643/
   Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726659605


   **[Test build #131057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131057/testReport)** for PR 30346 at commit [`5a542b5`](https://github.com/apache/spark/commit/5a542b5b77b6754bac317c097dd0484c1cb86570).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726555330


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726555330








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726674952








[GitHub] [spark] HyukjinKwon commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726748562


   @itholic, can you fix corresponding pyi files too?




[GitHub] [spark] HyukjinKwon commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727685110


   Merged to master. Thanks, @itholic.




[GitHub] [spark] HyukjinKwon commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726575432


   @itholic, can you check the pyi files too and update them? I found at least one difference: `batchDuration: Union[float, int] = ...,`.
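
   To illustrate the kind of mismatch meant here: the type annotation and the docstring type should describe the same thing. A minimal sketch with a hypothetical function `make_context_example`; inline annotations are used instead of a separate `.pyi` file only so that the snippet is self-contained.

```python
from typing import Union, get_type_hints


def make_context_example(batchDuration: Union[float, int] = 1) -> float:
    """
    Parameters
    ----------
    batchDuration : int or float
        Batch interval in seconds (hypothetical parameter for this sketch).
    """
    return float(batchDuration)


# The annotation and the docstring both allow int and float, so they agree.
hints = get_type_hints(make_context_example)
```

   If the docstring said only `int` while the annotation allowed `float` as well, the two would drift apart, which is the sort of difference worth checking for.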




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727297309








[GitHub] [spark] AmplabJenkins commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727585686








[GitHub] [spark] itholic commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
itholic commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726637936


   Thanks for the fix, and sorry for the bother, @HyukjinKwon.
   I will double-check all the files again.




[GitHub] [spark] AmplabJenkins commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726674952








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727348202








[GitHub] [spark] HyukjinKwon commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r522717815



##########
File path: python/pyspark/streaming/context.py
##########
@@ -286,11 +321,18 @@ def queueStream(self, rdds, oneAtATime=True, default=None):
         Create an input stream from a queue of RDDs or list. In each batch,
         it will process either one or all of the RDDs returned by the queue.
 
-        .. note:: Changes to the queue after the stream is created will not be recognized.
-
-        :param rdds:       Queue of RDDs
-        :param oneAtATime: pick one rdd each time or pick all of them once.
-        :param default:    The default rdd if no more in rdds
+        Parameters
+        ----------
+        rdds : :class:`RDD`, list

Review comment:
       ```suggestion
           rdds : list
   ```






[GitHub] [spark] HyukjinKwon commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r522721724



##########
File path: python/pyspark/streaming/dstream.py
##########
@@ -449,15 +462,21 @@ def reduceByWindow(self, reduceFunc, invReduceFunc, windowDuration, slideDuratio
         2. "inverse reduce" the old values that left the window (e.g., subtracting old counts)
         This is more efficient than `invReduceFunc` is None.
 
-        :param reduceFunc:     associative and commutative reduce function
-        :param invReduceFunc:  inverse reduce function of `reduceFunc`; such that for all y,
-                               and invertible x:
-                               `invReduceFunc(reduceFunc(x, y), x) = y`
-        :param windowDuration: width of the window; must be a multiple of this DStream's
-                               batching interval
-        :param slideDuration:  sliding interval of the window (i.e., the interval after which
-                               the new DStream will generate RDDs); must be a multiple of this
-                               DStream's batching interval
+        Parameters
+        ----------
+        reduceFunc : func

Review comment:
       ```suggestion
           reduceFunc : function
   ```

##########
File path: python/pyspark/streaming/dstream.py
##########
@@ -449,15 +462,21 @@ def reduceByWindow(self, reduceFunc, invReduceFunc, windowDuration, slideDuratio
         2. "inverse reduce" the old values that left the window (e.g., subtracting old counts)
         This is more efficient than `invReduceFunc` is None.
 
-        :param reduceFunc:     associative and commutative reduce function
-        :param invReduceFunc:  inverse reduce function of `reduceFunc`; such that for all y,
-                               and invertible x:
-                               `invReduceFunc(reduceFunc(x, y), x) = y`
-        :param windowDuration: width of the window; must be a multiple of this DStream's
-                               batching interval
-        :param slideDuration:  sliding interval of the window (i.e., the interval after which
-                               the new DStream will generate RDDs); must be a multiple of this
-                               DStream's batching interval
+        Parameters
+        ----------
+        reduceFunc : func
+            associative and commutative reduce function
+        invReduceFunc : func

Review comment:
       ```suggestion
           invReduceFunc : function
   ```
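
   For readers of the docstring quoted above, the relationship `invReduceFunc(reduceFunc(x, y), x) = y` can be sketched outside Spark. The helper `windowed_reduce` below is hypothetical and works on a plain list of per-batch values rather than a DStream; it only illustrates why supplying an inverse function allows an incremental update instead of recomputing the whole window.

```python
from operator import add, sub


def windowed_reduce(batches, window, reduce_func, inv_reduce_func=None):
    """Reduce every sliding window of `window` consecutive batch values."""
    results = []
    current = None
    for i, value in enumerate(batches):
        if i < window or inv_reduce_func is None:
            # No inverse available (or window not yet full):
            # recompute the window from scratch.
            start = max(0, i - window + 1)
            current = batches[start]
            for v in batches[start + 1:i + 1]:
                current = reduce_func(current, v)
        else:
            # Incremental path: fold in the value that entered the window,
            # then "inverse reduce" the value that left it.
            current = reduce_func(current, value)
            current = inv_reduce_func(current, batches[i - window])
        results.append(current)
    return results


incremental = windowed_reduce([1, 2, 3, 4, 5], 3, add, sub)
recomputed = windowed_reduce([1, 2, 3, 4, 5], 3, add)
```

   Both calls produce the same window sums; the incremental path does a constant amount of work per batch once the window is full, whereas the recomputing path touches every value in the window.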






[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727297266


   **[Test build #131103 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131103/testReport)** for PR 30346 at commit [`b36cfd4`](https://github.com/apache/spark/commit/b36cfd410e444501614f505380daf3d42926dbce).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] zero323 commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r523749452



##########
File path: python/pyspark/streaming/kinesis.pyi
##########
@@ -18,7 +18,7 @@
 
 # NOTE: This dynamically typed stub was automatically generated by stubgen.

Review comment:
       Nit. Could we drop this, as it is no longer automatically generated?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727594947


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35711/
   Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727594942








[GitHub] [spark] SparkQA removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726663001


   **[Test build #131058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131058/testReport)** for PR 30346 at commit [`79f98b7`](https://github.com/apache/spark/commit/79f98b75dd30d67f675aae583380b6da37d75bf8).




[GitHub] [spark] zero323 commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-725980404


   Looks good. In general, if we use `:class:`, `:py:class:` or other linking roles, I'd stick to names that are qualified enough to resolve to the corresponding doc. It's a small thing, but it makes navigating the docs way more pleasant.
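
A minimal sketch of what a qualified linking role looks like inside a NumPy-style docstring. The function `create_stream` is hypothetical and exists only for illustration; `pyspark.StorageLevel` is the qualified name suggested elsewhere in this review. Whether a given role resolves still depends on how the Sphinx project is configured.

```python
def create_stream(storage_level=None):
    """
    Create an input stream (hypothetical function, for illustration only).

    Parameters
    ----------
    storage_level : :class:`pyspark.StorageLevel`, optional
        Storage level to use for storing the received objects.
    """
    # The body is irrelevant here; only the docstring matters.
    return storage_level


# Show the docstring text that Sphinx would process.
print(create_stream.__doc__)
```

An unqualified `:class:`StorageLevel`` may fail to resolve when the docstring lives in a different module than the target class, which is why the qualified form is the safer default.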




[GitHub] [spark] AmplabJenkins commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726535588








[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726682605


   **[Test build #131057 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131057/testReport)** for PR 30346 at commit [`5a542b5`](https://github.com/apache/spark/commit/5a542b5b77b6754bac317c097dd0484c1cb86570).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-725887848


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35576/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726535588








[GitHub] [spark] HyukjinKwon commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r522708521



##########
File path: python/pyspark/streaming/context.py
##########
@@ -36,6 +36,14 @@ class StreamingContext(object):
     be started and stopped using `context.start()` and `context.stop()`,
     respectively. `context.awaitTermination()` allows the current thread
     to wait for the termination of the context by `stop()` or by an exception.
+
+    Parameters
+    ----------
+    sparkContext : :class:`SparkContext`
+        SparkContext object.
+    batchDuration : int, optional
+        the time interval (in seconds) at which streaming
+        data will be divided into batches

Review comment:
       I documented parameters like `jssc` too, and mentioned that they are internal: https://github.com/apache/spark/blob/3959f0d9879fa7fa9e8f2e8ed8c8b12003d21788/python/pyspark/sql/context.py#L50-L53
   Let's keep it consistent for now.






[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727590590


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35711/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726584408








[GitHub] [spark] zero323 commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r521983849



##########
File path: python/pyspark/streaming/context.py
##########
@@ -46,9 +46,13 @@ def __init__(self, sparkContext, batchDuration=None, jssc=None):
         """
         Create a new StreamingContext.
 
-        :param sparkContext: :class:`SparkContext` object.
-        :param batchDuration: the time interval (in seconds) at which streaming
-                              data will be divided into batches
+        Parameters
+        ----------
+        sparkContext : :class:`SparkContext`
+            SparkContext object.
+        batchDuration : int, optional
+            the time interval (in seconds) at which streaming
+            data will be divided into batches

Review comment:
       Shall we move this to the class docstring instead, so it is visible in the rendered docs?
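
A rough sketch of the layout being proposed: the `Parameters` section sits in the class docstring and `__init__` carries no docstring of its own. `StreamingContextSketch` is a hypothetical stand-in, not the real `StreamingContext`, and the claim about visibility assumes Sphinx's default `autoclass_content = "class"`, under which only the class docstring is rendered.

```python
class StreamingContextSketch(object):
    """
    Hypothetical stand-in used only to illustrate docstring placement.

    Parameters
    ----------
    sparkContext : :class:`SparkContext`
        SparkContext object.
    batchDuration : int, optional
        the time interval (in seconds) at which streaming
        data will be divided into batches
    """

    def __init__(self, sparkContext, batchDuration=None):
        # No docstring here: the constructor parameters are documented
        # once, in the class docstring above.
        self.sparkContext = sparkContext
        self.batchDuration = batchDuration


# The parameters are found on the class, not on __init__.
print("Parameters" in StreamingContextSketch.__doc__)
print(StreamingContextSketch.__init__.__doc__)
```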






[GitHub] [spark] AmplabJenkins commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727209791








[GitHub] [spark] HyukjinKwon commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-725868654


   It's pretty short, nice. @zero323 mind taking a quick look when you're available?




[GitHub] [spark] zero323 commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r523750837



##########
File path: python/pyspark/streaming/kinesis.pyi
##########
@@ -32,15 +35,15 @@ class KinesisUtils:
         regionName,
         initialPositionInStream,
         checkpointInterval,
-        storageLevel: Any = ...,
-        awsAccessKeyId: Optional[Any] = ...,
-        awsSecretKey: Optional[Any] = ...,
-        decoder: Any = ...,
-        stsAssumeRoleArn: Optional[Any] = ...,
-        stsSessionName: Optional[Any] = ...,
-        stsExternalId: Optional[Any] = ...,
+        storageLevel: StorageLevel = ...,
+        awsAccessKeyId: Optional[str] = ...,
+        awsSecretKey: Optional[str] = ...,
+        decoder: Callable[[T], T] = ...,
+        stsAssumeRoleArn: Optional[str] = ...,
+        stsSessionName: Optional[str] = ...,
+        stsExternalId: Optional[str] = ...,
     ): ...

Review comment:
       The return type is missing. I believe it should be `DStream[T]`:
   
   ```python
   ) -> DStream[T]: ...
   ```
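
To illustrate why the return annotation matters, here is a self-contained sketch in which the decoder's result type flows into the stream's element type. `DStream` below is a hypothetical minimal stand-in, not the PySpark class, and the decoder signature `Callable[[Optional[bytes]], T]` is an assumption made for this example rather than the signature in the stub under review.

```python
from typing import Callable, Generic, Iterable, List, Optional, TypeVar, get_type_hints

T = TypeVar("T")


class DStream(Generic[T]):
    """Hypothetical minimal stand-in for a typed stream of elements."""

    def __init__(self, items: Iterable[T]) -> None:
        self.items: List[T] = list(items)


def create_stream(
    records: Iterable[Optional[bytes]],
    decoder: Callable[[Optional[bytes]], T],
) -> DStream[T]:
    # Each raw record is passed through the decoder, so the element type
    # of the resulting stream is whatever the decoder returns.
    return DStream(decoder(record) for record in records)


def utf8(raw: Optional[bytes]) -> Optional[str]:
    # Illustrative decoder: bytes in, text out, None passed through.
    return None if raw is None else raw.decode("utf-8")


stream = create_stream([b"a", b"b"], utf8)
print(stream.items)
print(get_type_hints(create_stream)["return"])
```

Without the `-> DStream[T]` annotation, a type checker would treat the result as untyped and the link between `decoder` and the stream's elements would be lost.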
   






[GitHub] [spark] HyukjinKwon commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r522725171



##########
File path: python/pyspark/streaming/context.py
##########
@@ -45,10 +53,6 @@ class StreamingContext(object):
     def __init__(self, sparkContext, batchDuration=None, jssc=None):
         """
         Create a new StreamingContext.
-
-        :param sparkContext: :class:`SparkContext` object.
-        :param batchDuration: the time interval (in seconds) at which streaming
-                              data will be divided into batches
         """

Review comment:
       ```suggestion
   ```

##########
File path: python/pyspark/streaming/context.py
##########
@@ -45,10 +53,6 @@ class StreamingContext(object):
     def __init__(self, sparkContext, batchDuration=None, jssc=None):
         """
         Create a new StreamingContext.

Review comment:
       ```suggestion
   ```






[GitHub] [spark] AmplabJenkins commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-725887860








[GitHub] [spark] HyukjinKwon commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r522714463



##########
File path: python/pyspark/streaming/context.py
##########
@@ -286,11 +321,18 @@ def queueStream(self, rdds, oneAtATime=True, default=None):
         Create an input stream from a queue of RDDs or list. In each batch,
         it will process either one or all of the RDDs returned by the queue.
 
-        .. note:: Changes to the queue after the stream is created will not be recognized.
-
-        :param rdds:       Queue of RDDs
-        :param oneAtATime: pick one rdd each time or pick all of them once.
-        :param default:    The default rdd if no more in rdds
+        Parameters
+        ----------
+        rdds : :class:`RDD`, list
+            Queue of RDDs
+        oneAtATime : int, optional

Review comment:
       ```suggestion
           oneAtATime : bool, optional
   ```






[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726584082


   **[Test build #131041 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131041/testReport)** for PR 30346 at commit [`4e0c912`](https://github.com/apache/spark/commit/4e0c9126ab5515bb5a4cb42296f51f6aed4fc19a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] HyukjinKwon commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r522720697



##########
File path: python/pyspark/streaming/dstream.py
##########
@@ -423,11 +432,15 @@ def window(self, windowDuration, slideDuration=None):
         Return a new DStream in which each RDD contains all the elements in seen in a
         sliding window of time over this DStream.
 
-        :param windowDuration: width of the window; must be a multiple of this DStream's
-                              batching interval
-        :param slideDuration:  sliding interval of the window (i.e., the interval after which
-                              the new DStream will generate RDDs); must be a multiple of this
-                              DStream's batching interval
+        Parameters
+        ----------
+        windowDuration : int
+            width of the window; must be a multiple of this DStream's
+            batching interval
+        slideDuration : int

Review comment:
       ```suggestion
           slideDuration : int, optional
   ```
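
In numpydoc, a keyword parameter that does not need to be specified is marked `optional`. One way to cross-check a docstring against the signature is sketched below; the helper `optional_params` is hypothetical and not part of Spark or numpydoc, and `window` is a stub that only mirrors the signature shown in the diff.

```python
import inspect


def optional_params(func):
    """Return the names of parameters that have a default value."""
    signature = inspect.signature(func)
    return [
        name
        for name, param in signature.parameters.items()
        if param.default is not inspect.Parameter.empty
    ]


def window(self, windowDuration, slideDuration=None):
    # Stub with the same signature as the method under review.
    pass


# Parameters listed here are candidates for "optional" in the docstring.
print(optional_params(window))
```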






[GitHub] [spark] HyukjinKwon commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r522706515



##########
File path: python/pyspark/streaming/context.py
##########
@@ -45,10 +53,6 @@ class StreamingContext(object):
     def __init__(self, sparkContext, batchDuration=None, jssc=None):
         """

Review comment:
       Can you just remove this docstring under `__init__`?






[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-725867759


   **[Test build #130970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130970/testReport)** for PR 30346 at commit [`46244fe`](https://github.com/apache/spark/commit/46244fee9451656a35fe35ea36eb8cc6b7ef77a3).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] HyukjinKwon closed pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #30346:
URL: https://github.com/apache/spark/pull/30346


   




[GitHub] [spark] zero323 commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r521986217



##########
File path: python/pyspark/streaming/kinesis.py
##########
@@ -43,38 +43,59 @@ def createStream(ssc, kinesisAppName, streamName, endpointUrl, regionName,
         Create an input stream that pulls messages from a Kinesis stream. This uses the
         Kinesis Client Library (KCL) to pull messages from Kinesis.
 
-        .. note:: The given AWS credentials will get saved in DStream checkpoints if checkpointing
-            is enabled. Make sure that your checkpoint directory is secure.
+        Parameters
+        ----------
+        ssc : :class:`StreamingContext`
+            StreamingContext object
+        kinesisAppName : str
+            Kinesis application name used by the Kinesis Client Library (KCL) to
+            update DynamoDB
+        streamName : str
+            Kinesis stream name
+        endpointUrl : str
+            Url of Kinesis service (e.g., https://kinesis.us-east-1.amazonaws.com)
+        regionName : str
+            Name of region used by the Kinesis Client Library (KCL) to update
+            DynamoDB (lease coordination and checkpointing) and CloudWatch (metrics)
+        initialPositionInStream : int
+            In the absence of Kinesis checkpoint info, this is the
+            worker's initial starting position in the stream. The
+            values are either the beginning of the stream per Kinesis'
+            limit of 24 hours (InitialPositionInStream.TRIM_HORIZON) or
+            the tip of the stream (InitialPositionInStream.LATEST).
+        checkpointInterval : int
+            Checkpoint interval for Kinesis checkpointing. See the Kinesis
+            Spark Streaming documentation for more details on the different
+            types of checkpoints.
+        storageLevel : :class:`StorageLevel`, optional

Review comment:
       As above 
   
       :class:`pyspark.StorageLevel`






[GitHub] [spark] AmplabJenkins commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727297309








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727585686








[GitHub] [spark] AmplabJenkins commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727203683








[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-725858399


   **[Test build #130970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130970/testReport)** for PR 30346 at commit [`46244fe`](https://github.com/apache/spark/commit/46244fee9451656a35fe35ea36eb8cc6b7ef77a3).




[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727203622


   **[Test build #131091 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131091/testReport)** for PR 30346 at commit [`d9cd6ab`](https://github.com/apache/spark/commit/d9cd6ab1b64189a2ad263b6b4842a3bb264ce37f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] itholic commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r522655229



##########
File path: python/pyspark/streaming/context.py
##########
@@ -46,9 +46,13 @@ def __init__(self, sparkContext, batchDuration=None, jssc=None):
         """
         Create a new StreamingContext.
 
-        :param sparkContext: :class:`SparkContext` object.
-        :param batchDuration: the time interval (in seconds) at which streaming
-                              data will be divided into batches
+        Parameters
+        ----------
+        sparkContext : :class:`SparkContext`
+            SparkContext object.
+        batchDuration : int, optional
+            the time interval (in seconds) at which streaming
+            data will be divided into batches

Review comment:
       Sounds good. Thanks!!
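
For reference, the numpydoc layout used in the hunk above can be sketched on a standalone function. The name `make_batches` and its body are hypothetical and only illustrate the `Parameters` and `Returns` sections; this is not PySpark code.

```python
def make_batches(records, batch_duration=1):
    """
    Group timestamped records into fixed-width batches.

    Parameters
    ----------
    records : list of tuple
        ``(timestamp_in_seconds, value)`` pairs.
    batch_duration : int, optional
        the time interval (in seconds) at which records
        will be divided into batches

    Returns
    -------
    dict
        Maps each batch index to the list of values in that batch.
    """
    batches = {}
    for timestamp, value in records:
        # Integer division assigns each record to its batch index.
        batches.setdefault(int(timestamp // batch_duration), []).append(value)
    return batches
```

The section title is underlined with dashes of the same length, each parameter is written as `name : type`, and the description is indented on the following lines.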






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726584408








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727594942


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726596089


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35647/
   




[GitHub] [spark] itholic commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r523769086



##########
File path: python/pyspark/streaming/kinesis.pyi
##########
@@ -32,15 +35,15 @@ class KinesisUtils:
         regionName,
         initialPositionInStream,
         checkpointInterval,
-        storageLevel: Any = ...,
-        awsAccessKeyId: Optional[Any] = ...,
-        awsSecretKey: Optional[Any] = ...,
-        decoder: Any = ...,
-        stsAssumeRoleArn: Optional[Any] = ...,
-        stsSessionName: Optional[Any] = ...,
-        stsExternalId: Optional[Any] = ...,
+        storageLevel: StorageLevel = ...,
+        awsAccessKeyId: Optional[str] = ...,
+        awsSecretKey: Optional[str] = ...,
+        decoder: Callable[[T], T] = ...,

Review comment:
       Absolutely. Thanks!






[GitHub] [spark] HyukjinKwon commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r522723304



##########
File path: python/pyspark/streaming/dstream.py
##########
@@ -522,17 +551,25 @@ def reduceByKeyAndWindow(self, func, invFunc, windowDuration, slideDuration=None
         `invFunc` can be None, then it will reduce all the RDDs in window, could be slower
         than having `invFunc`.
 
-        :param func:           associative and commutative reduce function
-        :param invFunc:        inverse function of `reduceFunc`
-        :param windowDuration: width of the window; must be a multiple of this DStream's
-                              batching interval
-        :param slideDuration:  sliding interval of the window (i.e., the interval after which
-                              the new DStream will generate RDDs); must be a multiple of this
-                              DStream's batching interval
-        :param numPartitions:  number of partitions of each RDD in the new DStream.
-        :param filterFunc:     function to filter expired key-value pairs;
-                              only pairs that satisfy the function are retained
-                              set this to null if you do not want to filter
+        Parameters
+        ----------
+        func : function
+            associative and commutative reduce function
+        invFunc : function
+            inverse function of `reduceFunc`
+        windowDuration : int
+            width of the window; must be a multiple of this DStream's
+            batching interval
+        slideDuration : int, optional
+            sliding interval of the window (i.e., the interval after which
+            the new DStream will generate RDDs); must be a multiple of this
+            DStream's batching interval
+        numPartitions : int, optional
+            number of partitions of each RDD in the new DStream.
+        filterFunc : func, optional

Review comment:
       ```suggestion
           filterFunc : function, optional
   ```
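
A rough, pure-Python sketch of why supplying `invFunc` can make a windowed reduce cheaper: instead of re-reducing every batch in the window, the previous result is updated with the batch that enters and the batch that leaves. `windowed_reduce` is a hypothetical helper, the window is counted in batches with a slide of one batch, and this is not how Spark itself implements `reduceByKeyAndWindow`.

```python
from functools import reduce


def windowed_reduce(batches, func, inv_func, window):
    """Return the reduced value of each full sliding window (slide = 1 batch).

    `batches` is a list of already-reduced per-batch values. When `inv_func`
    is None, every window is reduced from scratch.
    """
    results = []
    current = None
    for i in range(window - 1, len(batches)):
        if inv_func is None or current is None:
            # Full recomputation over the whole window.
            current = reduce(func, batches[i - window + 1:i + 1])
        else:
            # Incremental update: add the new batch, remove the expired one.
            current = inv_func(func(current, batches[i]), batches[i - window])
        results.append(current)
    return results
```

With addition as `func` and subtraction as `inv_func`, both paths give the same window sums; the incremental path touches two values per step instead of the whole window.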






[GitHub] [spark] itholic commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r523768721



##########
File path: python/pyspark/streaming/kinesis.py
##########
@@ -43,38 +43,59 @@ def createStream(ssc, kinesisAppName, streamName, endpointUrl, regionName,
         Create an input stream that pulls messages from a Kinesis stream. This uses the
         Kinesis Client Library (KCL) to pull messages from Kinesis.
 
-        .. note:: The given AWS credentials will get saved in DStream checkpoints if checkpointing
-            is enabled. Make sure that your checkpoint directory is secure.
+        Parameters
+        ----------
+        ssc : :class:`StreamingContext`
+            StreamingContext object
+        kinesisAppName : str
+            Kinesis application name used by the Kinesis Client Library (KCL) to
+            update DynamoDB
+        streamName : str
+            Kinesis stream name
+        endpointUrl : str
+            Url of Kinesis service (e.g., https://kinesis.us-east-1.amazonaws.com)
+        regionName : str
+            Name of region used by the Kinesis Client Library (KCL) to update
+            DynamoDB (lease coordination and checkpointing) and CloudWatch (metrics)
+        initialPositionInStream : int
+            In the absence of Kinesis checkpoint info, this is the
+            worker's initial starting position in the stream. The
+            values are either the beginning of the stream per Kinesis'
+            limit of 24 hours (InitialPositionInStream.TRIM_HORIZON) or
+            the tip of the stream (InitialPositionInStream.LATEST).
+        checkpointInterval : int
+            Checkpoint interval for Kinesis checkpointing. See the Kinesis

Review comment:
       Added to the docs, thanks :)






[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726555307


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35643/
   




[GitHub] [spark] SparkQA removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726576439


   **[Test build #131041 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131041/testReport)** for PR 30346 at commit [`4e0c912`](https://github.com/apache/spark/commit/4e0c9126ab5515bb5a4cb42296f51f6aed4fc19a).




[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726576439


   **[Test build #131041 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131041/testReport)** for PR 30346 at commit [`4e0c912`](https://github.com/apache/spark/commit/4e0c9126ab5515bb5a4cb42296f51f6aed4fc19a).




[GitHub] [spark] SparkQA removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-725858399


   **[Test build #130970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130970/testReport)** for PR 30346 at commit [`46244fe`](https://github.com/apache/spark/commit/46244fee9451656a35fe35ea36eb8cc6b7ef77a3).




[GitHub] [spark] zero323 commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r523751933



##########
File path: python/pyspark/streaming/kinesis.pyi
##########
@@ -32,15 +35,15 @@ class KinesisUtils:
         regionName,

Review comment:
       Could we annotate all arguments? 
   
   ```python
       def createStream(
           ssc: pyspark.streaming.context.StreamingContext,
           kinesisAppName: str,
           streamName: str,
           endpointUrl: str,
           regionName: str,
           initialPositionInStream: int,
           checkpointInterval: int,
   ```
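
One way to check a request like this mechanically is sketched below. Everything here is an assumption for illustration: `StreamingContext` is an empty stand-in class rather than the real PySpark type, `create_stream` is a hypothetical signature that only covers the arguments quoted in this thread (with `None` defaults, hence the `Optional` wrappers), and `unannotated_parameters` is a helper built on the standard `inspect` module. The merged stub may look different.

```python
import inspect
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class StreamingContext:
    """Stand-in for pyspark.streaming.context.StreamingContext."""


def create_stream(
    ssc: StreamingContext,
    kinesisAppName: str,
    streamName: str,
    endpointUrl: str,
    regionName: str,
    initialPositionInStream: int,
    checkpointInterval: int,
    awsAccessKeyId: Optional[str] = None,
    awsSecretKey: Optional[str] = None,
    decoder: Optional[Callable[[T], T]] = None,
) -> None:
    """Hypothetical fully annotated signature; the body is intentionally empty."""


def unannotated_parameters(func):
    """Return the names of parameters that carry no type annotation."""
    signature = inspect.signature(func)
    return [
        name
        for name, parameter in signature.parameters.items()
        if parameter.annotation is inspect.Parameter.empty
    ]
```

An empty list from `unannotated_parameters(create_stream)` means every argument carries an annotation.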






[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727295568


   **[Test build #131103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131103/testReport)** for PR 30346 at commit [`b36cfd4`](https://github.com/apache/spark/commit/b36cfd410e444501614f505380daf3d42926dbce).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-725868043








[GitHub] [spark] itholic commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r522650964



##########
File path: python/pyspark/streaming/kinesis.py
##########
@@ -43,38 +43,59 @@ def createStream(ssc, kinesisAppName, streamName, endpointUrl, regionName,
         Create an input stream that pulls messages from a Kinesis stream. This uses the
         Kinesis Client Library (KCL) to pull messages from Kinesis.
 
-        .. note:: The given AWS credentials will get saved in DStream checkpoints if checkpointing
-            is enabled. Make sure that your checkpoint directory is secure.
+        Parameters
+        ----------
+        ssc : :class:`StreamingContext`
+            StreamingContext object
+        kinesisAppName : str
+            Kinesis application name used by the Kinesis Client Library (KCL) to
+            update DynamoDB
+        streamName : str
+            Kinesis stream name
+        endpointUrl : str
+            Url of Kinesis service (e.g., https://kinesis.us-east-1.amazonaws.com)
+        regionName : str
+            Name of region used by the Kinesis Client Library (KCL) to update
+            DynamoDB (lease coordination and checkpointing) and CloudWatch (metrics)
+        initialPositionInStream : int
+            In the absence of Kinesis checkpoint info, this is the
+            worker's initial starting position in the stream. The
+            values are either the beginning of the stream per Kinesis'
+            limit of 24 hours (InitialPositionInStream.TRIM_HORIZON) or
+            the tip of the stream (InitialPositionInStream.LATEST).
+        checkpointInterval : int
+            Checkpoint interval for Kinesis checkpointing. See the Kinesis
+            Spark Streaming documentation for more details on the different
+            types of checkpoints.
+        storageLevel : :class:`StorageLevel`, optional

Review comment:
       Cool. Thanks!! 👍 






[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726521325


   **[Test build #131037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131037/testReport)** for PR 30346 at commit [`1945dcf`](https://github.com/apache/spark/commit/1945dcfb071c06933694da881de15e54d272c8a3).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727203683








[GitHub] [spark] AmplabJenkins commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-725868043








[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726727525


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35662/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-725887860


   Merged build finished. Test FAILed.




[GitHub] [spark] itholic commented on a change in pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #30346:
URL: https://github.com/apache/spark/pull/30346#discussion_r523768582



##########
File path: python/pyspark/streaming/kinesis.pyi
##########
@@ -32,15 +35,15 @@ class KinesisUtils:
         regionName,

Review comment:
       Sure, thanks :)






[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727209783


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35694/
   




[GitHub] [spark] itholic commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
itholic commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726520497


   > Looks good. In general, if we use `:class:`, `:py:class:` or other linking roles, I'd stick to names that are qualified enough to resolve to the corresponding doc. That's a small thing, but it makes navigating docs way more pleasant.
   
   Totally agree! Thanks for the nice review, @zero323. 😸 
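
A small sketch of how unqualified role targets could be flagged in a docstring. `unqualified_class_roles` and `ROLE_PATTERN` are hypothetical names, the check only treats "contains a dot" as "qualified", and whether a given dotted name actually resolves still depends on the Sphinx configuration.

```python
import re

# Matches :class:`Target` and :py:class:`Target`, with an optional leading "~".
ROLE_PATTERN = re.compile(r":(?:py:)?class:`~?([\w.]+)`")


def unqualified_class_roles(docstring):
    """Return :class: targets that contain no module path (no dot)."""
    return [
        target
        for target in ROLE_PATTERN.findall(docstring)
        if "." not in target
    ]
```

For example, a docstring that mentions both `` :class:`StreamingContext` `` and `` :py:class:`pyspark.StorageLevel` `` would have only the first target reported.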




[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-726715173


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35662/
   




[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727207089


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35694/
   

