Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/19 12:26:55 UTC

[GitHub] [spark] HyukjinKwon opened a new pull request, #37582: [SPARK-40147][PYTHON][SQL] Make pyspark.sql.session examples self-contained

HyukjinKwon opened a new pull request, #37582:
URL: https://github.com/apache/spark/pull/37582

   ### What changes were proposed in this pull request?
   
   This PR proposes to improve the examples in `pyspark.sql.session` by making each one self-contained, with a brief explanation and a slightly more realistic scenario.
   
   ### Why are the changes needed?
   
   To make the documentation more readable and to allow the examples to be copied and pasted directly into the PySpark shell.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, it changes the documentation.
   
   ### How was this patch tested?
   
   Manually ran each doctest. CI also runs them.
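
As an illustration of why self-contained examples matter (a minimal sketch, not code from this PR): the standard-library `doctest` module can run a docstring against an empty namespace, which is roughly the situation of a reader pasting it into a fresh shell. The helper name `run_docstring` and the two sample docstrings are hypothetical.

```python
import doctest

# Hypothetical docstring that relies on a name (`data`) defined elsewhere.
NOT_SELF_CONTAINED = """
>>> sorted(data)
[1, 2, 3]
"""

# Hypothetical docstring that defines everything it uses.
SELF_CONTAINED = """
>>> data = [3, 1, 2]
>>> sorted(data)
[1, 2, 3]
"""


def run_docstring(docstring):
    """Run the examples in `docstring` against an empty namespace.

    Returns doctest's TestResults, which has `failed` and `attempted` fields.
    """
    test = doctest.DocTestParser().get_doctest(
        docstring, globs={}, name="example", filename=None, lineno=0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts are of interest here.
    return runner.run(test, out=lambda text: None)


print("not self-contained, failed:", run_docstring(NOT_SELF_CONTAINED).failed)
print("self-contained, failed:", run_docstring(SELF_CONTAINED).failed)
```

The first docstring fails with a `NameError` because `data` is never defined, while the second passes on its own.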


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37582: [SPARK-40147][PYTHON][SQL] Make pyspark.sql.session examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37582:
URL: https://github.com/apache/spark/pull/37582#discussion_r950764394


##########
python/pyspark/sql/session.py:
##########
@@ -99,8 +99,15 @@ def toDF(self, schema=None, sampleRatio=None):
 
         Examples
         --------
-        >>> rdd.toDF().collect()
-        [Row(name='Alice', age=1)]
+        >>> rdd = spark.range(1).rdd.map(lambda x: tuple(x))
+        >>> rdd.collect()
+        [(0,)]
+        >>> spark.range(1).show()

Review Comment:
   ```suggestion
           >>> rdd.toDF().show()
   ```
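
For context, a sketch of how the example might read with this suggestion applied. The wrapper function `to_df_example` is hypothetical and exists only so the snippet can be loaded without starting Spark; it assumes an active `SparkSession` is passed in, and the docstring finally merged in the PR may differ.

```python
def to_df_example(spark):
    """Build a one-row RDD of plain tuples and convert it with toDF().

    `spark` is assumed to be an active pyspark.sql.SparkSession.
    """
    # spark.range(1) yields a single Row(id=0); map each Row to a bare tuple.
    rdd = spark.range(1).rdd.map(lambda row: tuple(row))
    # With no schema argument, toDF() infers the types and auto-generates
    # column names for the tuple fields.
    return rdd.toDF()


# Usage, assuming PySpark is installed:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   to_df_example(spark).show()
```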
   





[GitHub] [spark] HyukjinKwon commented on pull request #37582: [SPARK-40147][PYTHON][SQL] Make pyspark.sql.session examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37582:
URL: https://github.com/apache/spark/pull/37582#issuecomment-1221232624

   cc @Yikun @itholic @viirya @xinrong-meng @ueshin @zhengruifeng PTAL when you find some time.




[GitHub] [spark] viirya commented on pull request #37582: [SPARK-40147][PYTHON][SQL] Make pyspark.sql.session examples self-contained

Posted by GitBox <gi...@apache.org>.
viirya commented on PR #37582:
URL: https://github.com/apache/spark/pull/37582#issuecomment-1221235536

   Two minor comments




[GitHub] [spark] viirya commented on a diff in pull request #37582: [SPARK-40147][PYTHON][SQL] Make pyspark.sql.session examples self-contained

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #37582:
URL: https://github.com/apache/spark/pull/37582#discussion_r950651293


##########
python/pyspark/sql/session.py:
##########
@@ -1185,6 +1363,22 @@ def readStream(self) -> DataStreamReader:
         Returns
         -------
         :class:`DataStreamReader`
+
+        Examples
+        --------
+        >>> spark.readStream
+        <pyspark.sql.streaming.readwriter.DataStreamReader object ...>
+
+        The example below uses Rate source that generates rows continously.
+        After that, we operate a modulo by 3, and then writes the stream out to the console.

Review Comment:
   ```suggestion
           After that, we operate a modulo by 3, and then write the stream out to the console.
   ```
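
The pipeline described in the docstring under review (Rate source, modulo by 3, console sink) could be sketched as follows. This is an assumption-laden illustration rather than the PR's actual docstring: the function name `rate_mod3_to_console`, the `run_seconds` parameter, and the column alias `v` are hypothetical, and an active `SparkSession` is assumed to be passed in.

```python
import time


def rate_mod3_to_console(spark, run_seconds=3):
    """Stream from the Rate source, take value % 3, and print to the console.

    `spark` is assumed to be an active pyspark.sql.SparkSession.
    """
    # The built-in "rate" source generates rows with `timestamp` and
    # `value` columns.
    df = spark.readStream.format("rate").load()
    # Keep only the generated value, reduced modulo 3.
    df = df.select((df.value % 3).alias("v"))
    # Start the query with the console sink, which prints each micro-batch.
    query = df.writeStream.format("console").start()
    try:
        time.sleep(run_seconds)
    finally:
        # Always stop the query so the stream does not keep running.
        query.stop()


# Usage, assuming PySpark is installed:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   rate_mod3_to_console(spark)
```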





[GitHub] [spark] HyukjinKwon commented on pull request #37582: [SPARK-40147][PYTHON][SQL] Make pyspark.sql.session examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37582:
URL: https://github.com/apache/spark/pull/37582#issuecomment-1221438513

   Merged to master.




[GitHub] [spark] HyukjinKwon closed pull request #37582: [SPARK-40147][PYTHON][SQL] Make pyspark.sql.session examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #37582: [SPARK-40147][PYTHON][SQL] Make pyspark.sql.session examples self-contained
URL: https://github.com/apache/spark/pull/37582




[GitHub] [spark] viirya commented on a diff in pull request #37582: [SPARK-40147][PYTHON][SQL] Make pyspark.sql.session examples self-contained

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #37582:
URL: https://github.com/apache/spark/pull/37582#discussion_r950650869


##########
python/pyspark/sql/session.py:
##########
@@ -99,8 +99,15 @@ def toDF(self, schema=None, sampleRatio=None):
 
         Examples
         --------
-        >>> rdd.toDF().collect()
-        [Row(name='Alice', age=1)]
+        >>> rdd = spark.range(1).rdd.map(lambda x: tuple(x))
+        >>> rdd.collect()

Review Comment:
   Hm? I don't see `toDF()` in this example.


