Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/29 06:21:27 UTC

[GitHub] [spark] HyukjinKwon opened a new pull request, #37702: [SPARK-40012][PYTHON][DOCS] Make pyspark.sql.dataframe examples self-contained

HyukjinKwon opened a new pull request, #37702:
URL: https://github.com/apache/spark/pull/37702

   ### What changes were proposed in this pull request?
   
   This PR takes over https://github.com/apache/spark/pull/37444 and extends it to cover all examples in `pyspark.sql.dataframe`.
   
   This PR proposes to improve the examples in `pyspark.sql.dataframe` by making each one self-contained and more realistic.
   
   Closes #37444
   
   ### Why are the changes needed?
   
   To make the documentation more readable, and to let users copy and paste the examples directly into the PySpark shell.
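
As a rough illustration of what "self-contained" means for a docstring example, the sketch below uses only the standard-library `doctest` module, with no Spark involved. The functions `scale` and `scale_shared` and the helper `run_examples` are hypothetical names made up for this sketch. Each example runs in a namespace that contains nothing but the function itself, which approximates pasting the example into a fresh shell.

```python
import doctest


def scale(values, factor):
    """Multiply every element of ``values`` by ``factor``.

    Examples
    --------
    The example builds its own input, so it works in a fresh shell.

    >>> values = [1, 2, 3]
    >>> scale(values, 2)
    [2, 4, 6]
    """
    return [v * factor for v in values]


def scale_shared(values, factor):
    """Same logic, but the example relies on a ``values`` defined elsewhere.

    Examples
    --------
    >>> scale_shared(values, 2)
    [2, 4, 6]
    """
    return [v * factor for v in values]


def run_examples(func):
    """Run the docstring examples of ``func`` in an otherwise empty namespace.

    Returns a ``(failed, attempted)`` tuple. Failure reports are discarded
    so that only the counts are visible.
    """
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    globs = {func.__name__: func}  # nothing else is predefined
    for test in finder.find(func, name=func.__name__, globs=globs):
        result = runner.run(test, out=lambda text: None)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted
```

With these definitions, `run_examples(scale)` reports no failures, while `run_examples(scale_shared)` reports one, because `values` is never defined inside the example.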
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, documentation changes only.
   
   ### How was this patch tested?
   
   Manually ran each example.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #37702: [SPARK-40012][PYTHON][DOCS] Make pyspark.sql.dataframe examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #37702: [SPARK-40012][PYTHON][DOCS] Make pyspark.sql.dataframe examples self-contained
URL: https://github.com/apache/spark/pull/37702




[GitHub] [spark] HyukjinKwon commented on pull request #37702: [SPARK-40012][PYTHON][DOCS] Make pyspark.sql.dataframe examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37702:
URL: https://github.com/apache/spark/pull/37702#issuecomment-1231026629

   Sure, that was a big help @Transurgeon 




[GitHub] [spark] dcoliversun commented on a diff in pull request #37702: [SPARK-40012][PYTHON][DOCS] Make pyspark.sql.dataframe examples self-contained

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on code in PR #37702:
URL: https://github.com/apache/spark/pull/37702#discussion_r956980017


##########
python/pyspark/sql/dataframe.py:
##########
@@ -277,10 +297,16 @@ def registerTempTable(self, name: str) -> None:
         .. deprecated:: 2.0.0
             Use :meth:`DataFrame.createOrReplaceTempView` instead.
 
+        Parameters
+        ----------
+        name : str
+            Name of the table to register.

Review Comment:
   ```suggestion
               Name of the temporary table to register.
   ```



##########
python/pyspark/sql/dataframe.py:
##########
@@ -4100,7 +4532,8 @@ def toDF(self, *cols: "ColumnOrName") -> "DataFrame":
         Parameters
         ----------
         cols : str

Review Comment:
   ```suggestion
           cols : str, :class:`Column`, or list
   ```



##########
python/pyspark/sql/dataframe.py:
##########
@@ -4100,7 +4532,8 @@ def toDF(self, *cols: "ColumnOrName") -> "DataFrame":
         Parameters
         ----------
         cols : str
-            new column names
+            new column names. The length of the list needs to be the same as the number
+            of columns in the initial :class:`DataFrame`

Review Comment:
   ```suggestion
               new column names (string) or expressions (:class:`Column`).
               The length of the list needs to be the same as the number
               of columns in the initial :class:`DataFrame`
   ```



##########
python/pyspark/sql/dataframe.py:
##########
@@ -3067,6 +3436,10 @@ def unionByName(self, other: "DataFrame", allowMissingColumns: bool = False) ->
         ----------
         other : :class:`DataFrame`
             Another :class:`DataFrame` that needs to be combined.
+        allowMissingColumns : bool, optional, default False
+           Specify whether to allow missing columns.
+
+           .. versionadded:: 3.1.0

Review Comment:
   ```suggestion
           .. versionadded:: 3.1.0
   ```



##########
python/pyspark/sql/dataframe.py:
##########
@@ -4153,6 +4594,7 @@ def transform(self, func: Callable[..., "DataFrame"], *args: Any, **kwargs: Any)
         |    1|  1|
         |    2|  2|
         +-----+---+
+

Review Comment:
   ```suggestion
   ```



##########
python/pyspark/sql/dataframe.py:
##########
@@ -344,10 +372,16 @@ def createOrReplaceTempView(self, name: str) -> None:
 
         Examples
         --------
+        Create a local temporary view named 'people'

Review Comment:
   ```suggestion
           Create a local temporary view named 'people'.
   ```



##########
python/pyspark/sql/dataframe.py:
##########
@@ -2412,14 +2657,40 @@ def __getitem__(self, item: Union[int, str, Column, List, Tuple]) -> Union[Colum
 
         Examples
         --------
-        >>> df.select(df['age']).collect()
-        [Row(age=2), Row(age=5)]
-        >>> df[ ["name", "age"]].collect()
-        [Row(name='Alice', age=2), Row(name='Bob', age=5)]
-        >>> df[ df.age > 3 ].collect()
-        [Row(age=5, name='Bob')]
-        >>> df[df[0] > 3].collect()
-        [Row(age=5, name='Bob')]
+        >>> df = spark.createDataFrame([
+        ...     (2, "Alice"), (5, "Bob")], schema=["age", "name"])
+
+        Retrieve a column instance.
+
+        >>> df.select(df['age']).show()
+        +---+
+        |age|
+        +---+
+        |  2|
+        |  5|
+        +---+
+
+        Selecting multiple string columns as index.

Review Comment:
   Would it be better to use the base form of the verb here? This is the only place where the `-ing` form is used.
   ```suggestion
           Select multiple string columns as index.
   ```
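
For readers unfamiliar with why one indexing operator covers all of the cases in this hunk, here is a toy sketch of type-based dispatch in `__getitem__`, mirroring the `Union[int, str, Column, List, Tuple]` annotation above. It is not PySpark's implementation: `ToyFrame` and `Predicate` are hypothetical stand-ins, rows are plain dicts, and a single column comes back as a plain list rather than a `Column`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Predicate:
    """Stand-in for a boolean column expression such as ``df.age > 3``."""

    test: Callable[[Row], bool]


class ToyFrame:
    """Tiny in-memory table, used only to illustrate indexing by item type."""

    def __init__(self, rows: List[Row], columns: List[str]):
        self.rows = rows
        self.columns = columns

    def __getitem__(self, item):
        if isinstance(item, str):  # one column, by name
            return [row[item] for row in self.rows]
        if isinstance(item, int):  # one column, by position
            return self[self.columns[item]]
        if isinstance(item, (list, tuple)):  # several names: projection
            cols = list(item)
            return ToyFrame([{c: r[c] for c in cols} for r in self.rows], cols)
        if isinstance(item, Predicate):  # boolean condition: row filter
            return ToyFrame([r for r in self.rows if item.test(r)], self.columns)
        raise TypeError(f"unexpected item type: {type(item)}")


df = ToyFrame(
    [{"age": 2, "name": "Alice"}, {"age": 5, "name": "Bob"}],
    ["age", "name"],
)

ages = df["age"]                                   # column by name
first = df[0]                                      # column by position
projected = df[["name", "age"]]                    # projection
older = df[Predicate(lambda row: row["age"] > 3)]  # filter
```

Each of the four uses at the bottom corresponds to one of the docstring examples in the hunk: a column by name, a column by position, a list of column names, and a boolean condition.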





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37702: [SPARK-40012][PYTHON][DOCS] Make pyspark.sql.dataframe examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37702:
URL: https://github.com/apache/spark/pull/37702#discussion_r957198837


##########
python/pyspark/sql/dataframe.py:
##########
@@ -3067,6 +3436,10 @@ def unionByName(self, other: "DataFrame", allowMissingColumns: bool = False) ->
         ----------
         other : :class:`DataFrame`
             Another :class:`DataFrame` that needs to be combined.
+        allowMissingColumns : bool, optional, default False
+           Specify whether to allow missing columns.
+
+           .. versionadded:: 3.1.0

Review Comment:
   This is actually intentional (versionadded is for the new parameter)
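
To make the placement concrete, here is a small sketch of a numpydoc-style docstring in which one `versionadded` directive belongs to the function and a second, indented one belongs to a single parameter. Everything in it is hypothetical: `union_by_name` is a pure-Python stand-in over lists of dicts, not the Spark method, and the version numbers are invented. The helper `directive_indents` only reports how deeply each directive is indented, which is what ties it to the parameter.

```python
import inspect


def union_by_name(left, right, allow_missing_columns=False):
    """Combine two lists of dict rows by column name.

    .. versionadded:: 1.0.0

    Parameters
    ----------
    left, right : list of dict
        Rows to combine.
    allow_missing_columns : bool, optional, default False
        Fill columns missing from one side with None instead of raising.

        .. versionadded:: 1.1.0
    """
    left_cols = {c for row in left for c in row}
    right_cols = {c for row in right for c in row}
    if left_cols != right_cols and not allow_missing_columns:
        raise ValueError("column sets differ")
    columns = sorted(left_cols | right_cols)
    return [{c: row.get(c) for c in columns} for row in left + right]


def directive_indents(func):
    """Return the indentation of every ``versionadded`` line in the docstring."""
    doc = inspect.getdoc(func)  # strips the common leading indentation
    return [
        len(line) - len(line.lstrip())
        for line in doc.splitlines()
        if line.lstrip().startswith(".. versionadded::")
    ]
```

Here `directive_indents(union_by_name)` gives one directive at column 0 (the function) and one at column 4 (nested under `allow_missing_columns`).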





[GitHub] [spark] Transurgeon commented on pull request #37702: [SPARK-40012][PYTHON][DOCS] Make pyspark.sql.dataframe examples self-contained

Posted by GitBox <gi...@apache.org>.
Transurgeon commented on PR #37702:
URL: https://github.com/apache/spark/pull/37702#issuecomment-1230341749

   @HyukjinKwon, thanks for taking this over. I hope I helped you guys a bit, at least.




[GitHub] [spark] HyukjinKwon commented on pull request #37702: [SPARK-40012][PYTHON][DOCS] Make pyspark.sql.dataframe examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37702:
URL: https://github.com/apache/spark/pull/37702#issuecomment-1229854461

   cc @zhengruifeng @viirya @xinrong-meng @Yikun @itholic @dcoliversun @khalidmammadov in case you find some time to take a look.




[GitHub] [spark] HyukjinKwon commented on pull request #37702: [SPARK-40012][PYTHON][DOCS] Make pyspark.sql.dataframe examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37702:
URL: https://github.com/apache/spark/pull/37702#issuecomment-1231026463

   Thanks guys.
   
   Merged to master.

