Posted to reviews@spark.apache.org by "wunderalbert (via GitHub)" <gi...@apache.org> on 2024/02/21 13:08:56 UTC

[PR] Correct docstring for pyspark's dataframe.head [spark]

wunderalbert opened a new pull request, #45197:
URL: https://github.com/apache/spark/pull/45197

   ### What changes were proposed in this pull request?
   
   Change the `Returns` description in the docstring of PySpark's `DataFrame.head` to match the actual behaviour.
   
   ### Why are the changes needed?
   
   The docstring claimed that `head(n)` would return a `Row` (rather than a list of rows) iff n == 1, but that's incorrect.
   
   The type hints, the example, and the implementation all show that whether a row or a list of rows is returned depends on whether `n` is supplied at all: if it isn't, `head()` returns a `Row`; if it is, even if it is 1, `head(n)` returns a list.
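
To make the distinction concrete, here is a minimal sketch of the return-type dispatch on a plain Python list. It is not PySpark code: the free function `head` and the `Row = tuple` alias are hypothetical stand-ins, and returning `None` for empty input is an assumption read off the `Optional[Row]` type hint. Negative `n` is not handled.

```python
from typing import List, Optional, Union

# Hypothetical stand-in for pyspark.sql.Row, so the sketch runs without Spark.
Row = tuple


def head(rows: List[Row], n: Optional[int] = None) -> Union[Optional[Row], List[Row]]:
    """Mimic the return-type dispatch described above on a plain list."""
    if n is None:
        # n omitted: a single row, or None when there are no rows (assumption).
        first = rows[:1]
        return first[0] if first else None
    # n supplied, including n == 1 and n == 0: always a list.
    return rows[:n]


rows = [(0,), (1,), (2,)]
no_arg = head(rows)   # a single row
one = head(rows, 1)   # a list holding one row
zero = head(rows, 0)  # an empty list
print(no_arg, one, zero)
```

The point of the sketch is that the branch is taken on `n is None`, not on the value of `n`, so `head(rows, 1)` and `head(rows)` differ in type even though both refer to the first row.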
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, if you count a changed docstring, e.g. as displayed on hover and in similar ways by many IDEs.
   
   ### How was this patch tested?
   
   No tests were added since only docs changed.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47132][DOCS][PYTHON] Correct docstring for pyspark's dataframe.head [spark]

Posted by "wunderalbert (via GitHub)" <gi...@apache.org>.
wunderalbert commented on code in PR #45197:
URL: https://github.com/apache/spark/pull/45197#discussion_r1499320875


##########
python/pyspark/sql/dataframe.py:
##########
@@ -3526,8 +3526,8 @@ def head(self, n: Optional[int] = None) -> Union[Optional[Row], List[Row]]:
 
         Returns
         -------
-        If n is greater than 1, return a list of :class:`Row`.
-        If n is 1, return a single Row.
+        If n is supplied, return a list of :class:`Row`.

Review Comment:
   Good point. I'm suggesting adding this as an example, and extending the `returns` section to mention the length of the list (see commit below).





Re: [PR] [SPARK-47132][DOCS][PYTHON] Correct docstring for pyspark's dataframe.head [spark]

Posted by "xinrong-meng (via GitHub)" <gi...@apache.org>.
xinrong-meng commented on PR #45197:
URL: https://github.com/apache/spark/pull/45197#issuecomment-1960091006

   Merged to master, thank you!




Re: [PR] [SPARK-47132][DOCS][PYTHON] Correct docstring for pyspark's dataframe.head [spark]

Posted by "xinrong-meng (via GitHub)" <gi...@apache.org>.
xinrong-meng commented on code in PR #45197:
URL: https://github.com/apache/spark/pull/45197#discussion_r1499777728


##########
python/pyspark/sql/dataframe.py:
##########
@@ -3526,8 +3526,8 @@ def head(self, n: Optional[int] = None) -> Union[Optional[Row], List[Row]]:
 
         Returns
         -------
-        If n is greater than 1, return a list of :class:`Row`.
-        If n is 1, return a single Row.
+        If n is supplied, return a list of :class:`Row`.

Review Comment:
   Nice!





Re: [PR] [SPARK-47132][DOCS][PYTHON] Correct docstring for pyspark's dataframe.head [spark]

Posted by "xinrong-meng (via GitHub)" <gi...@apache.org>.
xinrong-meng commented on code in PR #45197:
URL: https://github.com/apache/spark/pull/45197#discussion_r1499777303


##########
python/pyspark/sql/dataframe.py:
##########
@@ -3526,8 +3526,9 @@ def head(self, n: Optional[int] = None) -> Union[Optional[Row], List[Row]]:
 
         Returns
         -------
-        If n is greater than 1, return a list of :class:`Row`.
-        If n is 1, return a single Row.
+        If n is supplied, return a list of :class:`Row` of length n

Review Comment:
   The description is so much better! Thanks!





Re: [PR] [SPARK-47132][DOCS][PYTHON] Correct docstring for pyspark's dataframe.head [spark]

Posted by "wunderalbert (via GitHub)" <gi...@apache.org>.
wunderalbert commented on PR #45197:
URL: https://github.com/apache/spark/pull/45197#issuecomment-1959550608

   Thank you, @xinrong-meng! I've created the JIRA and edited the PR.




Re: [PR] [SPARK-47132][DOCS][PYTHON] Correct docstring for pyspark's dataframe.head [spark]

Posted by "xinrong-meng (via GitHub)" <gi...@apache.org>.
xinrong-meng closed pull request #45197: [SPARK-47132][DOCS][PYTHON] Correct docstring for pyspark's dataframe.head
URL: https://github.com/apache/spark/pull/45197




Re: [PR] Correct docstring for pyspark's dataframe.head [spark]

Posted by "xinrong-meng (via GitHub)" <gi...@apache.org>.
xinrong-meng commented on PR #45197:
URL: https://github.com/apache/spark/pull/45197#issuecomment-1957719043

   Good catch! Thanks for working on that!
   
   Would you create a JIRA and add the JIRA number to the PR title, along with `[DOCS][PYTHON]` labels?
   




Re: [PR] Correct docstring for pyspark's dataframe.head [spark]

Posted by "xinrong-meng (via GitHub)" <gi...@apache.org>.
xinrong-meng commented on code in PR #45197:
URL: https://github.com/apache/spark/pull/45197#discussion_r1498150767


##########
python/pyspark/sql/dataframe.py:
##########
@@ -3526,8 +3526,8 @@ def head(self, n: Optional[int] = None) -> Union[Optional[Row], List[Row]]:
 
         Returns
         -------
-        If n is greater than 1, return a list of :class:`Row`.
-        If n is 1, return a single Row.
+        If n is supplied, return a list of :class:`Row`.

Review Comment:
   ```py
   >>> spark.range(3).head(0)
   []
   ```
   I'm wondering if we should call out the empty list when `n=0`.


