Posted to reviews@spark.apache.org by "zhengruifeng (via GitHub)" <gi...@apache.org> on 2023/08/17 02:59:24 UTC

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42515: [SPARK-44831][PYTHON][DOCS] Refine DocString of `Union*`

zhengruifeng commented on code in PR #42515:
URL: https://github.com/apache/spark/pull/42515#discussion_r1296620355


##########
python/pyspark/sql/dataframe.py:
##########
@@ -3752,46 +3752,96 @@ def union(self, other: "DataFrame") -> "DataFrame":
         Parameters
         ----------
         other : :class:`DataFrame`
-            Another :class:`DataFrame` that needs to be unioned
+            Another :class:`DataFrame` that needs to be unioned.
 
         Returns
         -------
         :class:`DataFrame`
+            A new :class:`DataFrame` containing the combined rows with corresponding columns.
 
         See Also
         --------
         DataFrame.unionAll
 
         Notes
         -----
-        This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union
-        (that does deduplication of elements), use this function followed by :func:`distinct`.
+        This method performs a SQL-style set union of the rows from both `DataFrame` objects,
+        with no automatic deduplication of elements.
 
-        Also as standard in SQL, this function resolves columns by position (not by name).
+        Use the `distinct()` method to perform deduplication of rows.
+
+        The method resolves columns by position (not by name), following the standard behavior
+        in SQL.
+
+        Alias: The `union` method was previously named `unionAll` in versions before 2.0.0.

Review Comment:
   Hmm, this `Alias:` note doesn't seem to be needed.
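
   For context, the behavior the docstring under review describes (columns matched by position, duplicates kept unless `distinct()` is called) can be modeled in a few lines of plain Python. This is only a sketch of the semantics, not PySpark itself: rows are modeled as tuples, and the helper names `union_by_position` and `distinct` are hypothetical stand-ins for illustration. Unlike Spark, this model preserves row order, which is a simplification.

```python
from typing import List, Tuple

Row = Tuple[object, ...]


def union_by_position(left: List[Row], right: List[Row]) -> List[Row]:
    """Model of UNION ALL: append rows, matching columns by position only."""
    if left and right and len(left[0]) != len(right[0]):
        raise ValueError("both inputs must have the same number of columns")
    return list(left) + list(right)  # duplicates are kept


def distinct(rows: List[Row]) -> List[Row]:
    """Drop duplicate rows, keeping the first occurrence of each."""
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


# Assume the left input has columns (id, name), while the right input's
# second row was built in (name, id) order by mistake.
df1 = [(1, "a"), (2, "b")]
df2 = [(2, "b"), ("c", 3)]

combined = union_by_position(df1, df2)  # keeps the duplicate (2, "b")
deduplicated = distinct(combined)       # the misordered row is not repaired
```

   The misordered row `("c", 3)` passes through unchanged, which is the consequence of resolving columns by position rather than by name.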



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

