Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/09/07 09:21:49 UTC

[GitHub] [spark] itholic commented on a diff in pull request #37761: [SPARK-40311][SQL][PYTHON] Add withColumnsRenamed to scala and pyspark API

itholic commented on code in PR #37761:
URL: https://github.com/apache/spark/pull/37761#discussion_r964590026


##########
python/pyspark/sql/dataframe.py:
##########
@@ -4430,6 +4430,48 @@ def withColumnRenamed(self, existing: str, new: str) -> "DataFrame":
         """
         return DataFrame(self._jdf.withColumnRenamed(existing, new), self.sparkSession)
 
+    def withColumnsRenamed(self, *colsMap: Dict[str, str]) -> "DataFrame":
+        """
+        Returns a new :class:`DataFrame` by renaming multiple columns.
+        This is a no-op if schema doesn't contain the given column names.
+
+        The colsMap is a map of existing column name and corresponding desired column names.

Review Comment:
   Maybe we don't need to describe the parameter details in the summary, since that duplicates the `Parameters` section?



##########
python/pyspark/sql/dataframe.py:
##########
@@ -4430,6 +4430,48 @@ def withColumnRenamed(self, existing: str, new: str) -> "DataFrame":
         """
         return DataFrame(self._jdf.withColumnRenamed(existing, new), self.sparkSession)
 
+    def withColumnsRenamed(self, *colsMap: Dict[str, str]) -> "DataFrame":

Review Comment:
   We also need to add a new API to `python/docs/source/reference/pyspark.sql/dataframe.rst` for documentation.
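   A hedged sketch of what that docs entry might look like, assuming `dataframe.rst` lists the API in an `autosummary` table as the other PySpark reference pages do (the surrounding entries shown here are illustrative, not the file's actual contents):

   ```rst
   .. currentmodule:: pyspark.sql

   .. autosummary::
       :toctree: api/

       DataFrame.withColumnRenamed
       DataFrame.withColumnsRenamed
   ```

   The new method would simply be added alongside the existing `DataFrame.withColumnRenamed` entry so Sphinx generates a stub page for it.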



##########
python/pyspark/sql/dataframe.py:
##########
@@ -4430,6 +4430,48 @@ def withColumnRenamed(self, existing: str, new: str) -> "DataFrame":
         """
         return DataFrame(self._jdf.withColumnRenamed(existing, new), self.sparkSession)
 
+    def withColumnsRenamed(self, *colsMap: Dict[str, str]) -> "DataFrame":
+        """
+        Returns a new :class:`DataFrame` by renaming multiple columns.
+        This is a no-op if schema doesn't contain the given column names.
+
+        The colsMap is a map of existing column name and corresponding desired column names.
+
+        .. versionadded:: 3.4.0
+           Added support for multiple columns renaming
+
+        Parameters
+        ----------
+        colsMap : dict
+            a dict of existing column names and corresponding desired column names.
+            Currently, only single map is supported.
+
+        Returns
+        -------
+        :class:`DataFrame`
+            DataFrame with renamed columns.
+
+        Examples
+        --------
+        >>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", "name"])
+        >>> df = df.withColumns({'age2': df.age + 2, 'age3': df.age + 3})
+        >>> df.withColumnsRenamed({'age2': 'age4', 'age3': 'age5'}).show()
+        +---+-----+----+----+
+        |age| name|age4|age5|
+        +---+-----+----+----+
+        |  2|Alice|   4|   5|
+        |  5|  Bob|   7|   8|
+        +---+-----+----+----+
+        """
+        # Below code is to help enable kwargs in future.
+        assert len(colsMap) == 1
+        colsMap = colsMap[0]  # type: ignore[assignment]
+
+        if not isinstance(colsMap, dict):
+            raise TypeError("colsMap must be dict of existing column name and new column name.")

Review Comment:
   Can we also add a negative test case for this to `python/pyspark/sql/tests/test_dataframe.py`?
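   A minimal sketch of such a negative test. The helper `_check_cols_map` below is a hypothetical stand-in that mirrors the validation logic from the diff so the example runs without a SparkSession; in Spark's suite the test would instead call `df.withColumnsRenamed` on a real DataFrame inside the shared test fixture, and the class/method names here are assumptions:

   ```python
   import unittest


   def _check_cols_map(*colsMap):
       """Stand-in mirroring the argument validation in withColumnsRenamed."""
       # Varargs form is kept to help enable kwargs in the future (per the diff).
       assert len(colsMap) == 1
       colsMap = colsMap[0]
       if not isinstance(colsMap, dict):
           raise TypeError("colsMap must be dict of existing column name and new column name.")
       return colsMap


   class WithColumnsRenamedNegativeTest(unittest.TestCase):
       def test_non_dict_raises_type_error(self):
           # Passing a list of pairs instead of a dict should raise TypeError.
           with self.assertRaisesRegex(TypeError, "colsMap must be dict"):
               _check_cols_map([("age2", "age4")])
   ```

   The `assertRaisesRegex` check also pins the error message, so a later rewording of the TypeError would be caught by the test.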



##########
python/pyspark/sql/dataframe.py:
##########
@@ -4430,6 +4430,48 @@ def withColumnRenamed(self, existing: str, new: str) -> "DataFrame":
         """
         return DataFrame(self._jdf.withColumnRenamed(existing, new), self.sparkSession)
 
+    def withColumnsRenamed(self, *colsMap: Dict[str, str]) -> "DataFrame":
+        """
+        Returns a new :class:`DataFrame` by renaming multiple columns.
+        This is a no-op if schema doesn't contain the given column names.
+
+        The colsMap is a map of existing column name and corresponding desired column names.
+
+        .. versionadded:: 3.4.0
+           Added support for multiple columns renaming
+
+        Parameters
+        ----------
+        colsMap : dict
+            a dict of existing column names and corresponding desired column names.
+            Currently, only single map is supported.
+
+        Returns
+        -------
+        :class:`DataFrame`
+            DataFrame with renamed columns.
+
+        Examples
+        --------
+        >>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", "name"])
+        >>> df = df.withColumns({'age2': df.age + 2, 'age3': df.age + 3})
+        >>> df.withColumnsRenamed({'age2': 'age4', 'age3': 'age5'}).show()
+        +---+-----+----+----+
+        |age| name|age4|age5|
+        +---+-----+----+----+
+        |  2|Alice|   4|   5|
+        |  5|  Bob|   7|   8|
+        +---+-----+----+----+

Review Comment:
   What about adding a `See Also` section to link the related functions?
   
   e.g.
   
   ```python
   See Also
   --------
   :meth:`withColumns`
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org