You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/09/27 13:03:37 UTC

[GitHub] [spark] EnricoMi opened a new pull request, #38020: [SPARK-39877][FOLLOW-UP] PySpark DataFrame.unpivot allows for column names only

EnricoMi opened a new pull request, #38020:
URL: https://github.com/apache/spark/pull/38020

   ### What changes were proposed in this pull request?
   As discussed in https://github.com/apache/spark/pull/37407#discussion_r977818035, method `pyspark.sql.DataFrame.unpivot` should support only column names for `ids` and `values` arguments, to mirror PySpark `GroupedData.pivot`.
   
   ### Why are the changes needed?
   Simplify and align API with existing PySpark SQL methods.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, modifies signature of `pyspark.sql.DataFrame.unpivot`, which has not been released yet.
   
   ### How was this patch tested?
   Tests `unpivot` with non-string arguments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zero323 commented on a diff in pull request #38020: [SPARK-39877][FOLLOW-UP] PySpark DataFrame.unpivot allows for column names only

Posted by GitBox <gi...@apache.org>.
zero323 commented on code in PR #38020:
URL: https://github.com/apache/spark/pull/38020#discussion_r1032609227


##########
python/pyspark/sql/dataframe.py:
##########
@@ -3175,12 +3177,12 @@ def melt(
 
         Parameters
         ----------
-        ids : str, Column, tuple, list, optional
-            Column(s) to use as identifiers. Can be a single column or column name,
-            or a list or tuple for multiple columns.
-        values : str, Column, tuple, list, optional
-            Column(s) to unpivot. Can be a single column or column name, or a list or tuple
-            for multiple columns. If not specified or empty, uses all columns that
+        ids : str, list, tuple, optional
+            Column name(s) to use as identifiers. Can be a single column name,
+            or a list or tuple for multiple column names.
+        values : str, list, tuple, optional

Review Comment:
   Seems like `values` shouldn't be described as an optional, as it is a required argument.  How about something like this?
   
   ```python
           values : str, Column, tuple, list or None
               Column(s) to unpivot. Can be a single column or column name, or a list or tuple
               for multiple columns. If specified, and not `None` must not be empty. If `None` uses all
               columns that are not set as `ids`.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zero323 commented on a diff in pull request #38020: [SPARK-39877][FOLLOW-UP] PySpark DataFrame.unpivot allows for column names only

Posted by GitBox <gi...@apache.org>.
zero323 commented on code in PR #38020:
URL: https://github.com/apache/spark/pull/38020#discussion_r1032612453


##########
python/pyspark/sql/dataframe.py:
##########
@@ -3175,12 +3177,12 @@ def melt(
 
         Parameters
         ----------
-        ids : str, Column, tuple, list, optional
-            Column(s) to use as identifiers. Can be a single column or column name,
-            or a list or tuple for multiple columns.
-        values : str, Column, tuple, list, optional
-            Column(s) to unpivot. Can be a single column or column name, or a list or tuple
-            for multiple columns. If not specified or empty, uses all columns that
+        ids : str, list, tuple, optional
+            Column name(s) to use as identifiers. Can be a single column name,
+            or a list or tuple for multiple column names.
+        values : str, list, tuple, optional

Review Comment:
   We could also provide defaults  for this and the following arguments:
   
   ```python
   def unpivot(
           self,
           ids: Optional[Union[str, List[str], Tuple[str, ...]]],
           values: Optional[Union[str, List[str], Tuple[str, ...]]] = None,
           variableColumnName: str = "variable", 
           valueColumnName: str = "value",
   ) -> "DataFrame":
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] EnricoMi commented on pull request #38020: [SPARK-39877][FOLLOW-UP] PySpark DataFrame.unpivot allows for column names only

Posted by GitBox <gi...@apache.org>.
EnricoMi commented on PR #38020:
URL: https://github.com/apache/spark/pull/38020#issuecomment-1262202569

   Closing as discussed in https://github.com/apache/spark/pull/37407#discussion_r983393051.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zero323 commented on a diff in pull request #38020: [SPARK-39877][FOLLOW-UP] PySpark DataFrame.unpivot allows for column names only

Posted by GitBox <gi...@apache.org>.
zero323 commented on code in PR #38020:
URL: https://github.com/apache/spark/pull/38020#discussion_r1032609227


##########
python/pyspark/sql/dataframe.py:
##########
@@ -3175,12 +3177,12 @@ def melt(
 
         Parameters
         ----------
-        ids : str, Column, tuple, list, optional
-            Column(s) to use as identifiers. Can be a single column or column name,
-            or a list or tuple for multiple columns.
-        values : str, Column, tuple, list, optional
-            Column(s) to unpivot. Can be a single column or column name, or a list or tuple
-            for multiple columns. If not specified or empty, uses all columns that
+        ids : str, list, tuple, optional
+            Column name(s) to use as identifiers. Can be a single column name,
+            or a list or tuple for multiple column names.
+        values : str, list, tuple, optional

Review Comment:
   Seems like `values` shouldn't be described as an optional, as it is a required argument.  How about something like this?
   
   ```python
           values : str, Column, tuple, list or None
               Column(s) to unpivot. Can be a single column or column name, or a list or tuple
               for multiple columns. If specified, and not `None` must not be empty. If `None` uses all
               columns that are not set as `ids`.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] EnricoMi closed pull request #38020: [SPARK-39877][FOLLOW-UP] PySpark DataFrame.unpivot allows for column names only

Posted by GitBox <gi...@apache.org>.
EnricoMi closed pull request #38020: [SPARK-39877][FOLLOW-UP] PySpark DataFrame.unpivot allows for column names only
URL: https://github.com/apache/spark/pull/38020


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #38020: [SPARK-39877][FOLLOW-UP] PySpark DataFrame.unpivot allows for column names only

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #38020:
URL: https://github.com/apache/spark/pull/38020#issuecomment-1262734874

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #38020: [SPARK-39877][FOLLOW-UP] PySpark DataFrame.unpivot allows for column names only

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #38020:
URL: https://github.com/apache/spark/pull/38020#issuecomment-1259520388

   how do you think about https://github.com/apache/spark/pull/37407#discussion_r981218176 ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org