Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/11 21:15:11 UTC

[GitHub] [spark] amaliujia commented on a diff in pull request #38631: [SPARK-40809] [CONNECT] [FOLLOW] Support `alias()` in Python client

amaliujia commented on code in PR #38631:
URL: https://github.com/apache/spark/pull/38631#discussion_r1020561328


##########
python/pyspark/sql/connect/column.py:
##########
@@ -82,6 +82,74 @@ def to_plan(self, session: "RemoteSparkSession") -> "proto.Expression":
     def __str__(self) -> str:
         ...
 
+    def alias(self, *alias: str, **kwargs: Any) -> "Expression":
+        """
+        Returns this column aliased with a new name or names (in the case of expressions that
+        return more than one column, such as explode).
+
+        .. versionadded:: 1.3.0

Review Comment:
   Version should be 3.4.0.



##########
python/pyspark/sql/connect/column.py:
##########
@@ -82,6 +82,74 @@ def to_plan(self, session: "RemoteSparkSession") -> "proto.Expression":
     def __str__(self) -> str:
         ...
 
+    def alias(self, *alias: str, **kwargs: Any) -> "Expression":
+        """
+        Returns this column aliased with a new name or names (in the case of expressions that
+        return more than one column, such as explode).
+
+        .. versionadded:: 1.3.0
+
+        Parameters
+        ----------
+        alias : str
+            desired column names (collects all positional arguments passed)
+
+        Other Parameters
+        ----------------
+        metadata: dict
+            a dict of information to be stored in ``metadata`` attribute of the
+            corresponding :class:`StructField <pyspark.sql.types.StructField>` (optional, keyword
+            only argument)
+
+            .. versionchanged:: 2.2.0
+               Added optional ``metadata`` argument.

Review Comment:
   We don't need this:
   
   Connect is a new API, and everything starts from 3.4.0.



##########
python/pyspark/sql/connect/column.py:
##########
@@ -82,6 +82,74 @@ def to_plan(self, session: "RemoteSparkSession") -> "proto.Expression":
     def __str__(self) -> str:
         ...
 
+    def alias(self, *alias: str, **kwargs: Any) -> "Expression":
+        """
+        Returns this column aliased with a new name or names (in the case of expressions that
+        return more than one column, such as explode).
+
+        .. versionadded:: 1.3.0
+
+        Parameters
+        ----------
+        alias : str
+            desired column names (collects all positional arguments passed)
+
+        Other Parameters
+        ----------------
+        metadata: dict
+            a dict of information to be stored in ``metadata`` attribute of the
+            corresponding :class:`StructField <pyspark.sql.types.StructField>` (optional, keyword
+            only argument)
+
+            .. versionchanged:: 2.2.0
+               Added optional ``metadata`` argument.
+
+        Returns
+        -------
+        :class:`Column`
+            Column representing this column aliased with new name or names.
+
+        Examples
+        --------
+        >>> df = spark.createDataFrame(
+        ...      [(2, "Alice"), (5, "Bob")], ["age", "name"])
+        >>> df.select(df.age.alias("age2")).collect()
+        [Row(age2=2), Row(age2=5)]
+        >>> df.select(df.age.alias("age3", metadata={'max': 99})).schema['age3'].metadata['max']
+        99
+        """
+        metadata = kwargs.pop("metadata", None)
+        assert not kwargs, "Unexpected kwargs were passed: %s" % kwargs
+        return ColumnAlias(self, list(alias), metadata)
+
+
+class ColumnAlias(Expression):
+    def __init__(self, parent: Expression, alias: list[str], metadata: Any):
+
+        self._alias = alias
+        self._metadata = metadata
+        self._parent = parent
+
+    def to_plan(self, session: "RemoteSparkSession") -> "proto.Expression":
+        if len(self._alias) == 1:
+            if self._metadata:
+                raise ValueError("Creating aliases with metadata is not supported.")
+            else:
+                exp = proto.Expression()
+                exp.alias.name.append(self._alias[0])
+                exp.alias.expr.CopyFrom(self._parent.to_plan(session))
+                return exp
+        else:
+            if self._metadata:
+                raise ValueError("metadata can only be provided for a single column")

Review Comment:
   Wait, metadata is literally not supported yet...
   
   Maybe just a single `raise ValueError`?
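   
   For illustration, a minimal sketch of what hoisting the metadata check into a single `raise ValueError` could look like. `FakeExpression`/`FakeAlias` are hypothetical stand-ins for the generated `proto.Expression` class, used here only so the sketch is self-contained; the real code would build a `proto.Expression` as in the diff above.
   
   ```python
   # Hypothetical stand-ins for the generated protobuf classes.
   class FakeAlias:
       def __init__(self):
           self.name = []    # repeated string field
           self.expr = None  # child expression plan
   
   class FakeExpression:
       def __init__(self):
           self.alias = FakeAlias()
   
   def to_plan_sketch(parent_plan, alias_names, metadata):
       # One check up front: metadata is not supported at all yet,
       # regardless of how many alias names were given.
       if metadata:
           raise ValueError("Creating aliases with metadata is not supported.")
       exp = FakeExpression()
       exp.alias.name.extend(alias_names)  # handles 1..n names uniformly
       exp.alias.expr = parent_plan
       return exp
   ```
   
   With a single guard, the one-name and multi-name paths no longer need separate error branches.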



##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -334,7 +334,11 @@ class SparkConnectPlanner(session: SparkSession) {
   }
 
   private def transformAlias(alias: proto.Expression.Alias): NamedExpression = {
-    Alias(transformExpression(alias.getExpr), alias.getName)()
+    if (alias.getNameCount == 1) {
+      Alias(transformExpression(alias.getExpr), alias.getName(0))()
+    } else {
+      MultiAlias(transformExpression(alias.getExpr), alias.getNameList.asScala.toSeq)

Review Comment:
   Can you make a change in the Connect DSL and add a server-side test in `SparkConnectProtoSuite`?
   
   It should be straightforward.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

