You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/01 04:59:51 UTC

[GitHub] [spark] zhengruifeng opened a new pull request, #38858: [SPARK-41346][CONNECT][PYTHON] Implement `asc` and `desc` functions

zhengruifeng opened a new pull request, #38858:
URL: https://github.com/apache/spark/pull/38858

   ### What changes were proposed in this pull request?
   1, remove `asc` and `desc` from `Expression`;
   2, add `asc_nulls_first`, `asc_nulls_last`, `desc_nulls_first`, `desc_nulls_last` `in Column`;
   3, add these methods in `functions`;
   4, add a new test file for functions
   
   ### Why are the changes needed?
   for api coverage
   
   ### Does this PR introduce _any_ user-facing change?
   yes, new APIs
   
   
   ### How was this patch tested?
   added ut


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng closed pull request #38858: [SPARK-41346][CONNECT][PYTHON] Implement `asc` and `desc` functions

Posted by GitBox <gi...@apache.org>.
zhengruifeng closed pull request #38858: [SPARK-41346][CONNECT][PYTHON] Implement `asc` and `desc` functions
URL: https://github.com/apache/spark/pull/38858


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38858: [SPARK-41346][CONNECT][PYTHON] Implement `asc` and `desc` functions

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38858:
URL: https://github.com/apache/spark/pull/38858#discussion_r1036702656


##########
python/pyspark/sql/connect/functions.py:
##########
@@ -16,14 +16,247 @@
 #
 from pyspark.sql.connect.column import Column, LiteralExpression, ColumnReference
 
-from typing import Any
+from typing import Any, TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from pyspark.sql.connect._typing import ColumnOrName
 
 # TODO(SPARK-40538) Add support for the missing PySpark functions.
 
 
+def _to_col(col: "ColumnOrName") -> Column:
+    return col if isinstance(col, Column) else column(col)
+
+
 def col(x: str) -> Column:
     return Column(ColumnReference(x))
 
 
+column = col
+
+
 def lit(x: Any) -> Column:
     return Column(LiteralExpression(x))
+
+
+def asc(col: "ColumnOrName") -> Column:
+    """
+    Returns a sort expression based on the ascending order of the given column name.
+
+    .. versionadded:: 3.4.0
+
+    Parameters
+    ----------
+    col : :class:`~pyspark.sql.Column` or str
+        target column to sort by in the ascending order.
+
+    Returns
+    -------
+    :class:`~pyspark.sql.Column`
+        the column specifying the order.
+
+    Examples
+    --------
+    Sort by the column 'id' in the descending order.
+
+    >>> df = spark.range(5)
+    >>> df = df.sort(desc("id"))
+    >>> df.show()
+    +---+
+    | id|
+    +---+
+    |  4|
+    |  3|
+    |  2|
+    |  1|
+    |  0|
+    +---+
+
+    Sort by the column 'id' in the ascending order.
+
+    >>> df.orderBy(asc("id")).show()
+    +---+
+    | id|
+    +---+
+    |  0|
+    |  1|
+    |  2|
+    |  3|
+    |  4|
+    +---+
+    """
+    return _to_col(col).asc()
+
+
+def asc_nulls_first(col: "ColumnOrName") -> Column:
+    """
+    Returns a sort expression based on the ascending order of the given
+    column name, and null values return before non-null values.
+
+    .. versionadded:: 2.4.0
+
+    Parameters
+    ----------
+    col : :class:`~pyspark.sql.Column` or str
+        target column to sort by in the ascending order.
+
+    Returns
+    -------
+    :class:`~pyspark.sql.Column`
+        the column specifying the order.
+
+    Examples
+    --------
+    >>> df1 = spark.createDataFrame([(1, "Bob"),
+    ...                              (0, None),
+    ...                              (2, "Alice")], ["age", "name"])
+    >>> df1.sort(asc_nulls_first(df1.name)).show()
+    +---+-----+
+    |age| name|
+    +---+-----+
+    |  0| null|
+    |  2|Alice|
+    |  1|  Bob|
+    +---+-----+
+
+    """
+    return _to_col(col).asc_nulls_first()
+
+
+def asc_nulls_last(col: "ColumnOrName") -> Column:
+    """
+    Returns a sort expression based on the ascending order of the given
+    column name, and null values appear after non-null values.
+
+    .. versionadded:: 2.4.0

Review Comment:
   nice. will update.  I not sure whether we can set a top level `versionadded`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38858: [SPARK-41346][CONNECT][PYTHON] Implement `asc` and `desc` functions

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38858:
URL: https://github.com/apache/spark/pull/38858#discussion_r1036848605


##########
python/pyspark/sql/connect/functions.py:
##########
@@ -16,14 +16,247 @@
 #
 from pyspark.sql.connect.column import Column, LiteralExpression, ColumnReference
 
-from typing import Any
+from typing import Any, TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from pyspark.sql.connect._typing import ColumnOrName
 
 # TODO(SPARK-40538) Add support for the missing PySpark functions.
 
 
+def _to_col(col: "ColumnOrName") -> Column:
+    return col if isinstance(col, Column) else column(col)

Review Comment:
   after the refactoring, `Column` no longer takes a str as input



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38858: [SPARK-41346][CONNECT][PYTHON] Implement `asc` and `desc` functions

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38858:
URL: https://github.com/apache/spark/pull/38858#discussion_r1036698430


##########
python/pyspark/sql/connect/functions.py:
##########
@@ -16,14 +16,247 @@
 #
 from pyspark.sql.connect.column import Column, LiteralExpression, ColumnReference
 
-from typing import Any
+from typing import Any, TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from pyspark.sql.connect._typing import ColumnOrName
 
 # TODO(SPARK-40538) Add support for the missing PySpark functions.
 
 
+def _to_col(col: "ColumnOrName") -> Column:
+    return col if isinstance(col, Column) else column(col)
+
+
 def col(x: str) -> Column:
     return Column(ColumnReference(x))
 
 
+column = col
+
+
 def lit(x: Any) -> Column:
     return Column(LiteralExpression(x))
+
+
+def asc(col: "ColumnOrName") -> Column:
+    """
+    Returns a sort expression based on the ascending order of the given column name.
+
+    .. versionadded:: 3.4.0
+
+    Parameters
+    ----------
+    col : :class:`~pyspark.sql.Column` or str
+        target column to sort by in the ascending order.
+
+    Returns
+    -------
+    :class:`~pyspark.sql.Column`
+        the column specifying the order.
+
+    Examples
+    --------
+    Sort by the column 'id' in the descending order.
+
+    >>> df = spark.range(5)
+    >>> df = df.sort(desc("id"))
+    >>> df.show()
+    +---+
+    | id|
+    +---+
+    |  4|
+    |  3|
+    |  2|
+    |  1|
+    |  0|
+    +---+
+
+    Sort by the column 'id' in the ascending order.
+
+    >>> df.orderBy(asc("id")).show()
+    +---+
+    | id|
+    +---+
+    |  0|
+    |  1|
+    |  2|
+    |  3|
+    |  4|
+    +---+
+    """
+    return _to_col(col).asc()
+
+
+def asc_nulls_first(col: "ColumnOrName") -> Column:
+    """
+    Returns a sort expression based on the ascending order of the given
+    column name, and null values return before non-null values.
+
+    .. versionadded:: 2.4.0
+
+    Parameters
+    ----------
+    col : :class:`~pyspark.sql.Column` or str
+        target column to sort by in the ascending order.
+
+    Returns
+    -------
+    :class:`~pyspark.sql.Column`
+        the column specifying the order.
+
+    Examples
+    --------
+    >>> df1 = spark.createDataFrame([(1, "Bob"),
+    ...                              (0, None),
+    ...                              (2, "Alice")], ["age", "name"])
+    >>> df1.sort(asc_nulls_first(df1.name)).show()
+    +---+-----+
+    |age| name|
+    +---+-----+
+    |  0| null|
+    |  2|Alice|
+    |  1|  Bob|
+    +---+-----+
+
+    """
+    return _to_col(col).asc_nulls_first()
+
+
+def asc_nulls_last(col: "ColumnOrName") -> Column:
+    """
+    Returns a sort expression based on the ascending order of the given
+    column name, and null values appear after non-null values.
+
+    .. versionadded:: 2.4.0

Review Comment:
   nit: wrong `versionadded`. BTW do we need always add `versionadded  3.4.0` given the entire package is `versionadded`? Maybe only need a top level `versionadded`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38858: [SPARK-41346][CONNECT][PYTHON] Implement `asc` and `desc` functions

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38858:
URL: https://github.com/apache/spark/pull/38858#discussion_r1036701948


##########
python/pyspark/sql/connect/column.py:
##########
@@ -84,7 +85,7 @@ def __init__(self) -> None:
     def to_plan(self, session: "SparkConnectClient") -> "proto.Expression":
         ...
 
-    def __str__(self) -> str:
+    def __repr__(self) -> str:

Review Comment:
   a simple example for why we should use `__repr__`:
   
   ```
   
   In [20]: class A:
       ...:     def __str__(self) -> str:
       ...:         return "123"
       ...:
   
   In [21]: a = A()
   
   In [22]: a
   Out[22]: <__main__.A at 0x1034bef40>
   
   In [23]: str(a)
   Out[23]: '123'
   
   In [24]: repr(a)
   Out[24]: '<__main__.A object at 0x1034bef40>'
   
   
   In [30]: class A:
       ...:     def __repr__(self) -> str:
       ...:         return "123"
       ...:
   
   In [31]: a = A()
   
   In [32]: a
   Out[32]: 123
   
   In [33]: str(a)
   Out[33]: '123'
   
   In [34]: repr(a)
   Out[34]: '123'
   
   
   In [35]: class A:
       ...:     def __str__(self) -> str:
       ...:         return "123"
       ...:     def __repr__(self) -> str:
       ...:         return "xyz"
       ...:
   
   In [36]: a = A()
   
   In [37]: a
   Out[37]: xyz
   
   In [38]: str(a)
   Out[38]: '123'
   
   In [39]: repr(a)
   Out[39]: 'xyz'
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38858: [SPARK-41346][CONNECT][PYTHON] Implement `asc` and `desc` functions

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38858:
URL: https://github.com/apache/spark/pull/38858#discussion_r1036820798


##########
python/pyspark/sql/connect/functions.py:
##########
@@ -16,14 +16,247 @@
 #
 from pyspark.sql.connect.column import Column, LiteralExpression, ColumnReference
 
-from typing import Any
+from typing import Any, TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from pyspark.sql.connect._typing import ColumnOrName
 
 # TODO(SPARK-40538) Add support for the missing PySpark functions.
 
 
+def _to_col(col: "ColumnOrName") -> Column:
+    return col if isinstance(col, Column) else column(col)
+
+
 def col(x: str) -> Column:
     return Column(ColumnReference(x))
 
 
+column = col

Review Comment:
   `column` itself is a public function.
   
   https://github.com/apache/spark/blob/master/python/pyspark/sql/functions.py#L208



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #38858: [SPARK-41346][CONNECT][PYTHON] Implement `asc` and `desc` functions

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #38858:
URL: https://github.com/apache/spark/pull/38858#discussion_r1036819389


##########
python/pyspark/sql/connect/functions.py:
##########
@@ -16,14 +16,247 @@
 #
 from pyspark.sql.connect.column import Column, LiteralExpression, ColumnReference
 
-from typing import Any
+from typing import Any, TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from pyspark.sql.connect._typing import ColumnOrName
 
 # TODO(SPARK-40538) Add support for the missing PySpark functions.
 
 
+def _to_col(col: "ColumnOrName") -> Column:
+    return col if isinstance(col, Column) else column(col)
+
+
 def col(x: str) -> Column:
     return Column(ColumnReference(x))
 
 
+column = col

Review Comment:
   why???



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #38858: [SPARK-41346][CONNECT][PYTHON] Implement `asc` and `desc` functions

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #38858:
URL: https://github.com/apache/spark/pull/38858#discussion_r1036816655


##########
python/pyspark/sql/connect/functions.py:
##########
@@ -16,14 +16,247 @@
 #
 from pyspark.sql.connect.column import Column, LiteralExpression, ColumnReference
 
-from typing import Any
+from typing import Any, TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from pyspark.sql.connect._typing import ColumnOrName
 
 # TODO(SPARK-40538) Add support for the missing PySpark functions.
 
 
+def _to_col(col: "ColumnOrName") -> Column:
+    return col if isinstance(col, Column) else column(col)

Review Comment:
   ```suggestion
       return col if isinstance(col, Column) else Column(col)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38858: [SPARK-41346][CONNECT][PYTHON] Implement `asc` and `desc` functions

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #38858:
URL: https://github.com/apache/spark/pull/38858#discussion_r1040561833


##########
python/pyspark/sql/tests/connect/test_connect_function.py:
##########
@@ -0,0 +1,184 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+from typing import Any
+import unittest
+import tempfile
+
+from pyspark.testing.sqlutils import have_pandas, SQLTestUtils
+
+from pyspark.sql import SparkSession, Row
+from pyspark.sql.types import StructType, StructField, StringType
+
+if have_pandas:
+    from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
+from pyspark.sql.dataframe import DataFrame
+from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+from pyspark.testing.pandasutils import PandasOnSparkTestCase

Review Comment:
   Hi, @zhengruifeng . This should be inside `if have_pandas`. I made a PR to fix it.
   - https://github.com/apache/spark/pull/38929
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38858: [SPARK-41346][CONNECT][PYTHON] Implement `asc` and `desc` functions

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38858:
URL: https://github.com/apache/spark/pull/38858#discussion_r1036698050


##########
python/pyspark/sql/connect/column.py:
##########
@@ -701,11 +688,23 @@ def to_plan(self, session: "SparkConnectClient") -> proto.Expression:
     def alias(self, *alias: str, **kwargs: Any) -> "Column":
         return Column(self._expr.alias(*alias, **kwargs))
 
+    def asc(self) -> "Column":
+        return self.asc_nulls_first()

Review Comment:
   oh I didn't know `asc` is `asc_nulls_first`. This is a good simplification. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #38858: [SPARK-41346][CONNECT][PYTHON] Implement `asc` and `desc` functions

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #38858:
URL: https://github.com/apache/spark/pull/38858#issuecomment-1333570489

   thanks for reviews, merged into master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org