You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/30 05:59:03 UTC

[GitHub] [spark] zhengruifeng opened a new pull request, #38847: [SPARK-41332][CONNECT][PYTHON] Fix `nullOrdering` in `SortOrder`

zhengruifeng opened a new pull request, #38847:
URL: https://github.com/apache/spark/pull/38847

   ### What changes were proposed in this pull request?
   fix the wrong `nullOrdering`
   
   
   ### Why are the changes needed?
   existing implementation generate incorrect `nullOrdering`;
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   no
   
   
   ### How was this patch tested?
   added ut
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #38847: [SPARK-41332][CONNECT][PYTHON] Fix `nullOrdering` in `SortOrder`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #38847:
URL: https://github.com/apache/spark/pull/38847#issuecomment-1331935680

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38847: [SPARK-41332][CONNECT][PYTHON] Fix `nullOrdering` in `SortOrder`

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38847:
URL: https://github.com/apache/spark/pull/38847#discussion_r1035592121


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -411,6 +411,48 @@ def test_subquery_alias(self) -> None:
         )
         self.assertTrue("special_alias" in plan_text)
 
+    def test_sort(self):
+        # SPARK-41332: test sort
+        query = """
+            SELECT * FROM VALUES
+            (false, 1, NULL), (false, NULL, 2.0), (NULL, 3, 3.0)
+            AS tab(a, b, c)
+            """
+        # +-----+----+----+
+        # |    a|   b|   c|
+        # +-----+----+----+
+        # |false|   1|null|
+        # |false|null| 2.0|
+        # | null|   3| 3.0|
+        # +-----+----+----+
+
+        cdf = self.connect.sql(query)
+        sdf = self.spark.sql(query)
+        self.assert_eq(
+            cdf.sort("a").toPandas(),
+            sdf.sort("a").toPandas(),
+        )
+        self.assert_eq(
+            cdf.sort("c").toPandas(),
+            sdf.sort("c").toPandas(),
+        )
+        self.assert_eq(
+            cdf.sort("b").toPandas(),
+            sdf.sort("b").toPandas(),
+        )
+        self.assert_eq(
+            cdf.sort(cdf.c, "b").toPandas(),
+            sdf.sort(sdf.c, "b").toPandas(),
+        )
+        self.assert_eq(
+            cdf.sort(cdf.c.desc(), "b").toPandas(),
+            sdf.sort(sdf.c.desc(), "b").toPandas(),
+        )
+        self.assert_eq(
+            cdf.sort(cdf.c.desc(), cdf.a.asc()).toPandas(),
+            sdf.sort(sdf.c.desc(), sdf.a.asc()).toPandas(),

Review Comment:
   no, never test it before.
   
   let me also add some tests in plan-only test



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38847: [SPARK-41332][CONNECT][PYTHON] Fix `nullOrdering` in `SortOrder`

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38847:
URL: https://github.com/apache/spark/pull/38847#discussion_r1035559023


##########
python/pyspark/sql/connect/plan.py:
##########
@@ -492,40 +492,27 @@ class Sort(LogicalPlan):
     def __init__(
         self,
         child: Optional["LogicalPlan"],
-        columns: List[Union[Column, str]],
+        columns: List["ColumnOrName"],
         is_global: bool,
     ) -> None:
         super().__init__(child)
         self.columns = columns
         self.is_global = is_global
 
     def col_to_sort_field(
-        self, col: Union[Column, str], session: "SparkConnectClient"
+        self, col: "ColumnOrName", session: "SparkConnectClient"
     ) -> proto.Sort.SortField:
+        sort: Optional[SortOrder] = None
         if isinstance(col, Column):
-            sf = proto.Sort.SortField()
-            sf.expression.CopyFrom(col.to_plan(session))
-            sf.direction = (
-                proto.Sort.SortDirection.SORT_DIRECTION_ASCENDING
-                if col._expr.ascending
-                else proto.Sort.SortDirection.SORT_DIRECTION_DESCENDING
-            )
-            sf.nulls = (
-                proto.Sort.SortNulls.SORT_NULLS_FIRST
-                if not col._expr.nullsLast
-                else proto.Sort.SortNulls.SORT_NULLS_LAST
-            )
-            return sf
-        else:
-            sf = proto.Sort.SortField()
-            # Check string
-            if isinstance(col, Column):
-                sf.expression.CopyFrom(col.to_plan(session))
+            if isinstance(col._expr, SortOrder):
+                sort = col._expr
             else:
-                sf.expression.CopyFrom(self.unresolved_attr(col))
-            sf.direction = proto.Sort.SortDirection.SORT_DIRECTION_ASCENDING
-            sf.nulls = proto.Sort.SortNulls.SORT_NULLS_LAST

Review Comment:
   wrong default null ordering



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38847: [SPARK-41332][CONNECT][PYTHON] Fix `nullOrdering` in `SortOrder`

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38847:
URL: https://github.com/apache/spark/pull/38847#discussion_r1035558901


##########
python/pyspark/sql/connect/plan.py:
##########
@@ -492,40 +492,27 @@ class Sort(LogicalPlan):
     def __init__(
         self,
         child: Optional["LogicalPlan"],
-        columns: List[Union[Column, str]],
+        columns: List["ColumnOrName"],
         is_global: bool,
     ) -> None:
         super().__init__(child)
         self.columns = columns
         self.is_global = is_global
 
     def col_to_sort_field(
-        self, col: Union[Column, str], session: "SparkConnectClient"
+        self, col: "ColumnOrName", session: "SparkConnectClient"
     ) -> proto.Sort.SortField:
+        sort: Optional[SortOrder] = None
         if isinstance(col, Column):
-            sf = proto.Sort.SortField()
-            sf.expression.CopyFrom(col.to_plan(session))
-            sf.direction = (
-                proto.Sort.SortDirection.SORT_DIRECTION_ASCENDING
-                if col._expr.ascending
-                else proto.Sort.SortDirection.SORT_DIRECTION_DESCENDING
-            )
-            sf.nulls = (
-                proto.Sort.SortNulls.SORT_NULLS_FIRST
-                if not col._expr.nullsLast
-                else proto.Sort.SortNulls.SORT_NULLS_LAST
-            )
-            return sf
-        else:
-            sf = proto.Sort.SortField()
-            # Check string
-            if isinstance(col, Column):
-                sf.expression.CopyFrom(col.to_plan(session))

Review Comment:
   this path is not reachable



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38847: [SPARK-41332][CONNECT][PYTHON] Fix `nullOrdering` in `SortOrder`

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38847:
URL: https://github.com/apache/spark/pull/38847#discussion_r1035558186


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -411,6 +411,48 @@ def test_subquery_alias(self) -> None:
         )
         self.assertTrue("special_alias" in plan_text)
 
+    def test_sort(self):

Review Comment:
   these tests fails before this PR, we should add more e2e tests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38847: [SPARK-41332][CONNECT][PYTHON] Fix `nullOrdering` in `SortOrder`

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38847:
URL: https://github.com/apache/spark/pull/38847#discussion_r1035591384


##########
python/pyspark/sql/connect/column.py:
##########
@@ -307,7 +305,21 @@ def __str__(self) -> str:
         return str(self.ref) + " ASC" if self.ascending else " DESC"
 
     def to_plan(self, session: "SparkConnectClient") -> proto.Expression:
-        return self.ref.to_plan(session)
+        # TODO: move SortField from relations.proto to expression.proto

Review Comment:
   nice, will update



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38847: [SPARK-41332][CONNECT][PYTHON] Fix `nullOrdering` in `SortOrder`

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38847:
URL: https://github.com/apache/spark/pull/38847#discussion_r1035589632


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -411,6 +411,48 @@ def test_subquery_alias(self) -> None:
         )
         self.assertTrue("special_alias" in plan_text)
 
+    def test_sort(self):

Review Comment:
   big +1 



##########
python/pyspark/sql/connect/column.py:
##########
@@ -307,7 +305,21 @@ def __str__(self) -> str:
         return str(self.ref) + " ASC" if self.ascending else " DESC"
 
     def to_plan(self, session: "SparkConnectClient") -> proto.Expression:
-        return self.ref.to_plan(session)
+        # TODO: move SortField from relations.proto to expression.proto

Review Comment:
   Nit: better to have a JIRA link here. I usually do TODO(SPARK-xxxx)



##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -411,6 +411,48 @@ def test_subquery_alias(self) -> None:
         )
         self.assertTrue("special_alias" in plan_text)
 
+    def test_sort(self):
+        # SPARK-41332: test sort
+        query = """
+            SELECT * FROM VALUES
+            (false, 1, NULL), (false, NULL, 2.0), (NULL, 3, 3.0)
+            AS tab(a, b, c)
+            """
+        # +-----+----+----+
+        # |    a|   b|   c|
+        # +-----+----+----+
+        # |false|   1|null|
+        # |false|null| 2.0|
+        # | null|   3| 3.0|
+        # +-----+----+----+
+
+        cdf = self.connect.sql(query)
+        sdf = self.spark.sql(query)
+        self.assert_eq(
+            cdf.sort("a").toPandas(),
+            sdf.sort("a").toPandas(),
+        )
+        self.assert_eq(
+            cdf.sort("c").toPandas(),
+            sdf.sort("c").toPandas(),
+        )
+        self.assert_eq(
+            cdf.sort("b").toPandas(),
+            sdf.sort("b").toPandas(),
+        )
+        self.assert_eq(
+            cdf.sort(cdf.c, "b").toPandas(),
+            sdf.sort(sdf.c, "b").toPandas(),
+        )
+        self.assert_eq(
+            cdf.sort(cdf.c.desc(), "b").toPandas(),
+            sdf.sort(sdf.c.desc(), "b").toPandas(),
+        )
+        self.assert_eq(
+            cdf.sort(cdf.c.desc(), cdf.a.asc()).toPandas(),
+            sdf.sort(sdf.c.desc(), sdf.a.asc()).toPandas(),

Review Comment:
   we don't have null last and null first test?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #38847: [SPARK-41332][CONNECT][PYTHON] Fix `nullOrdering` in `SortOrder`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #38847: [SPARK-41332][CONNECT][PYTHON]  Fix `nullOrdering` in `SortOrder`
URL: https://github.com/apache/spark/pull/38847


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org