You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/16 10:59:01 UTC

[GitHub] [spark] zhengruifeng opened a new pull request, #39090: [SPARK-41334][CONNECT][PYTHON] Move `SortOrder` proto from relations to expressions

zhengruifeng opened a new pull request, #39090:
URL: https://github.com/apache/spark/pull/39090

   ### What changes were proposed in this pull request?
   
   1. Move `SortOrder` proto from relations to expressions
   2. rename fields according to the Expression in Scala
   
   ### Why are the changes needed?
   `SortOrder` is a common expression, not just in `Sort`, it will also be used in `Window` and `repartitionByRange`
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   updated UT


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #39090: [SPARK-41334][CONNECT][PYTHON] Move `SortOrder` proto from relations to expressions

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #39090:
URL: https://github.com/apache/spark/pull/39090#discussion_r1051537978


##########
connector/connect/common/src/main/protobuf/spark/connect/expressions.proto:
##########
@@ -37,6 +37,26 @@ message Expression {
     Alias alias = 6;
     Cast cast = 7;
     UnresolvedRegex unresolved_regex = 8;
+    SortOrder sort_order = 9;
+  }
+
+  message SortOrder {

Review Comment:
   doc please



##########
connector/connect/common/src/main/protobuf/spark/connect/relations.proto:
##########
@@ -246,29 +246,11 @@ message Sort {
   // (Required) Input relation for a Sort.
   Relation input = 1;
 
-  // (Required) Sort fields.
-  repeated SortField sort_fields = 2;
+  // (Required) The ordering expressions

Review Comment:
   ```suggestion
     // (Required) The ordering expressions.
   ```



##########
connector/connect/common/src/main/protobuf/spark/connect/expressions.proto:
##########
@@ -37,6 +37,26 @@ message Expression {
     Alias alias = 6;
     Cast cast = 7;
     UnresolvedRegex unresolved_regex = 8;
+    SortOrder sort_order = 9;

Review Comment:
   Do you mind adding a bit of documentation that would explain what are valid uses of `SortOrder` in expressions? This is just to make sure it's understood that you can't use this in a projection etc.
   
   If the only use-case for SortOrder is to be used in `Sort` relation and `Window` then maybe it's better to pull it out from the expressions message.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #39090: [SPARK-41334][CONNECT][PYTHON] Move `SortOrder` proto from relations to expressions

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #39090:
URL: https://github.com/apache/spark/pull/39090#discussion_r1051700063


##########
python/pyspark/sql/connect/column.py:
##########
@@ -437,35 +437,38 @@ def to_plan(self, session: "SparkConnectClient") -> proto.Expression:
 
 
 class SortOrder(Expression):
-    def __init__(self, child: Expression, ascending: bool = True, nullsLast: bool = False) -> None:
+    def __init__(self, child: Expression, ascending: bool = True, nullsFirst: bool = True) -> None:
         super().__init__()
         self._child = child
         self._ascending = ascending
-        self._nullsLast = nullsLast
+        self._nullsFirst = nullsFirst
 
     def __repr__(self) -> str:
         return (
             str(self._child)
             + (" ASC" if self._ascending else " DESC")
-            + (" NULLS LAST" if self._nullsLast else " NULLS FIRST")
+            + (" NULLS FIRST" if self._nullsFirst else " NULLS LAST")
         )
 
     def to_plan(self, session: "SparkConnectClient") -> proto.Expression:
-        # TODO(SPARK-41334): move SortField from relations.proto to expressions.proto
-        sort = proto.Sort.SortField()
-        sort.expression.CopyFrom(self._child.to_plan(session))
+        sort = proto.Expression()
+        sort.sort_order.child.CopyFrom(self._child.to_plan(session))
 
         if self._ascending:
-            sort.direction = proto.Sort.SortDirection.SORT_DIRECTION_ASCENDING
+            sort.sort_order.direction = (
+                proto.Expression.SortOrder.SortDirection.SORT_DIRECTION_ASCENDING
+            )
         else:
-            sort.direction = proto.Sort.SortDirection.SORT_DIRECTION_DESCENDING
+            sort.sort_order.direction = (
+                proto.Expression.SortOrder.SortDirection.SORT_DIRECTION_DESCENDING
+            )
 
-        if self._nullsLast:
-            sort.nulls = proto.Sort.SortNulls.SORT_NULLS_LAST
+        if self._nullsFirst:
+            sort.sort_order.null_ordering = proto.Expression.SortOrder.NullOrdering.SORT_NULLS_FIRST
         else:
-            sort.nulls = proto.Sort.SortNulls.SORT_NULLS_FIRST
+            sort.sort_order.null_ordering = proto.Expression.SortOrder.NullOrdering.SORT_NULLS_LAST
 
-        return cast(proto.Expression, sort)

Review Comment:
   thanks, will update



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #39090: [SPARK-41334][CONNECT][PYTHON] Move `SortOrder` proto from relations to expressions

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #39090:
URL: https://github.com/apache/spark/pull/39090#discussion_r1051711238


##########
connector/connect/common/src/main/protobuf/spark/connect/expressions.proto:
##########
@@ -37,6 +37,26 @@ message Expression {
     Alias alias = 6;
     Cast cast = 7;
     UnresolvedRegex unresolved_regex = 8;
+    SortOrder sort_order = 9;

Review Comment:
   I add a note for not support in projection, but I guess the developers should be aware of this.



##########
connector/connect/common/src/main/protobuf/spark/connect/expressions.proto:
##########
@@ -37,6 +37,26 @@ message Expression {
     Alias alias = 6;
     Cast cast = 7;
     UnresolvedRegex unresolved_regex = 8;
+    SortOrder sort_order = 9;
+  }
+
+  message SortOrder {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #39090: [SPARK-41334][CONNECT][PYTHON] Move `SortOrder` proto from relations to expressions

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #39090:
URL: https://github.com/apache/spark/pull/39090#issuecomment-1357050860

   merged into master, thanks for reivews


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng closed pull request #39090: [SPARK-41334][CONNECT][PYTHON] Move `SortOrder` proto from relations to expressions

Posted by GitBox <gi...@apache.org>.
zhengruifeng closed pull request #39090: [SPARK-41334][CONNECT][PYTHON] Move `SortOrder` proto from relations to expressions
URL: https://github.com/apache/spark/pull/39090


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39090: [SPARK-41334][CONNECT][PYTHON] Move `SortOrder` proto from relations to expressions

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #39090:
URL: https://github.com/apache/spark/pull/39090#discussion_r1051243980


##########
python/pyspark/sql/connect/column.py:
##########
@@ -437,35 +437,38 @@ def to_plan(self, session: "SparkConnectClient") -> proto.Expression:
 
 
 class SortOrder(Expression):
-    def __init__(self, child: Expression, ascending: bool = True, nullsLast: bool = False) -> None:
+    def __init__(self, child: Expression, ascending: bool = True, nullsFirst: bool = True) -> None:
         super().__init__()
         self._child = child
         self._ascending = ascending
-        self._nullsLast = nullsLast
+        self._nullsFirst = nullsFirst
 
     def __repr__(self) -> str:
         return (
             str(self._child)
             + (" ASC" if self._ascending else " DESC")
-            + (" NULLS LAST" if self._nullsLast else " NULLS FIRST")
+            + (" NULLS FIRST" if self._nullsFirst else " NULLS LAST")
         )
 
     def to_plan(self, session: "SparkConnectClient") -> proto.Expression:
-        # TODO(SPARK-41334): move SortField from relations.proto to expressions.proto
-        sort = proto.Sort.SortField()
-        sort.expression.CopyFrom(self._child.to_plan(session))
+        sort = proto.Expression()
+        sort.sort_order.child.CopyFrom(self._child.to_plan(session))
 
         if self._ascending:
-            sort.direction = proto.Sort.SortDirection.SORT_DIRECTION_ASCENDING
+            sort.sort_order.direction = (
+                proto.Expression.SortOrder.SortDirection.SORT_DIRECTION_ASCENDING
+            )
         else:
-            sort.direction = proto.Sort.SortDirection.SORT_DIRECTION_DESCENDING
+            sort.sort_order.direction = (
+                proto.Expression.SortOrder.SortDirection.SORT_DIRECTION_DESCENDING
+            )
 
-        if self._nullsLast:
-            sort.nulls = proto.Sort.SortNulls.SORT_NULLS_LAST
+        if self._nullsFirst:
+            sort.sort_order.null_ordering = proto.Expression.SortOrder.NullOrdering.SORT_NULLS_FIRST
         else:
-            sort.nulls = proto.Sort.SortNulls.SORT_NULLS_FIRST
+            sort.sort_order.null_ordering = proto.Expression.SortOrder.NullOrdering.SORT_NULLS_LAST
 
-        return cast(proto.Expression, sort)

Review Comment:
   This seems to cause `unused import` linter failure. Could you fix it?
   ```
   ./python/pyspark/sql/connect/column.py:18:1: F401 'typing.cast' imported but unused
   from typing import (
        F401 'typing.cast' imported but unused
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org