
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/22 10:01:04 UTC

[GitHub] [spark] dengziming commented on a diff in pull request #39158: [SPARK-41354][CONNECT] Implement `DataFrame.repartitionByRange`

dengziming commented on code in PR #39158:
URL: https://github.com/apache/spark/pull/39158#discussion_r1055284521


##########
connector/connect/common/src/main/protobuf/spark/connect/relations.proto:
##########
@@ -57,6 +57,7 @@ message Relation {
     Hint hint = 24;
     Unpivot unpivot = 25;
     ToSchema to_schema = 26;
+    RepartitionByExpression repartition_by_expression = 27;

Review Comment:
   This is to be consistent with the Dataset API. In `Dataset`, both `repartition(columns)` and `repartitionByRange(*)` are converted to a `RepartitionByExpression`; there is no `RepartitionByRange` logical plan.
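   A minimal, self-contained Scala sketch of why one plan node is enough, assuming a simplified model of Catalyst. The types `Expr`, `Attr`, `SortOrder`, `RepartitionByExpression`, the `Partitioning` variants and the `RepartitionSketch` helpers are hypothetical stand-ins, not Spark's real classes. As far as we understand, Spark's own `RepartitionByExpression` derives its partitioning from the kind of expressions it holds in a similar way: plain expressions give hash partitioning and sort orders give range partitioning.

```scala
// Hypothetical, simplified stand-ins for Catalyst types (not Spark's real classes).
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class SortOrder(child: Expr, ascending: Boolean = true) extends Expr

sealed trait Partitioning
final case class RoundRobinPartitioning(numPartitions: Int) extends Partitioning
final case class HashPartitioning(exprs: Seq[Expr], numPartitions: Int) extends Partitioning
final case class RangePartitioning(ordering: Seq[SortOrder], numPartitions: Int) extends Partitioning

// A single logical node serves both APIs; the partitioning scheme is derived
// from the expressions it carries rather than from a separate node type.
final case class RepartitionByExpression(partitionExprs: Seq[Expr], numPartitions: Int) {
  def partitioning: Partitioning = {
    val sortOrders = partitionExprs.collect { case s: SortOrder => s }
    require(sortOrders.isEmpty || sortOrders.size == partitionExprs.size,
      "cannot mix sort orders and plain expressions")
    if (partitionExprs.isEmpty) RoundRobinPartitioning(numPartitions)
    else if (sortOrders.nonEmpty) RangePartitioning(sortOrders, numPartitions)
    else HashPartitioning(partitionExprs, numPartitions)
  }
}

object RepartitionSketch {
  // Hypothetical analogue of repartition(cols): plain expressions => hash partitioning.
  def repartition(numPartitions: Int, exprs: Expr*): RepartitionByExpression =
    RepartitionByExpression(exprs, numPartitions)

  // Hypothetical analogue of repartitionByRange(cols): non-sort expressions are
  // wrapped in an ascending SortOrder, which selects range partitioning.
  def repartitionByRange(numPartitions: Int, exprs: Expr*): RepartitionByExpression = {
    require(exprs.nonEmpty, "at least one partition-by expression must be specified")
    val ordering = exprs.map {
      case s: SortOrder => s
      case e            => SortOrder(e)
    }
    RepartitionByExpression(ordering, numPartitions)
  }

  def main(args: Array[String]): Unit = {
    println(repartition(4, Attr("id")).partitioning)
    println(repartitionByRange(4, Attr("id")).partitioning)
  }
}
```

   In this sketch both helpers return the same node type and differ only in the expressions they put into it, which is the same reason the Connect proto can carry a single `RepartitionByExpression` relation for both methods.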



##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala:
##########
@@ -656,6 +668,56 @@ package object dsl {
             Repartition.newBuilder().setInput(logicalPlan).setNumPartitions(num).setShuffle(true))
           .build()
 
+      def repartition(partitionExprs: Expression*): Relation = {

Review Comment:
   I added both `repartition` and `repartitionByRange`, since both are transformed into a `RepartitionByExpression`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org