Posted to reviews@spark.apache.org by "hvanhovell (via GitHub)" <gi...@apache.org> on 2023/02/14 17:33:57 UTC

[GitHub] [spark] hvanhovell opened a new pull request, #40019: [SPARK-42440][CONNECT] Initial set of Dataframe APIs for Scala Client

hvanhovell opened a new pull request, #40019:
URL: https://github.com/apache/spark/pull/40019

   ### What changes were proposed in this pull request?
   Add many of the existing DataFrame APIs to the Spark Connect Scala Client.
   
   This PR does not contain:
   - Typed APIs
   - Aggregation
   - Streaming (not yet supported by Spark Connect)
   - NA/Stats functions
   - TempView registration
   
   ### Why are the changes needed?
   We want the Scala Client Dataset to reach parity with the existing Dataset.
   
   ### How was this patch tested?
   Added a lot of golden tests.
   
   Added a number of test cases to the E2E suite for the functionality that requires server interaction.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40019: [SPARK-42440][CONNECT] Initial set of Dataframe APIs for Scala Client

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #40019:
URL: https://github.com/apache/spark/pull/40019#discussion_r1106559287


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala:
##########
@@ -95,6 +97,121 @@ class Column private[sql] (private[sql] val expr: proto.Expression) extends Logg
   def name(alias: String): Column = Column { builder =>
     builder.getAliasBuilder.addName(alias).setExpr(expr)
   }
+
+  /**
+   * Returns a sort expression based on the descending order of the column.
+   * {{{
+   *   // Scala
+   *   df.sort(df("age").desc)
+   *
+   *   // Java
+   *   df.sort(df.col("age").desc());
+   * }}}
+   *
+   * @group expr_ops
+   * @since 1.3.0

Review Comment:
   (should probably fix this version info)





[GitHub] [spark] hvanhovell commented on pull request #40019: [SPARK-42440][CONNECT] Initial set of Dataframe APIs for Scala Client

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #40019:
URL: https://github.com/apache/spark/pull/40019#issuecomment-1430123971

   cc @zhenlineo 




[GitHub] [spark] hvanhovell closed pull request #40019: [SPARK-42440][CONNECT] Initial set of Dataframe APIs for Scala Client

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell closed pull request #40019: [SPARK-42440][CONNECT] Initial set of Dataframe APIs for Scala Client
URL: https://github.com/apache/spark/pull/40019




[GitHub] [spark] hvanhovell commented on pull request #40019: [SPARK-42440][CONNECT] Initial set of Dataframe APIs for Scala Client

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #40019:
URL: https://github.com/apache/spark/pull/40019#issuecomment-1431494777

   merging this!




[GitHub] [spark] hvanhovell commented on pull request #40019: [SPARK-42440][CONNECT] Initial set of Dataframe APIs for Scala Client

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #40019:
URL: https://github.com/apache/spark/pull/40019#issuecomment-1430127246

   For the reviewers: this is a mostly mechanical PR; the size is large but the complexity is low. All documentation and function signatures were copied from Dataset.




[GitHub] [spark] hvanhovell commented on a diff in pull request #40019: [SPARK-42440][CONNECT] Initial set of Dataframe APIs for Scala Client

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #40019:
URL: https://github.com/apache/spark/pull/40019#discussion_r1106559720


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala:
##########
@@ -95,6 +97,121 @@ class Column private[sql] (private[sql] val expr: proto.Expression) extends Logg
   def name(alias: String): Column = Column { builder =>
     builder.getAliasBuilder.addName(alias).setExpr(expr)
   }
+
+  /**
+   * Returns a sort expression based on the descending order of the column.
+   * {{{
+   *   // Scala
+   *   df.sort(df("age").desc)
+   *
+   *   // Java
+   *   df.sort(df.col("age").desc());
+   * }}}
+   *
+   * @group expr_ops
+   * @since 1.3.0

Review Comment:
   yeah will do


