You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/05 10:12:25 UTC

[GitHub] [spark] zhengruifeng opened a new pull request, #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

zhengruifeng opened a new pull request, #38915:
URL: https://github.com/apache/spark/pull/38915

   ### What changes were proposed in this pull request?
   Implement `product` function
   
   
   ### Why are the changes needed?
   for API coverage
   
   
   ### Does this PR introduce _any_ user-facing change?
   new API
   
   
   ### How was this patch tested?
   added test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #38915:
URL: https://github.com/apache/spark/pull/38915#issuecomment-1340235289

   @zhengruifeng mind fixing the conflicts? Otherwise should be good to go.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38915:
URL: https://github.com/apache/spark/pull/38915#discussion_r1040331473


##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -539,6 +539,15 @@ class SparkConnectPlanner(session: SparkSession) {
    * @return
    */
   private def transformScalarFunction(fun: proto.Expression.UnresolvedFunction): Expression = {
+    if (fun.getFunctionName == "product") {

Review Comment:
   good point, I think we will have more similar cases, let me add a new function for it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38915:
URL: https://github.com/apache/spark/pull/38915#discussion_r1040835872


##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -539,6 +539,15 @@ class SparkConnectPlanner(session: SparkSession) {
    * @return
    */
   private def transformScalarFunction(fun: proto.Expression.UnresolvedFunction): Expression = {
+    if (fun.getFunctionName == "product") {

Review Comment:
   We don't have a mechanism to disallow functions in `FunctionRegistry`. We can build such a mechanism but it's simpler to define the list of "hidden functions" here.
   
   There is no specific reason why `product` can't be a SQL function. We just need a separate PR to propose it and justify it (is it in the SQL standard? is it commonly supported in other databases?)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38915:
URL: https://github.com/apache/spark/pull/38915#discussion_r1040759564


##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -552,6 +554,27 @@ class SparkConnectPlanner(session: SparkSession) {
     }
   }
 
+  /**
+   * For some reason, not all functions are registered in 'FunctionRegistry'. For a unregistered
+   * function, we can still wrap it under the proto 'UnresolvedFunction', and then resolve it in
+   * this method.
+   */
+  private def transformUnregisteredFunction(
+      fun: proto.Expression.UnresolvedFunction): Option[Expression] = {
+    fun.getFunctionName match {
+      case "product" =>
+        if (fun.getArgumentsCount != 1) {
+          throw InvalidPlanInput("Product requires single child expression")

Review Comment:
   I'm not sure about the exception type here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38915:
URL: https://github.com/apache/spark/pull/38915#discussion_r1039881972


##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -539,6 +539,15 @@ class SparkConnectPlanner(session: SparkSession) {
    * @return
    */
   private def transformScalarFunction(fun: proto.Expression.UnresolvedFunction): Expression = {
+    if (fun.getFunctionName == "product") {

Review Comment:
   My concern on such way is it looks too specialized. Not sure how many such case we will have, but at least can we pull this `if and if` to a validation function?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38915:
URL: https://github.com/apache/spark/pull/38915#discussion_r1040835872


##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -539,6 +539,15 @@ class SparkConnectPlanner(session: SparkSession) {
    * @return
    */
   private def transformScalarFunction(fun: proto.Expression.UnresolvedFunction): Expression = {
+    if (fun.getFunctionName == "product") {

Review Comment:
   We don't have a mechanism to disallow functions in `FunctionRegistry`. We can build such a mechanism but it's simpler to define the list of "hidden functions" here.
   
   There is no specific reason why `product` can't be a SQL function. We just need a separate PR to propose it and justify it (is it in SQL standard? is it common supported in other databases?)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38915:
URL: https://github.com/apache/spark/pull/38915#discussion_r1040330791


##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -539,6 +539,15 @@ class SparkConnectPlanner(session: SparkSession) {
    * @return
    */
   private def transformScalarFunction(fun: proto.Expression.UnresolvedFunction): Expression = {
+    if (fun.getFunctionName == "product") {

Review Comment:
   cc @cloud-fan 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xinrong-meng commented on a diff in pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

Posted by GitBox <gi...@apache.org>.
xinrong-meng commented on code in PR #38915:
URL: https://github.com/apache/spark/pull/38915#discussion_r1039865556


##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -539,6 +539,15 @@ class SparkConnectPlanner(session: SparkSession) {
    * @return
    */
   private def transformScalarFunction(fun: proto.Expression.UnresolvedFunction): Expression = {
+    if (fun.getFunctionName == "product") {

Review Comment:
   I was thinking we wanted to add `product` to `FunctionRegistry` as a follow-up for https://github.com/apache/spark/pull/38905. I didn't see how we've done this in this PR. Would you help me understand?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38915:
URL: https://github.com/apache/spark/pull/38915#discussion_r1039400753


##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -539,6 +539,15 @@ class SparkConnectPlanner(session: SparkSession) {
    * @return
    */
   private def transformScalarFunction(fun: proto.Expression.UnresolvedFunction): Expression = {
+    if (fun.getFunctionName == "product") {

Review Comment:
   Had a discussion with @cloud-fan , we don't want to expose all expressions in SQL syntax, and `product` is such an example.
   
   I think we will also deal with dedicated expressions for Pandas-on-Spark in this way. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #38915:
URL: https://github.com/apache/spark/pull/38915#issuecomment-1340330948

   merged into master, thank you for reviews


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #38915:
URL: https://github.com/apache/spark/pull/38915#discussion_r1039998516


##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -539,6 +539,15 @@ class SparkConnectPlanner(session: SparkSession) {
    * @return
    */
   private def transformScalarFunction(fun: proto.Expression.UnresolvedFunction): Expression = {
+    if (fun.getFunctionName == "product") {

Review Comment:
   What is the reason for not allowing this in SQL? And defining it in the function registry should not necessarily mean it can be used from SQL. It might be easier to disallow usage in SQL in the function registry then dealing with it here. I think the function registry is the right place for this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #38915:
URL: https://github.com/apache/spark/pull/38915#discussion_r1040583837


##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -552,6 +554,27 @@ class SparkConnectPlanner(session: SparkSession) {
     }
   }
 
+  /**
+   * For some reason, not all functions are registered in 'FunctionRegistry'. For a unregistered
+   * function, we can still wrap it under the proto 'UnresolvedFunction', and then resolve it in
+   * this method.
+   */
+  private def transformUnregisteredFunction(
+      fun: proto.Expression.UnresolvedFunction): Option[Expression] = {
+    fun.getFunctionName match {
+      case "product" =>
+        if (fun.getArgumentsCount != 1) {
+          throw InvalidPlanInput("Product requires single child expression")

Review Comment:
   Should this throw an Analysis exception instead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng closed pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

Posted by GitBox <gi...@apache.org>.
zhengruifeng closed pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function
URL: https://github.com/apache/spark/pull/38915


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org