Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/27 11:35:46 UTC

[GitHub] [spark] beliefer opened a new pull request, #39246: [SPARK-41067][CONNECT][PYTHON] Implement `DataFrame.stat.cov`

beliefer opened a new pull request, #39246:
URL: https://github.com/apache/spark/pull/39246

   ### What changes were proposed in this pull request?
   Implement `DataFrame.stat.cov` with a proto message
   
   Implement `DataFrame.stat.cov` for the Scala API
   Implement `DataFrame.stat.cov` for the Python API
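
For context, `DataFrame.stat.cov` returns the sample covariance of two numeric columns as a double. The following is a minimal plain-Python sketch of that statistic, not Spark code: the helper name `sample_cov` is hypothetical, and its error handling is the sketch's own choice rather than Spark's behavior.

```python
from math import fsum


def sample_cov(xs, ys):
    """Sample covariance of two equal-length numeric sequences (hypothetical helper)."""
    if len(xs) != len(ys):
        raise ValueError("both columns must have the same number of rows")
    n = len(xs)
    if n < 2:
        raise ValueError("sample covariance needs at least two rows")
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    # Sum of products of deviations, divided by n - 1 (Bessel's correction).
    return fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


print(sample_cov([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 2.0
```

The result is a single scalar, which is why the API needs the whole input to be evaluated before it can return.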
   
   
   ### Why are the changes needed?
   To improve Spark Connect API coverage.
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'. New API
   
   
   ### How was this patch tested?
   New test cases.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #39246: [SPARK-41067][CONNECT][PYTHON] Implement `DataFrame.stat.cov`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #39246: [SPARK-41067][CONNECT][PYTHON] Implement `DataFrame.stat.cov`
URL: https://github.com/apache/spark/pull/39246


[GitHub] [spark] zhengruifeng commented on a diff in pull request #39246: [SPARK-41067][CONNECT][PYTHON] Implement `DataFrame.stat.cov`

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #39246:
URL: https://github.com/apache/spark/pull/39246#discussion_r1058097335


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -328,6 +329,16 @@ class SparkConnectPlanner(session: SparkSession) {
       .logicalPlan
   }
 
+  private def transformStatCov(rel: proto.StatCov): LogicalPlan = {

Review Comment:
   `df.stat.cov` returns a double value, so it always triggers a job to compute it.
   
   Users will need to wait for this computation.



[GitHub] [spark] beliefer commented on pull request #39246: [SPARK-41067][CONNECT][PYTHON] Implement `DataFrame.stat.cov`

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #39246:
URL: https://github.com/apache/spark/pull/39246#issuecomment-1366297511

   ping @HyukjinKwon @zhengruifeng @grundprinzip @amaliujia 


[GitHub] [spark] HyukjinKwon commented on pull request #39246: [SPARK-41067][CONNECT][PYTHON] Implement `DataFrame.stat.cov`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #39246:
URL: https://github.com/apache/spark/pull/39246#issuecomment-1366614688

   Merged to master.


[GitHub] [spark] beliefer commented on pull request #39246: [SPARK-41067][CONNECT][PYTHON] Implement `DataFrame.stat.cov`

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #39246:
URL: https://github.com/apache/spark/pull/39246#issuecomment-1367005096

   @HyukjinKwon @zhengruifeng @grundprinzip Thank you


[GitHub] [spark] grundprinzip commented on a diff in pull request #39246: [SPARK-41067][CONNECT][PYTHON] Implement `DataFrame.stat.cov`

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #39246:
URL: https://github.com/apache/spark/pull/39246#discussion_r1058068989


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -328,6 +329,16 @@ class SparkConnectPlanner(session: SparkSession) {
       .logicalPlan
   }
 
+  private def transformStatCov(rel: proto.StatCov): LogicalPlan = {

Review Comment:
   One of the reasons we haven't implemented this until now is that it executes the query directly while building the logical plan.
   
   If you compare this to the other methods, they all invoke the DF transformation directly.
   
   One open question is how to rewrite the `cov()` function as a DF transformation, or use it as one.
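
One way to read the question above: sample covariance can be derived from four plain aggregates (row count, sum of x, sum of y, sum of x*y), so in principle it could be expressed as an aggregation over the input rather than as an eagerly executed action. Spark SQL also has a `covar_samp` aggregate function for the same statistic. The sketch below is plain Python, not Spark code; `cov_from_aggregates` and the sample rows are hypothetical, and this one-pass formula is less numerically stable than a two-pass computation.

```python
def cov_from_aggregates(n, sum_x, sum_y, sum_xy):
    """Sample covariance from four aggregates (hypothetical helper, not a Spark API)."""
    if n < 2:
        raise ValueError("sample covariance needs at least two rows")
    # cov = (sum(x*y) - sum(x) * sum(y) / n) / (n - 1)
    return (sum_xy - sum_x * sum_y / n) / (n - 1)


rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# The aggregates themselves are ordinary sums and a count over the rows.
n = len(rows)
sum_x = sum(x for x, _ in rows)
sum_y = sum(y for _, y in rows)
sum_xy = sum(x * y for x, y in rows)

print(cov_from_aggregates(n, sum_x, sum_y, sum_xy))  # 2.0
```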



[GitHub] [spark] beliefer commented on a diff in pull request #39246: [SPARK-41067][CONNECT][PYTHON] Implement `DataFrame.stat.cov`

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #39246:
URL: https://github.com/apache/spark/pull/39246#discussion_r1058178891


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -328,6 +329,16 @@ class SparkConnectPlanner(session: SparkSession) {
       .logicalPlan
   }
 
+  private def transformStatCov(rel: proto.StatCov): LogicalPlan = {

Review Comment:
   Yes. So we can collect the result of the execution and wrap it in a `LocalRelation`.
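
The shape of that approach can be sketched as a toy model in plain Python: compute the scalar eagerly, then wrap it in a one-row, one-column relation whose data is already materialized. Nothing here is Spark code; `LocalRows`, `cov_as_local_rows`, and the column name `cov` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class LocalRows:
    """Hypothetical stand-in for a relation whose rows are already materialized."""
    columns: Tuple[str, ...]
    rows: Tuple[Tuple[float, ...], ...]


def cov_as_local_rows(pairs: Sequence[Tuple[float, float]]) -> LocalRows:
    """Eagerly compute the sample covariance, then wrap it as a single-row relation."""
    n = len(pairs)
    if n < 2:
        raise ValueError("sample covariance needs at least two rows")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # The computation happens here, before any relation is returned.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)
    return LocalRows(columns=("cov",), rows=((cov,),))


result = cov_as_local_rows([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
print(result.columns, result.rows)
```

The trade-off raised in the review still applies to this shape: the work is done when the relation is built, not when its rows are later requested.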


