You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "ueshin (via GitHub)" <gi...@apache.org> on 2023/11/07 01:42:22 UTC

[PR] [SPARK-45813][CONNECT][PYTHON] Return the observed metrics from commands [spark]

ueshin opened a new pull request, #43690:
URL: https://github.com/apache/spark/pull/43690

   ### What changes were proposed in this pull request?
   
   Returns the observed metrics from commands.
   
   ### Why are the changes needed?
   
   Currently the observed metrics on commands are not available.
   
   For example:
   
   ```py
   >>> df = spark.range(10)
   >>>
   >>> observation = Observation()
   >>> observed_df = df.observe(observation, count(lit(1)).alias("cnt"))
   >>>
   >>> observed_df.show()
   ...
   >>> observation.get
   {}
   ```
   
   it should be:
   
   ```py
   >>> observation.get
   {'cnt': 10}
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the observed metrics on commands will be available.
   
   ### How was this patch tested?
   
   Added the related tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45813][CONNECT][PYTHON] Return the observed metrics from commands [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43690:
URL: https://github.com/apache/spark/pull/43690#issuecomment-1811671653

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45813][CONNECT][PYTHON] Return the observed metrics from commands [spark]

Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on code in PR #43690:
URL: https://github.com/apache/spark/pull/43690#discussion_r1388725778


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala:
##########
@@ -162,6 +162,18 @@ private[connect] class ExecuteThreadRunner(executeHolder: ExecuteHolder) extends
             s"${executeHolder.request.getPlan.getOpTypeCase} not supported.")
       }
 
+      if (executeHolder.observations.nonEmpty) {
+        val observedMetrics = executeHolder.observations.map { case (name, observation) =>

Review Comment:
   The observed metrics sent in `SparkConnectPlanExecution.handlePlan` is now for the case of `input.isStreaming || executeHolderOpt.isEmpty`; otherwise, they will be sent here.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45813][CONNECT][PYTHON] Return the observed metrics from commands [spark]

Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on PR #43690:
URL: https://github.com/apache/spark/pull/43690#issuecomment-1797143420

   cc @HyukjinKwon @zhengruifeng @beliefer 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45813][CONNECT][PYTHON] Return the observed metrics from commands [spark]

Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on PR #43690:
URL: https://github.com/apache/spark/pull/43690#issuecomment-1809213620

   The failed test seems not related.
   
   > 	org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45813][CONNECT][PYTHON] Return the observed metrics from commands [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #43690: [SPARK-45813][CONNECT][PYTHON] Return the observed metrics from commands
URL: https://github.com/apache/spark/pull/43690


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45813][CONNECT][PYTHON] Return the observed metrics from commands [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43690:
URL: https://github.com/apache/spark/pull/43690#discussion_r1388817513


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala:
##########
@@ -162,6 +162,18 @@ private[connect] class ExecuteThreadRunner(executeHolder: ExecuteHolder) extends
             s"${executeHolder.request.getPlan.getOpTypeCase} not supported.")
       }
 
+      if (executeHolder.observations.nonEmpty) {
+        val observedMetrics = executeHolder.observations.map { case (name, observation) =>

Review Comment:
   Got it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45813][CONNECT][PYTHON] Return the observed metrics from commands [spark]

Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on code in PR #43690:
URL: https://github.com/apache/spark/pull/43690#discussion_r1388726178


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -1065,12 +1065,20 @@ class SparkConnectPlanner(
       numPartitionsOpt)
   }
 
-  private def transformCollectMetrics(rel: proto.CollectMetrics, planId: Long): LogicalPlan = {

Review Comment:
   Updated back to use `planId` here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45813][CONNECT][PYTHON] Return the observed metrics from commands [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43690:
URL: https://github.com/apache/spark/pull/43690#discussion_r1384289821


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala:
##########
@@ -162,6 +162,18 @@ private[connect] class ExecuteThreadRunner(executeHolder: ExecuteHolder) extends
             s"${executeHolder.request.getPlan.getOpTypeCase} not supported.")
       }
 
+      if (executeHolder.observations.nonEmpty) {
+        val observedMetrics = executeHolder.observations.map { case (name, observation) =>

Review Comment:
   I'm confused here.
   It seems we already create and send observed metrics response at `SparkConnectPlanExecution.handlePlan`.



##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -1065,12 +1065,20 @@ class SparkConnectPlanner(
       numPartitionsOpt)
   }
 
-  private def transformCollectMetrics(rel: proto.CollectMetrics, planId: Long): LogicalPlan = {

Review Comment:
   We don't need to semantically validate `CollectMetrics`? cc @allisonwang-db 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org