You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "urosstan-db (via GitHub)" <gi...@apache.org> on 2024/01/31 11:57:14 UTC

[PR] [SPARK-46933] Add query execution time metric to connectors which use JDBCRDD [spark]

urosstan-db opened a new pull request, #44969:
URL: https://github.com/apache/spark/pull/44969

   ### What changes were proposed in this pull request?
   This pull request should add measuring query execution time on external JDBC data source.
   Another change is changing access right for JDBCRDD class, that is needed for adding another metric (SQL text) which will be done in some next PR.
   
   ### Why are the changes needed?
   Query execution time is very important metric to have
   
   ### Does this PR introduce _any_ user-facing change?
   User can see query execution time on SparkPlan graph under node metrics tab
   
   ### How was this patch tested?
   Tested using custom image
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46933] Add query execution time metric to connectors which use JDBCRDD [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #44969:
URL: https://github.com/apache/spark/pull/44969#discussion_r1472938018


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala:
##########
@@ -99,6 +99,10 @@ trait DataSourceScanExec extends LeafExecNode {
   def inputRDDs(): Seq[RDD[InternalRow]]
 }
 
+object DataSourceScanExec {
+  val numOutputRowsKey = "numOutputRows"

Review Comment:
   This seems unnecessary change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46933] Add query execution time metric to connectors which use JDBCRDD [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #44969:
URL: https://github.com/apache/spark/pull/44969#discussion_r1473747421


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala:
##########
@@ -171,7 +173,14 @@ private[jdbc] class JDBCRDD(
     limit: Int,
     sortOrders: Array[String],
     offset: Int)
-  extends RDD[InternalRow](sc, Nil) {
+  extends RDD[InternalRow](sc, Nil) with DataSourceMetricsMixin {
+
+  /**
+   * Execution time of the query issued to JDBC connection
+   */
+  val queryExecutionTimeMetric: SQLMetric = SQLMetrics.createNanoTimingMetric(
+    sparkContext,
+    name = "Query execution time")

Review Comment:
   ```suggestion
       name = "JDBC query execution time")
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46933] Add query execution time metric to connectors which use JDBCRDD [spark]

Posted by "urosstan-db (via GitHub)" <gi...@apache.org>.
urosstan-db commented on PR #44969:
URL: https://github.com/apache/spark/pull/44969#issuecomment-1920176092

   > Can you provide SQL web UI screenshot to demo this new metrics? And let's make sure JDBC v2 works as well (need to make `JDBCScan#supportedCustomMetrics` support this new metrics)
   
   ![image](https://github.com/apache/spark/assets/155642965/a9a2f2b0-70b8-4241-bafd-1b89c2246eac)
   Fetch time is removed, so it will not be present in UI. 
   JDBC v2 will work with current implementation, because JDBCRelation returns JDBCScanBuilder which return JDBCScan which inherits V1Scan, so DataSourceV2Strategy will create RowDataSourceScanExec during planning.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46933][SQL] Add query execution time metric to connectors which use JDBCRDD [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan closed pull request #44969: [SPARK-46933][SQL] Add query execution time metric to connectors which use JDBCRDD
URL: https://github.com/apache/spark/pull/44969


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46933] Add query execution time metric to connectors which use JDBCRDD [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #44969:
URL: https://github.com/apache/spark/pull/44969#discussion_r1472938018


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala:
##########
@@ -99,6 +99,10 @@ trait DataSourceScanExec extends LeafExecNode {
   def inputRDDs(): Seq[RDD[InternalRow]]
 }
 
+object DataSourceScanExec {
+  val numOutputRowsKey = "numOutputRows"

Review Comment:
   This seems unnecessary change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46933] Add query execution time metric to connectors which use JDBCRDD [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #44969:
URL: https://github.com/apache/spark/pull/44969#discussion_r1473747793


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala:
##########
@@ -272,11 +281,25 @@ private[jdbc] class JDBCRDD(
         ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
     stmt.setFetchSize(options.fetchSize)
     stmt.setQueryTimeout(options.queryTimeout)
+
+    val startTime = System.nanoTime
     rs = stmt.executeQuery()
+    val endTime = System.nanoTime
+
+    val executionTime = endTime - startTime
+    logInfo(s"Query execution time = $executionTime ns")

Review Comment:
   We don't need this log I think. It's redundant with web UI.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46933][SQL] Add query execution time metric to connectors which use JDBCRDD [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #44969:
URL: https://github.com/apache/spark/pull/44969#issuecomment-1921506080

   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46933] Add query execution time metric to connectors which use JDBCRDD [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #44969:
URL: https://github.com/apache/spark/pull/44969#issuecomment-1919266524

   Can you provide SQL web UI screenshot to demo this new metrics? And let's make sure JDBC v2 works as well (need to make `JDBCScan#supportedCustomMetrics` support this new metrics) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org