Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/08 08:59:16 UTC

[GitHub] [spark] AngersZhuuuu opened a new pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

AngersZhuuuu opened a new pull request #31522:
URL: https://github.com/apache/spark/pull/31522


   
   ### What changes were proposed in this pull request?
   Since we have added a log message for the commit time, I think it would be useful to show it to users directly in the SQL tab's UI.
   
   ![image](https://user-images.githubusercontent.com/46485123/107197480-e8b2bd80-6a2e-11eb-849a-144462b0924d.png)
   
   
   
   ### Why are the changes needed?
   Lets users see the commit duration directly.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Users can see the file commit duration in the SQL tab's SQL plan graph.
   
   
   ### How was this patch tested?
   Manually tested.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-885029561








[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883410686


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45848/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881803662


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45706/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881646025


   **[Test build #141178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141178/testReport)** for PR 31522 at commit [`559d766`](https://github.com/apache/spark/commit/559d7665ef58683bca3121294f552622626838fc).




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-775770538


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39648/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784725376


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39978/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784736336


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39978/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881733051


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45690/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880621410


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141058/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881826398


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141194/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-775736964


   **[Test build #135066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135066/testReport)** for PR 31522 at commit [`89f8201`](https://github.com/apache/spark/commit/89f8201ab6f2069d03aefb8d3184017556858453).




[GitHub] [spark] cloud-fan commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r581035708



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -232,7 +234,8 @@ object BasicWriteJobStatsTracker {
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      DURATION_FILE_COMMIT-> SQLMetrics.createTimingMetric(sparkContext, "duration of commit files")

Review comment:
       duration of committing the job?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882564182








[GitHub] [spark] SparkQA removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882450442








[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884900754


   **[Test build #141495 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141495/testReport)** for PR 31522 at commit [`b5c9d63`](https://github.com/apache/spark/commit/b5c9d6338c53c52724c0a21d61a093194745b98f).
    * This patch **fails to build**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] cloud-fan commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r671085212



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala
##########
@@ -103,5 +103,7 @@ trait WriteJobStatsTracker extends Serializable {
    * to the expected derived type when implementing this method in a derived class.
    * The framework will make sure to call this with the right arguments.
    */
-  def processStats(stats: Seq[WriteTaskStats]): Unit
+  def processStats(stats: Seq[WriteTaskStats], jobCommitDuration: Long): Unit
+
+  def updateTaskWriteAndCommitDuration(duration: Long): Unit

Review comment:
       Same as for the job commit time metric.
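
The signature change in the hunk above, where `processStats` also receives the job commit duration, can be sketched without Spark as follows. This is only an illustration under stated assumptions: `TrackerSketch`, `JobStatsTracker`, `BasicStats`, and `RecordingTracker` are hypothetical stand-ins, not Spark classes, and the real `WriteJobStatsTracker` may differ in detail.

```scala
object TrackerSketch {
  // Hypothetical, simplified stand-in for Spark's per-task write statistics.
  trait WriteTaskStats
  final case class BasicStats(numFiles: Int, numRows: Long) extends WriteTaskStats

  // Mirrors the shape of the changed method in the diff: the driver passes the
  // per-task stats together with the measured job commit duration.
  trait JobStatsTracker {
    def processStats(stats: Seq[WriteTaskStats], jobCommitDuration: Long): Unit
  }

  // Hypothetical tracker that sums the task stats and records the duration.
  final class RecordingTracker extends JobStatsTracker {
    var totalFiles: Int = 0
    var totalRows: Long = 0L
    var jobCommitMs: Long = 0L

    override def processStats(stats: Seq[WriteTaskStats], jobCommitDuration: Long): Unit = {
      stats.foreach {
        case BasicStats(files, rows) =>
          totalFiles += files
          totalRows += rows
        case _ => // stats of an unknown type are ignored in this sketch
      }
      jobCommitMs = jobCommitDuration
    }
  }
}
```

A caller would construct a `RecordingTracker`, call `processStats` once after the job commit, and then read the aggregated fields.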






[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880360525


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45560/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880370943


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45560/
   




[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-782549928


   Gentle ping @HeartSaVioR @dongjoon-hyun @HyukjinKwon @maropu @cloud-fan. Could you help review this? I think it's really helpful, since a slow `INSERT` statement is so often caused by committing files.




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-782557635


   **[Test build #135289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135289/testReport)** for PR 31522 at commit [`9ddd28c`](https://github.com/apache/spark/commit/9ddd28cabb5c5d44366a109b4b1c7c3ce88b45e1).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-782563505


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39868/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881786424


   **[Test build #141194 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141194/testReport)** for PR 31522 at commit [`ee2e3cf`](https://github.com/apache/spark/commit/ee2e3cfb5b8eb6cf207f752845d02e7d26b9ff37).




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r673160702



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -183,14 +187,21 @@ class BasicWriteTaskStatsTracker(hadoopConf: Configuration)
  */
 class BasicWriteJobStatsTracker(
     serializableHadoopConf: SerializableConfiguration,
-    @transient val metrics: Map[String, SQLMetric])
+    @transient val driverSideMetrics: Map[String, SQLMetric],
+    taskCommitTimeMetric: SQLMetric)
   extends WriteJobStatsTracker {
 
+  def this(
+    serializableHadoopConf: SerializableConfiguration,

Review comment:
       Done






[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880466866


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45571/
   




[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-785554038


   > Normally, committing a job should be fast. I don't think it is a good idea to put this in the SQL graph. For debug purposes, the log message should be enough.
   > Besides, the name "duration of committing the job" can be confusing to end-users.
   > I have to leave -1 for this one.
   
   All right. For quick debugging, though, showing all of this information directly may be more helpful for Spark admins.




[GitHub] [spark] cloud-fan commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-793469062


   After a second look, I think it's rare that job committing takes a lot of time. If it happens, we can look at the logs to see the commit duration (as well as the Hive LOAD TABLE duration). Most of the time this metric won't be interesting to users. Thus, I think we don't need to add this metric.
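
For readers who take the log-based route suggested above, a minimal sketch of extracting the commit duration from a driver log line follows. The message format is the `logInfo` call quoted elsewhere in this thread (`Write Job <uuid> committed. Elapsed time: <n> ms.`). The object and method names are hypothetical, and the timestamp and logger prefix in the usage example are assumed, since the real prefix depends on the logging configuration.

```scala
object CommitLogSketch {
  // Matches the message emitted by the logInfo call in FileFormatWriter,
  // wherever it appears within a log line.
  private val Committed =
    """Write Job (\S+) committed\. Elapsed time: (\d+) ms\.""".r.unanchored

  // Returns (job uuid, commit duration in ms) if the line is a commit message.
  def commitDurationMs(line: String): Option[(String, Long)] = line match {
    case Committed(uuid, ms) => Some((uuid, ms.toLong))
    case _ => None
  }
}
```

For example, `CommitLogSketch.commitDurationMs("21/02/08 08:59:16 INFO FileFormatWriter: Write Job 1234-abcd committed. Elapsed time: 87 ms.")` yields the job id together with the duration, while unrelated lines yield `None`.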




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881866380


   **[Test build #141198 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141198/testReport)** for PR 31522 at commit [`5c41947`](https://github.com/apache/spark/commit/5c41947cb6cce62d581d109b9e69dc99454c29eb).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] cloud-fan commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r581035458



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -223,6 +224,7 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  private val DURATION_FILE_COMMIT = "durationCommit"

Review comment:
       jobCommitDuration?






[GitHub] [spark] gengliangwang commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r581921374



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##########
@@ -221,7 +221,7 @@ object FileFormatWriter extends Logging {
       val (_, duration) = Utils.timeTakenMs { committer.commitJob(job, commitMsgs) }
       logInfo(s"Write Job ${description.uuid} committed. Elapsed time: $duration ms.")
 
-      processStats(description.statsTrackers, ret.map(_.summary.stats))
+      processStats(description.statsTrackers, ret.map(_.summary.stats), duration)

Review comment:
       Sorry, but why do we need to show the duration of the `commitJob` call here?
   As per the doc:
   ```
     /**
      * Commits a job after the writes succeed. Must be called on the driver.
      */
     def commitJob(jobContext: JobContext, taskCommits: Seq[TaskCommitMessage]): Unit
   ```
   The `commitJob` API is mostly for moving the temporary output files to the final target path.
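
To make the "moving temporary output files" step concrete, here is a minimal sketch of a rename-style job commit on a local filesystem. It is an illustration only: `RenameCommit` and its `commitJob(staging, target)` signature are hypothetical and are not the committer protocol quoted above.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

object RenameCommit {
  /** Moves every regular file directly under `staging` into `target`; returns the number moved. */
  def commitJob(staging: Path, target: Path): Int = {
    Files.createDirectories(target)
    val stream = Files.newDirectoryStream(staging)
    try {
      var moved = 0
      val it = stream.iterator()
      while (it.hasNext) {
        val file = it.next()
        if (Files.isRegularFile(file)) {
          // Hypothetical layout: files keep their names in the final directory.
          Files.move(file, target.resolve(file.getFileName), StandardCopyOption.REPLACE_EXISTING)
          moved += 1
        }
      }
      moved
    } finally {
      stream.close()
    }
  }
}
```

In this sketch the work done by the commit is proportional to the number of staged files.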






[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883442968


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45843/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883433878


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45846/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883416289


   **[Test build #141337 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141337/testReport)** for PR 31522 at commit [`6d91e25`](https://github.com/apache/spark/commit/6d91e2516c1b2ca2ecb1bf7aa101500dc4cc529e).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-782591827


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135289/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882450442








[GitHub] [spark] cloud-fan commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r672965451



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -48,7 +48,9 @@ case class BasicWriteTaskStats(
 /**
  * Simple [[WriteTaskStatsTracker]] implementation that produces [[BasicWriteTaskStats]].
  */
-class BasicWriteTaskStatsTracker(hadoopConf: Configuration)
+class BasicWriteTaskStatsTracker(
+    hadoopConf: Configuration,
+    taskCommitTimeMetrics: Option[SQLMetric] = None)

Review comment:
       When can this be None? In tests?

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -221,16 +226,26 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  val TASK_COMMIT_DURATION = "taskCommitDuration"
+  private val JOB_COMMIT_DURATION = "jobCommitDuration"
   /** XAttr key of the data length header added in HADOOP-17414. */
   val FILE_LENGTH_XATTR = "header.x-hadoop-s3a-magic-data-length"
 
-  def metrics: Map[String, SQLMetric] = {
+  def driverSideMetrics: Map[String, SQLMetric] = {
     val sparkContext = SparkContext.getActive.get
     Map(
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      JOB_COMMIT_DURATION ->
+        SQLMetrics.createTimingMetric(sparkContext, "duration of job commit")
     )
   }
+
+  def taskCommitTimeMetric: Map[String, SQLMetric] = {

Review comment:
       why Map? `def taskCommitTimeMetric: SQLMetric ...`

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -221,16 +226,26 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  val TASK_COMMIT_DURATION = "taskCommitDuration"
+  private val JOB_COMMIT_DURATION = "jobCommitDuration"
   /** XAttr key of the data length header added in HADOOP-17414. */
   val FILE_LENGTH_XATTR = "header.x-hadoop-s3a-magic-data-length"
 
-  def metrics: Map[String, SQLMetric] = {
+  def driverSideMetrics: Map[String, SQLMetric] = {
     val sparkContext = SparkContext.getActive.get
     Map(
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      JOB_COMMIT_DURATION ->
+        SQLMetrics.createTimingMetric(sparkContext, "duration of job commit")
     )
   }
+
+  def taskCommitTimeMetric: Map[String, SQLMetric] = {

Review comment:
       `def taskCommitTimeMetric: (String, SQLMetric)`
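
The two return shapes under discussion can be compared in a small self-contained sketch. `FakeMetric` is a hypothetical stand-in written for this example (it is not Spark's `SQLMetric`), and the key name is only illustrative.

```scala
object MetricShapes {
  // Hypothetical stand-in for a timing metric; not Spark's SQLMetric.
  final class FakeMetric(val description: String) {
    private var total = 0L
    def add(v: Long): Unit = total += v
    def value: Long = total
  }

  val TaskCommitTime = "taskCommitTime"

  // Shape in the diff: a one-entry Map that callers must look up by key.
  def asMap: Map[String, FakeMetric] =
    Map(TaskCommitTime -> new FakeMetric("task commit time"))

  // Suggested shape: a single (key, metric) pair.
  def asPair: (String, FakeMetric) =
    TaskCommitTime -> new FakeMetric("task commit time")

  // A pair can still be merged into a larger metrics map by the caller.
  def merged(others: Map[String, FakeMetric]): Map[String, FakeMetric] =
    others + asPair
}
```

With the pair, a caller can destructure it directly (`val (key, metric) = MetricShapes.asPair`) and no lookup on a single-entry map is needed.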

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriter.scala
##########
@@ -91,11 +91,14 @@ abstract class FileFormatDataWriter(
    * driver too and used to e.g. update the metrics in UI.
    */
   override def commit(): WriteTaskResult = {
-    releaseResources()
+    val (taskCommitMessage, taskCommitTime) = Utils.timeTakenMs {
+      releaseResources()

Review comment:
       shall we include `releaseResources` in the commit time?
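
The question is about where the timed block starts. A minimal sketch of the two options follows, assuming a local `timeTakenMs` helper and placeholder `releaseResources` and `commitTask` functions; all three are written for this example, and the sleeps only make the phases measurable.

```scala
object CommitScope {
  /** Runs `body` once and returns its result with the elapsed time in milliseconds. */
  def timeTakenMs[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // Hypothetical placeholders for closing the writer and committing the task.
  def releaseResources(): Unit = Thread.sleep(30)
  def commitTask(): String = { Thread.sleep(5); "committed" }

  // Variant A: closing the writer counts toward the reported commit time.
  def commitIncludingRelease(): (String, Long) = timeTakenMs {
    releaseResources()
    commitTask()
  }

  // Variant B: only the committer call is timed.
  def commitExcludingRelease(): (String, Long) = {
    releaseResources()
    timeTakenMs { commitTask() }
  }
}
```

Variant A reports the cost of flushing and closing output as part of the commit; variant B isolates the commit call itself.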

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##########
@@ -302,13 +302,9 @@ object FileFormatWriter extends Logging {
       Utils.tryWithSafeFinallyAndFailureCallbacks(block = {
         // Execute the task to write rows out and commit the task.
         val taskAttemptID = taskAttemptContext.getTaskAttemptID
-        val (res, timeCost) = Utils.timeTakenMs {
-          logDebug("$taskAttemptID starts to write and commit.")
-          dataWriter.writeWithIterator(iterator)
-          dataWriter.commit()
-        }
-        logInfo(s"$taskAttemptID finished to write and commit. Elapsed time: $timeCost ms.")
-        res
+        logDebug(s"$taskAttemptID starts to write and commit.")

Review comment:
       This log didn't exist before https://github.com/apache/spark/commit/f5a63322def87904ebfa95673a584c094e6062cc#diff-03e7ff19e93cb270d82aca907ac0a1f87463ba6eb5dce78a407bb169b840a6cb , shall we remove it?

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala
##########
@@ -66,10 +66,11 @@ trait WriteTaskStatsTracker {
 
   /**
    * Returns the final statistics computed so far.
+   * @param taskCommitTime The task commit duration.

Review comment:
       nit: `Time of committing the task`

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala
##########
@@ -93,6 +94,7 @@ trait WriteJobStatsTracker extends Serializable {
    * Process the given collection of stats computed during this job.
    * E.g. aggregate them, write them to memory / disk, issue warnings, whatever.
    * @param stats One [[WriteTaskStats]] object from each successful write task.
+   * @param jobCommitDuration Duration of job commit.

Review comment:
       ditto

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -221,16 +226,26 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  val TASK_COMMIT_DURATION = "taskCommitDuration"
+  private val JOB_COMMIT_DURATION = "jobCommitDuration"

Review comment:
       `taskCommitTime` and `jobCommitTime`?

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -221,16 +226,26 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  val TASK_COMMIT_DURATION = "taskCommitDuration"
+  private val JOB_COMMIT_DURATION = "jobCommitDuration"
   /** XAttr key of the data length header added in HADOOP-17414. */
   val FILE_LENGTH_XATTR = "header.x-hadoop-s3a-magic-data-length"
 
-  def metrics: Map[String, SQLMetric] = {
+  def driverSideMetrics: Map[String, SQLMetric] = {
     val sparkContext = SparkContext.getActive.get
     Map(
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      JOB_COMMIT_DURATION ->
+        SQLMetrics.createTimingMetric(sparkContext, "duration of job commit")

Review comment:
       `time of committing the job`

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -221,16 +226,26 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  val TASK_COMMIT_DURATION = "taskCommitDuration"
+  private val JOB_COMMIT_DURATION = "jobCommitDuration"
   /** XAttr key of the data length header added in HADOOP-17414. */
   val FILE_LENGTH_XATTR = "header.x-hadoop-s3a-magic-data-length"
 
-  def metrics: Map[String, SQLMetric] = {
+  def driverSideMetrics: Map[String, SQLMetric] = {
     val sparkContext = SparkContext.getActive.get
     Map(
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      JOB_COMMIT_DURATION ->
+        SQLMetrics.createTimingMetric(sparkContext, "duration of job commit")
     )
   }
+
+  def taskCommitTimeMetric: Map[String, SQLMetric] = {
+    val sparkContext = SparkContext.getActive.get
+    Map(TASK_COMMIT_DURATION ->
+      SQLMetrics.createTimingMetric(sparkContext, "duration of task commit"))

Review comment:
       `time of committing tasks`






[GitHub] [spark] SparkQA removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881786424


   **[Test build #141194 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141194/testReport)** for PR 31522 at commit [`ee2e3cf`](https://github.com/apache/spark/commit/ee2e3cfb5b8eb6cf207f752845d02e7d26b9ff37).




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-782563499


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39868/
   




[GitHub] [spark] gengliangwang commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-793884402


   @AngersZhuuuu the second graph you posted is from the stage page. AFAIK the duration there is the sum of executor run times, while on the job/SQL page the duration is completion_time - submit_time. Your example seems unrelated to this PR.
   Again, I don't think committing a job should take too long. The metric is also confusing. I will keep my -1.
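
The two notions of duration mentioned here can be sketched as follows. `TaskRun` and both helper functions are hypothetical names introduced for illustration, and the timestamps in the usage note are made up.

```scala
object DurationKinds {
  // Hypothetical record of one task: start and end timestamps in milliseconds.
  final case class TaskRun(startMs: Long, endMs: Long) {
    def runTimeMs: Long = endMs - startMs
  }

  // Stage-page style: sum of executor run times across all tasks.
  def summedRunTimeMs(tasks: Seq[TaskRun]): Long = tasks.map(_.runTimeMs).sum

  // Job/SQL-page style: completion time minus submission time.
  def wallClockMs(submitMs: Long, completionMs: Long): Long = completionMs - submitMs
}
```

For example, three overlapping tasks with run times of 100, 120 and 80 ms sum to 300 ms, while a job submitted at 0 ms and completed at 150 ms has a wall-clock duration of 150 ms, so the two numbers are not directly comparable.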




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880366656


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45561/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784238048


   **[Test build #135376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135376/testReport)** for PR 31522 at commit [`e334a4c`](https://github.com/apache/spark/commit/e334a4c8c32c0c592ee463292eaa69b0495bc57c).




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880502373


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45573/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882512604








[GitHub] [spark] cloud-fan commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784227452


   cc @gengliangwang 




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883453876


   **[Test build #141334 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141334/testReport)** for PR 31522 at commit [`a4890f2`](https://github.com/apache/spark/commit/a4890f2049050af537c6760fa8e94ba237b9e875).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880621410


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141058/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-775071600


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39599/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r673144657



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -183,14 +187,21 @@ class BasicWriteTaskStatsTracker(hadoopConf: Configuration)
  */
 class BasicWriteJobStatsTracker(
     serializableHadoopConf: SerializableConfiguration,
-    @transient val metrics: Map[String, SQLMetric])
+    @transient val driverSideMetrics: Map[String, SQLMetric],
+    taskCommitTimeMetric: SQLMetric)
   extends WriteJobStatsTracker {
 
+  def this(
+    serializableHadoopConf: SerializableConfiguration,

Review comment:
       nit: 4 spaces indentation for parameter list
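
A sketch of the requested layout, using a hypothetical `Tracker` class rather than the class in the diff: the parameters of both the primary and the auxiliary constructor are indented 4 spaces relative to the declaration, while the body uses 2 spaces.

```scala
// Hypothetical class, used only to show the parameter-list layout.
class Tracker(
    conf: Map[String, String],
    val metrics: Map[String, Long],
    val taskCommitTime: Long) {

  // Auxiliary constructor: parameters indented 4 spaces from `def`, body 2 spaces.
  def this(
      conf: Map[String, String],
      metrics: Map[String, Long]) = {
    this(conf, metrics - "taskCommitTime", metrics.getOrElse("taskCommitTime", 0L))
  }
}
```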






[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-775898417


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135066/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-782560240


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39868/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881772710


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141178/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-885029561








[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882176352


   Any more suggestions?




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r672177955



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -183,14 +186,16 @@ class BasicWriteTaskStatsTracker(hadoopConf: Configuration)
  */
 class BasicWriteJobStatsTracker(
     serializableHadoopConf: SerializableConfiguration,
-    @transient val metrics: Map[String, SQLMetric])
+    val metrics: Map[String, SQLMetric])

Review comment:
       Done






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883458698








[GitHub] [spark] cloud-fan commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r671084515



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala
##########
@@ -103,5 +103,7 @@ trait WriteJobStatsTracker extends Serializable {
    * to the expected derived type when implementing this method in a derived class.
    * The framework will make sure to call this with the right arguments.
    */
-  def processStats(stats: Seq[WriteTaskStats]): Unit
+  def processStats(stats: Seq[WriteTaskStats], jobCommitDuration: Long): Unit
+
+  def updateTaskWriteAndCommitDuration(duration: Long): Unit

Review comment:
       It's unnecessary to break custom `WriteJobStatsTracker` implementations. Can we propagate the task commit time metrics in a different way, e.g. via `WriteJobDescription`?
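
To illustrate the compatibility concern: adding a parameter to an abstract method forces every existing implementation to change, whereas a new overload with a default body that delegates to the old method does not. The following is a minimal sketch in plain Scala, not the approach this PR takes. `TaskStats`, `JobStatsTracker`, `LegacyTracker` and `TimingTracker` are hypothetical stand-ins, not Spark classes.

```scala
// Stand-in for WriteTaskStats; hypothetical, for illustration only.
final case class TaskStats(numRows: Long)

trait JobStatsTracker extends Serializable {
  // Original contract: existing third-party trackers implement only this.
  def processStats(stats: Seq[TaskStats]): Unit

  // New overload with a default body, so existing implementations still compile.
  // By default the commit duration is ignored and the old method is called.
  def processStats(stats: Seq[TaskStats], jobCommitDuration: Long): Unit =
    processStats(stats)
}

// A tracker written before the change; it needs no modification.
class LegacyTracker extends JobStatsTracker {
  var totalRows: Long = 0L
  override def processStats(stats: Seq[TaskStats]): Unit =
    totalRows = stats.map(_.numRows).sum
}

// A tracker that opts in to the new information.
class TimingTracker extends JobStatsTracker {
  var totalRows: Long = 0L
  var commitMs: Long = -1L
  override def processStats(stats: Seq[TaskStats]): Unit =
    totalRows = stats.map(_.numRows).sum
  override def processStats(stats: Seq[TaskStats], jobCommitDuration: Long): Unit = {
    commitMs = jobCommitDuration
    processStats(stats)
  }
}
```

The framework would always call the two-argument overload; trackers that do not care about the duration keep working unchanged.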






[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883428783


   **[Test build #141340 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141340/testReport)** for PR 31522 at commit [`1192f6f`](https://github.com/apache/spark/commit/1192f6fd3721b7ecfa11b24ea8c9fe065951da28).




[GitHub] [spark] cloud-fan commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-885565418


   Oh, it has conflicts. @AngersZhuuuu can you open a backport PR? Thanks!




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881837794


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45710/
   




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r671530959



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -230,7 +239,11 @@ object BasicWriteJobStatsTracker {
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      DURATION_OF_TASK_COMMIT ->
+        SQLMetrics.createTimingMetric(sparkContext, "duration of task commit"),
+      DURATION_JOB_COMMIT->
+        SQLMetrics.createTimingMetric(sparkContext, "duration of committing the job")

Review comment:
       Maybe, `committing the job` -> `job commit`?






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r671530371



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -221,6 +228,8 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  private val DURATION_OF_TASK_COMMIT = "taskCommitDuration"
+  private val DURATION_JOB_COMMIT = "jobCommitDuration"

Review comment:
       Shall we rename these `vals` consistently?
   - `DURATION_OF_TASK_COMMIT` -> `TASK_COMMIT_DURATION`
   - `DURATION_JOB_COMMIT` -> `JOB_COMMIT_DURATION`?






[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880343733


   **[Test build #141045 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141045/testReport)** for PR 31522 at commit [`7c87991`](https://github.com/apache/spark/commit/7c87991374e6c3538764c265acd3beba98d4b81e).




[GitHub] [spark] SparkQA removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884304277


   **[Test build #141425 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141425/testReport)** for PR 31522 at commit [`9b5aa94`](https://github.com/apache/spark/commit/9b5aa94cdb9c1bfee855edcf2cceedfa73142c0f).




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882676999


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45775/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r580995553



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -192,6 +192,10 @@ class BasicWriteJobStatsTracker(
     new BasicWriteTaskStatsTracker(serializableHadoopConf.value)
   }
 
+  override def processCommitDuration(duration: Long): Unit = {
+    metrics(BasicWriteJobStatsTracker.DURATION_FILE_COMMIT).set(duration)

Review comment:
       Does it really work? The broadcast exchange not only sets the metrics but also calls `postDriverMetricUpdates`.
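
The concern can be pictured with a small model: a value set on the driver after the job has run is only a local change until an update event carries it to whatever renders the UI. The sketch below is plain Scala with hypothetical stand-ins (`Metric`, `UiStore`, `DriverMetrics.postDriverUpdates`). It does not use Spark's actual classes and only mirrors the set-then-post pattern mentioned above.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a driver-side metric (not Spark's SQLMetric).
final class Metric(val id: Long) {
  private var v: Long = 0L
  def set(x: Long): Unit = v = x
  def value: Long = v
}

// Hypothetical stand-in for the UI's view of metric values.
final class UiStore {
  private val seen = mutable.Map.empty[Long, Long]
  def onUpdate(updates: Seq[(Long, Long)]): Unit = seen ++= updates
  def get(id: Long): Option[Long] = seen.get(id)
}

object DriverMetrics {
  // Publishes the current values of driver-side metrics to the store.
  def postDriverUpdates(store: UiStore, metrics: Seq[Metric]): Unit =
    store.onUpdate(metrics.map(m => m.id -> m.value))
}
```

In this model, calling `set` alone leaves the store unaware of the new value; it becomes visible only after `postDriverUpdates` runs.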






[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883545670








[GitHub] [spark] gengliangwang commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r582067232



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##########
@@ -221,7 +221,7 @@ object FileFormatWriter extends Logging {
       val (_, duration) = Utils.timeTakenMs { committer.commitJob(job, commitMsgs) }
       logInfo(s"Write Job ${description.uuid} committed. Elapsed time: $duration ms.")
 
-      processStats(description.statsTrackers, ret.map(_.summary.stats))
+      processStats(description.statsTrackers, ret.map(_.summary.stats), duration)

Review comment:
       Normally, this should be fast. I don't think it is a good idea to put this in the SQL graph. For debugging purposes, the log message should be enough.
   Besides, the name "duration of committing the job" can be confusing to end users.
   I have to leave a -1 for this one.






[GitHub] [spark] cloud-fan commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r581035219



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -192,7 +192,7 @@ class BasicWriteJobStatsTracker(
     new BasicWriteTaskStatsTracker(serializableHadoopConf.value)
   }
 
-  override def processStats(stats: Seq[WriteTaskStats]): Unit = {
+  override def processStats(stats: Seq[WriteTaskStats], duration: Long): Unit = {

Review comment:
       `duration` -> `jobCommitDuration` to make it clear.






[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883757348


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141351/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884556265


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141425/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882777962


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141261/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784314764


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39955/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784421426


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135376/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881798553


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45706/
   




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r671580787



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/CustomWriteTaskStatsTrackerSuite.scala
##########
@@ -25,7 +25,7 @@ import org.apache.spark.sql.catalyst.InternalRow
 class CustomWriteTaskStatsTrackerSuite extends SparkFunSuite {
 
   def checkFinalStats(tracker: CustomWriteTaskStatsTracker, result: Map[String, Int]): Unit = {
-    assert(tracker.getFinalStats().asInstanceOf[CustomWriteTaskStats].numRowsPerFile == result)
+    assert(tracker.getFinalStats(0).asInstanceOf[CustomWriteTaskStats].numRowsPerFile == result)

Review comment:
       Done

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/BasicWriteTaskStatsTrackerSuite.scala
##########
@@ -73,7 +73,7 @@ class BasicWriteTaskStatsTrackerSuite extends SparkFunSuite {
   }
 
   private def finalStatus(tracker: BasicWriteTaskStatsTracker): BasicWriteTaskStats = {
-    tracker.getFinalStats().asInstanceOf[BasicWriteTaskStats]
+    tracker.getFinalStats(0).asInstanceOf[BasicWriteTaskStats]

Review comment:
       Done

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala
##########
@@ -103,5 +103,5 @@ trait WriteJobStatsTracker extends Serializable {
    * to the expected derived type when implementing this method in a derived class.
    * The framework will make sure to call this with the right arguments.
    */
-  def processStats(stats: Seq[WriteTaskStats]): Unit
+  def processStats(stats: Seq[WriteTaskStats], jobCommitDuration: Long): Unit

Review comment:
       Done

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala
##########
@@ -69,7 +69,7 @@ trait WriteTaskStatsTracker {
    * @note This may only be called once. Further use of the object may lead to undefined behavior.
    * @return An object of subtype of [[WriteTaskStats]], to be sent to the driver.
    */
-  def getFinalStats(): WriteTaskStats
+  def getFinalStats(taskCommitTime: Long): WriteTaskStats

Review comment:
       Done

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -230,7 +239,11 @@ object BasicWriteJobStatsTracker {
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      DURATION_OF_TASK_COMMIT ->
+        SQLMetrics.createTimingMetric(sparkContext, "duration of task commit"),
+      DURATION_JOB_COMMIT->
+        SQLMetrics.createTimingMetric(sparkContext, "duration of committing the job")

Review comment:
       Done






[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-790720569


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40342/
   




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r581037797



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -192,6 +192,10 @@ class BasicWriteJobStatsTracker(
     new BasicWriteTaskStatsTracker(serializableHadoopConf.value)
   }
 
+  override def processCommitDuration(duration: Long): Unit = {
+    metrics(BasicWriteJobStatsTracker.DURATION_FILE_COMMIT).set(duration)
+  }
+
   override def processStats(stats: Seq[WriteTaskStats]): Unit = {

Review comment:
       > Instead of adding a new method, we can probably just add one more parameter here. Then we don't need to worry about the execution order of `processCommitDuration` and `processStats`.
   
   Done
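
A minimal sketch of that single-call shape, in plain Scala: time the commit, then hand the measured duration to `processStats` together with the task stats, so there is no second callback whose ordering matters. `timeTakenMs` here is a local helper written for this example in the style of the `Utils.timeTakenMs` call visible in the `FileFormatWriter` diff; `Stats` and `Tracker` are hypothetical stand-ins, not Spark classes.

```scala
import java.util.concurrent.TimeUnit

object CommitTiming {
  // Runs `body` and returns its result with the elapsed wall-clock time in ms.
  def timeTakenMs[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)
    (result, math.max(0L, elapsed))
  }
}

// Hypothetical stand-in for per-task write statistics.
final case class Stats(numFiles: Int)

final class Tracker {
  var files: Int = 0
  var jobCommitMs: Long = -1L
  // Stats and commit duration arrive in one call.
  def processStats(stats: Seq[Stats], jobCommitDuration: Long): Unit = {
    files = stats.map(_.numFiles).sum
    jobCommitMs = jobCommitDuration
  }
}
```

A caller would wrap the commit in `CommitTiming.timeTakenMs { ... }` and pass the second element of the returned pair as `jobCommitDuration`.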






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r674770724



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -230,7 +244,10 @@ object BasicWriteJobStatsTracker {
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      TASK_COMMIT_TIME ->
+        SQLMetrics.createTimingMetric(sparkContext, "time of committing the tasks"),

Review comment:
       Done

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -230,7 +244,10 @@ object BasicWriteJobStatsTracker {
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      TASK_COMMIT_TIME ->
+        SQLMetrics.createTimingMetric(sparkContext, "time of committing the tasks"),
+      JOB_COMMIT_TIME -> SQLMetrics.createTimingMetric(sparkContext, "time of committing the job")

Review comment:
       Done






[GitHub] [spark] cloud-fan commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r582078477



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##########
@@ -221,7 +221,7 @@ object FileFormatWriter extends Logging {
       val (_, duration) = Utils.timeTakenMs { committer.commitJob(job, commitMsgs) }
       logInfo(s"Write Job ${description.uuid} committed. Elapsed time: $duration ms.")
 
-      processStats(description.statsTrackers, ret.map(_.summary.stats))
+      processStats(description.statsTrackers, ret.map(_.summary.stats), duration)

Review comment:
       `externalCatalog.loadDynamicPartition()` is this really counted as the commit duration in this PR?






[GitHub] [spark] cloud-fan commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r580996896



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -192,6 +192,10 @@ class BasicWriteJobStatsTracker(
     new BasicWriteTaskStatsTracker(serializableHadoopConf.value)
   }
 
+  override def processCommitDuration(duration: Long): Unit = {
+    metrics(BasicWriteJobStatsTracker.DURATION_FILE_COMMIT).set(duration)
+  }
+
   override def processStats(stats: Seq[WriteTaskStats]): Unit = {

Review comment:
       Instead of adding a new method, we can probably just add one more parameter here. Then we don't need to worry about the execution order of `processCommitDuration` and `processStats`.
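
To illustrate the suggestion, here is a minimal sketch of a tracker that receives the commit duration in the same call as the task stats, so there is no second method whose call order matters. `SingleCallTracker`, `FileStats`, `JobStatsTracker`, and `RecordingTracker` are hypothetical names, and `WriteTaskStats` here is a simplified stand-in, not Spark's actual trait.

```scala
object SingleCallTracker {
  // Simplified stand-ins for illustration only; not Spark's real types.
  trait WriteTaskStats
  final case class FileStats(numFiles: Int) extends WriteTaskStats

  trait JobStatsTracker {
    // The commit duration arrives together with the task stats.
    def processStats(stats: Seq[WriteTaskStats], jobCommitTimeMs: Long): Unit
  }

  final class RecordingTracker extends JobStatsTracker {
    var totalFiles: Int = 0
    var commitTimeMs: Long = 0L

    override def processStats(stats: Seq[WriteTaskStats], jobCommitTimeMs: Long): Unit = {
      // Sum the file counts of every FileStats entry and record the duration.
      totalFiles = stats.collect { case FileStats(n) => n }.sum
      commitTimeMs = jobCommitTimeMs
    }
  }
}
```

Because both values are passed in one call, an implementation can never observe the stats without the duration, or the duration without the stats.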






[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-775176318


   **[Test build #135016 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135016/testReport)** for PR 31522 at commit [`d065bcd`](https://github.com/apache/spark/commit/d065bcda8cdb555cf409946dade11b35371ae6be).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-775791223


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39648/
   




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r582610913



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##########
@@ -221,7 +221,7 @@ object FileFormatWriter extends Logging {
       val (_, duration) = Utils.timeTakenMs { committer.commitJob(job, commitMsgs) }
       logInfo(s"Write Job ${description.uuid} committed. Elapsed time: $duration ms.")
 
-      processStats(description.statsTrackers, ret.map(_.summary.stats))
+      processStats(description.statsTrackers, ret.map(_.summary.stats), duration)

Review comment:
       Just now, a friend asked me why, after the job finished, it took another 80s before the job was committed.
   ```
   21/02/25 15:42:12 INFO DAGScheduler: ResultStage 1 (run at AccessController.java:0) finished in 82.189 s
   21/02/25 15:42:12 INFO DAGScheduler: Job 1 finished: run at AccessController.java:0, took 84.330846 s
   21/02/25 15:43:38 INFO FileFormatWriter: Job null committed.
   21/02/25 15:43:38 WARN DFSClient: Slow ReadProcessor read fields took 41202ms (threshold=30000ms); ack: seqno: 140 status: SUCCESS downstreamAckTimeNanos: 33201980 4: "\000", targets: [172.16.1.71:9866, 172.16.1.104:9866, 172.16.1.18:9866, 172.16.1.33:9866]
   ```
   
   His SQL tasks ran for 80s, the job commit took 80s, and the Hive metadata load took 100s.
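
For reference, the gap between "Job 1 finished" and "Job null committed" in the log above can be computed from the two timestamps. This is only a sketch, assuming the `yy/MM/dd HH:mm:ss` layout seen in the excerpt; `CommitGap` and `gapSeconds` are hypothetical names, not part of Spark.

```scala
import java.time.{Duration, LocalDateTime}
import java.time.format.DateTimeFormatter

object CommitGap {
  // Timestamp layout of the quoted log lines, e.g. "21/02/25 15:42:12".
  private val fmt = DateTimeFormatter.ofPattern("yy/MM/dd HH:mm:ss")

  // Whole seconds elapsed between two log timestamps.
  def gapSeconds(from: String, to: String): Long =
    Duration.between(LocalDateTime.parse(from, fmt), LocalDateTime.parse(to, fmt)).getSeconds
}
```

`CommitGap.gapSeconds("21/02/25 15:42:12", "21/02/25 15:43:38")` evaluates to 86, which matches the roughly 80s commit delay described here.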






[GitHub] [spark] AngersZhuuuu closed pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu closed pull request #31522:
URL: https://github.com/apache/spark/pull/31522


   




[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883354294


   retest this please




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884342887


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45940/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784746641


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39978/
   




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r673139389



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -183,14 +186,15 @@ class BasicWriteTaskStatsTracker(hadoopConf: Configuration)
  */
 class BasicWriteJobStatsTracker(
     serializableHadoopConf: SerializableConfiguration,
-    @transient val metrics: Map[String, SQLMetric])
+    @transient val driverSideMetrics: Map[String, SQLMetric],
+    taskCommitTimeMetric: SQLMetric)
   extends WriteJobStatsTracker {

Review comment:
       Done






[GitHub] [spark] SparkQA removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883416289


   **[Test build #141337 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141337/testReport)** for PR 31522 at commit [`6d91e25`](https://github.com/apache/spark/commit/6d91e2516c1b2ca2ecb1bf7aa101500dc4cc529e).




[GitHub] [spark] AngersZhuuuu edited a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu edited a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-782549928


   Gentle ping @HeartSaVioR @dongjoon-hyun @HyukjinKwon @maropu @cloud-fan Could you help review this? I think it is really helpful, since slow `INSERT` statements are often caused by the file commit.




[GitHub] [spark] SparkQA removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880343733


   **[Test build #141045 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141045/testReport)** for PR 31522 at commit [`7c87991`](https://github.com/apache/spark/commit/7c87991374e6c3538764c265acd3beba98d4b81e).




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r587459377



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##########
@@ -221,7 +221,7 @@ object FileFormatWriter extends Logging {
       val (_, duration) = Utils.timeTakenMs { committer.commitJob(job, commitMsgs) }
       logInfo(s"Write Job ${description.uuid} committed. Elapsed time: $duration ms.")
 
-      processStats(description.statsTrackers, ret.map(_.summary.stats))
+      processStats(description.statsTrackers, ret.map(_.summary.stats), duration)

Review comment:
       Gentle ping @cloud-fan What should I do next for this PR?






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r582440847



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##########
@@ -221,7 +221,7 @@ object FileFormatWriter extends Logging {
       val (_, duration) = Utils.timeTakenMs { committer.commitJob(job, commitMsgs) }
       logInfo(s"Write Job ${description.uuid} committed. Elapsed time: $duration ms.")
 
-      processStats(description.statsTrackers, ret.map(_.summary.stats))
+      processStats(description.statsTrackers, ret.map(_.summary.stats), duration)

Review comment:
       > `externalCatalog.loadDynamicPartition()` is this really counted as the commit duration in this PR?
   
   Not yet. That duration would need to be collected after the job is committed; it is not counted in the job commit duration.






[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-879780288








[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884995774


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46018/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784267752


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39956/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-885231030


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46034/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880466658








[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881127826


   ping @cloud-fan @gengliangwang @maropu 




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-790720569


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40342/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880345162


   **[Test build #141046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141046/testReport)** for PR 31522 at commit [`160e56d`](https://github.com/apache/spark/commit/160e56d91a0ad8379bce30a69a821d8a94418d74).




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r671443471



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##########
@@ -307,6 +307,7 @@ object FileFormatWriter extends Logging {
           dataWriter.writeWithIterator(iterator)
           dataWriter.commit()
         }
+        description.statsTrackers.foreach(_.updateTaskWriteAndCommitDuration(timeCost))

Review comment:
       > I should have pointed out this earlier: I don't think it makes sense to report the write time, as it can be misleading. Consuming the iterator also includes the time of executing the previous SQL operators.
   > 
   > Can we only track the task commit time?
   
   How about the current version?
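
The concern quoted above can be sketched as follows: draining the iterator is left untimed because it also runs the upstream operators, and only the commit call is measured. This is an illustration under assumptions, not the code in this PR; `TaskCommitTiming`, `Writer`, and `writeAndCommit` are hypothetical names, and `timeTakenMs` is a local helper written for this sketch rather than Spark's `Utils.timeTakenMs`.

```scala
object TaskCommitTiming {
  // Hypothetical stand-in for a task's data writer; not Spark's real interface.
  trait Writer {
    def write(row: Int): Unit
    def commit(): String
  }

  // Runs `body` and returns its result with the elapsed wall-clock milliseconds.
  def timeTakenMs[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // Drains the rows untimed, then times only the commit call.
  def writeAndCommit(rows: Iterator[Int], writer: Writer): (String, Long) = {
    rows.foreach(writer.write) // also pulls rows through any upstream operators
    timeTakenMs(writer.commit())
  }
}
```

With this split, a slow upstream scan or join inflates neither number, and the reported duration reflects the commit alone.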






[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882777962


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141261/
   




[GitHub] [spark] cloud-fan commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883540885


   The test failure seems to be real




[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884916961


   > @AngersZhuuuu Thanks. Please update the screenshot in the PR description as well.
   
   Done




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883626633


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45868/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-775008712


   **[Test build #135016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135016/testReport)** for PR 31522 at commit [`d065bcd`](https://github.com/apache/spark/commit/d065bcda8cdb555cf409946dade11b35371ae6be).




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882545130


   **[Test build #141247 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141247/testReport)** for PR 31522 at commit [`cc42403`](https://github.com/apache/spark/commit/cc4240332bcdf1a22fb8b7137fe5f88c23ccde77).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r673010505



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -48,7 +48,9 @@ case class BasicWriteTaskStats(
 /**
  * Simple [[WriteTaskStatsTracker]] implementation that produces [[BasicWriteTaskStats]].
  */
-class BasicWriteTaskStatsTracker(hadoopConf: Configuration)
+class BasicWriteTaskStatsTracker(
+    hadoopConf: Configuration,
+    taskCommitTimeMetrics: Option[SQLMetric] = None)

Review comment:
       > When can this be None? In tests?
   
   Yes, because some tests have no SparkContext (`sc`).
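   
   A minimal sketch of the optional-metric pattern discussed here, assuming nothing beyond the diff above. `Metric` and `TaskStatsTracker` are hypothetical stand-ins for `SQLMetric` and `BasicWriteTaskStatsTracker`, so the snippet runs without Spark on the classpath.
   
   ```scala
   // Hypothetical stand-in for Spark's SQLMetric: a simple mutable counter.
   final class Metric {
     private var total = 0L
     def add(v: Long): Unit = total += v
     def value: Long = total
   }

   // Sketch of a task-level tracker whose commit-time metric is optional.
   // Defaulting to None lets callers without a SparkContext (for example
   // some tests) construct the tracker without creating a metric.
   class TaskStatsTracker(taskCommitTimeMetric: Option[Metric] = None) {
     // Record the commit time only when a metric was supplied.
     def recordCommit(durationMs: Long): Unit =
       taskCommitTimeMetric.foreach(_.add(durationMs))
   }
   ```
   
   With `Some(metric)` the duration is accumulated; with the default `None` the call is a no-op, so existing call sites keep working unchanged.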






[GitHub] [spark] SparkQA removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881827705


   **[Test build #141198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141198/testReport)** for PR 31522 at commit [`5c41947`](https://github.com/apache/spark/commit/5c41947cb6cce62d581d109b9e69dc99454c29eb).




[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-793672126


   > How often is that? We can also improve the log to make it easier to search for a certain job.
   
   Another case: when we run MSCK on a table, the SQL tab shows nothing.
   ![Screenshot 2021-03-09 6.16.20 PM](https://user-images.githubusercontent.com/46485123/110455575-92a36980-8103-11eb-9a3a-565d3d4f7c15.png)
   
   When it is slow, the stage page only tells us how long it took to collect the path info.
   ![Screenshot 2021-03-09 6.15.55 PM](https://user-images.githubusercontent.com/46485123/110455524-83bcb700-8103-11eb-82d6-8e36eb962197.png)
   
   But after collecting the path info and partition statistics, the command also needs to interact with Hive, which is sometimes slow. Users then ask why the job finished in only 2 minutes while the SQL duration is 10 minutes.
   
   These duration metrics are also important for such commands, so that a Spark admin can quickly find the reason and reply to the user.
   




[GitHub] [spark] HeartSaVioR commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-794426247


   I think the key questions here are "how much time can committing take in the worst case" and "how frequently does it occur".
   
   I have no answer for the second one, as I only hear from customers when they complain, but I can answer the first one based on customers' cases. It's not just tens of seconds, of course. (I would rather say they are only concerned when the gap is "significant", not just a few more minutes.) It can be a couple of hours or even longer when HDFS is unhealthy. Most likely their complaint about this behavior is "why did the Spark driver hang?", because nothing is logged during committing unless they turned on DEBUG logging for the Hadoop code path.
   
   That said, I have mixed feelings about this. I agree that explaining the missing time range is important when we trace the problem back from the event log, but assuming the commit eventually ends, the log will tell. I would like to know the answer to the second question in production before making a decision; if the case does not happen often, it might be something we can live with.
   
   And for the extreme case, like committing taking hours, I think the more important thing is to log periodically so that end users can determine whether the Spark driver is hung or not, without enabling DEBUG logging. Maybe off-topic, but if we are prioritizing these things, I'd say that is needed more.
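   
   A rough sketch of the periodic-logging idea, under stated assumptions: `HeartbeatLogging.withHeartbeat` is a hypothetical helper (not a Spark or Hadoop API) that runs a blocking action while a daemon thread reports elapsed time at a fixed interval. It uses only `java.util.concurrent`.
   
   ```scala
   import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}

   // Hypothetical helper, not part of Spark: runs `body` while a daemon
   // thread calls `log` every `intervalMs` milliseconds until `body` returns.
   object HeartbeatLogging {
     def withHeartbeat[T](what: String, intervalMs: Long, log: String => Unit)(body: => T): T = {
       // Daemon thread so the heartbeat never keeps the JVM alive on its own.
       val factory = new ThreadFactory {
         override def newThread(r: Runnable): Thread = {
           val t = new Thread(r, "commit-heartbeat")
           t.setDaemon(true)
           t
         }
       }
       val scheduler = Executors.newSingleThreadScheduledExecutor(factory)
       val startNs = System.nanoTime()
       val tick = new Runnable {
         override def run(): Unit = {
           val elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs)
           log(s"$what still running after $elapsedMs ms")
         }
       }
       scheduler.scheduleAtFixedRate(tick, intervalMs, intervalMs, TimeUnit.MILLISECONDS)
       // Stop the heartbeat whether the body succeeds or throws.
       try body finally scheduler.shutdownNow()
     }
   }
   ```
   
   Wrapping the job commit call in such a helper would make a long commit visible at the normal log level. It only helps where the caller controls the call site, which is the limitation raised for custom commit protocols.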




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-885022861


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46018/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r761590742



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
##########
@@ -788,6 +792,24 @@ class SQLMetricsSuite extends SharedSparkSession with SQLMetricsTestUtils
     }
   }
 
+  test("SPARK-34399: Add job commit duration metrics for DataWritingCommand") {

Review comment:
       I just happened to see a flaky failure of this PR's test here: https://github.com/apache/spark/runs/4378978842
   
   ```
   sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 0 was not greater than 0
   	at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
   	at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
   	at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
   	at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
   	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.$anonfun$new$87(SQLMetricsSuite.scala:810)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
   	at org.apache.spark.sql.test.SQLTestUtilsBase.withTable(SQLTestUtils.scala:305)
   	at org.apache.spark.sql.test.SQLTestUtilsBase.withTable$(SQLTestUtils.scala:303)
   	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.withTable(SQLMetricsSuite.scala:44)
   	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.$anonfun$new$86(SQLMetricsSuite.scala:800)
   	at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54)
   	at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38)
   	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(SQLMetricsSuite.scala:44)
   	at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:246)
   	at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:244)
   	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.withSQLConf(SQLMetricsSuite.scala:44)
   	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.$anonfun$new$85(SQLMetricsSuite.scala:800)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
   	at org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite.$anonfun$test$5(AdaptiveTestUtils.scala:65)
   	at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54)
   	at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38)
   	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(SQLMetricsSuite.scala:44)
   	at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:246)
   	at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:244)
   	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.withSQLConf(SQLMetricsSuite.scala:44)
   	at org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite.$anonfun$test$4(AdaptiveTestUtils.scala:65)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
   	at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
   	at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
   	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
   	at org.scalatest.Transformer.apply(Transformer.scala:22)
   	at org.scalatest.Transformer.apply(Transformer.scala:20)
   	at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
   	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190)
   	at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
   	at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
   	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
   	at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
   	at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
   	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62)
   	at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
   	at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
   	at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
   	at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
   ```
   
   It doesn't seem super flaky, though. I am noting it here in case other people see this failure more often.
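   
   One possible reading of `0 was not greater than 0`, not confirmed anywhere in this thread: if the commit duration is recorded in whole milliseconds, a very fast commit can legitimately round down to 0 ms, so a strict `> 0` assertion would fail now and then. The sketch below illustrates that with hypothetical names (`CommitTimingSketch`, `timedMs`, `slowCommit`); it is not the test from the PR.
   
   ```scala
   import java.util.concurrent.TimeUnit

   object CommitTimingSketch {
     // Returns the result of `body` and the elapsed time in whole milliseconds.
     // Sub-millisecond work is truncated to 0.
     def timedMs[T](body: => T): (T, Long) = {
       val startNs = System.nanoTime()
       val result = body
       (result, TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs))
     }

     // Hypothetical slow commit, used only to make a `> 0` check deterministic.
     def slowCommit(delayMs: Long): Unit = Thread.sleep(delayMs)
   }
   ```
   
   Under this assumption, a test can either assert `>= 0` for the real commit or inject a deliberately slow commit before asserting `> 0`.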






[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784420343


   **[Test build #135376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135376/testReport)** for PR 31522 at commit [`e334a4c`](https://github.com/apache/spark/commit/e334a4c8c32c0c592ee463292eaa69b0495bc57c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880376871


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45561/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-782563505


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39868/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881702873


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45690/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-879780288








[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881733051


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45690/
   




[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-794853587


   > And for the extreme case, like committing taking hours, I think the more important thing is to log periodically so that end users can determine whether the Spark driver is hung or not, without enabling DEBUG logging. Maybe off-topic, but if we are prioritizing these things, I'd say that is needed more.
   
   Yes, I think you have pointed out the most important concern: nothing is logged when the bad case happens. I think this idea is nice. WDYT @cloud-fan?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880346078


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141046/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882564185








[GitHub] [spark] cloud-fan commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r582821212



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##########
@@ -221,7 +221,7 @@ object FileFormatWriter extends Logging {
       val (_, duration) = Utils.timeTakenMs { committer.commitJob(job, commitMsgs) }
       logInfo(s"Write Job ${description.uuid} committed. Elapsed time: $duration ms.")
 
-      processStats(description.statsTrackers, ret.map(_.summary.stats))
+      processStats(description.statsTrackers, ret.map(_.summary.stats), duration)

Review comment:
       If we show the job commit in the UI, shall we also show the Hive load table?

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##########
@@ -221,7 +221,7 @@ object FileFormatWriter extends Logging {
       val (_, duration) = Utils.timeTakenMs { committer.commitJob(job, commitMsgs) }
       logInfo(s"Write Job ${description.uuid} committed. Elapsed time: $duration ms.")
 
-      processStats(description.statsTrackers, ret.map(_.summary.stats))
+      processStats(description.statsTrackers, ret.map(_.summary.stats), duration)

Review comment:
       If we show the job commit duration in the UI, shall we also show the Hive load table duration?
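   
   The diff above times `committer.commitJob` and forwards the elapsed milliseconds to `processStats`. A minimal self-contained sketch of that flow follows. `JobStatsTracker`, `RecordingTracker`, and `WriteJobSketch` are hypothetical stand-ins rather than Spark's classes, and `timeTakenMs` here only mimics the `(result, milliseconds)` shape used in the diff.
   
   ```scala
   import java.util.concurrent.TimeUnit

   // Hypothetical tracker interface: receives per-task stats plus the
   // driver-side job commit duration in milliseconds.
   trait JobStatsTracker {
     def processStats(taskStats: Seq[Long], jobCommitDurationMs: Long): Unit
   }

   // Records what it was given so the values could be surfaced as metrics.
   final class RecordingTracker extends JobStatsTracker {
     var writtenRows = 0L
     var jobCommitMs = -1L
     override def processStats(taskStats: Seq[Long], jobCommitDurationMs: Long): Unit = {
       writtenRows = taskStats.sum
       jobCommitMs = jobCommitDurationMs
     }
   }

   object WriteJobSketch {
     // Runs `body` and returns its result with the elapsed milliseconds.
     def timeTakenMs[T](body: => T): (T, Long) = {
       val startNs = System.nanoTime()
       val result = body
       (result, TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs))
     }

     // Times the commit, then hands the duration to every tracker.
     def commitAndReport(trackers: Seq[JobStatsTracker], taskStats: Seq[Long])(commit: => Unit): Long = {
       val (_, duration) = timeTakenMs(commit)
       trackers.foreach(_.processStats(taskStats, duration))
       duration
     }
   }
   ```
   
   A Hive load table duration could be reported the same way: time the load call and pass the value through a second parameter or a separate metric.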






[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784322350


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39955/
   




[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882546959


   retest this please




[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-794964106


   > We can periodically log something in the built-in file commit protocol, but there is nothing we can do if people are using a custom file commit protocol.
   
   A new thread that logs this could work, but it looks weird. (We use this approach to show the Thrift server's progress.)
   
   > I checked other plan nodes and found that the file scan node has a "metadata time" metrics. I think it makes sense to have something similar in the write nodes, but we need to think about the naming and what to include (shall we include the hive LOAD TABLE time?).
   
   If possible, I think the more comprehensive the information, the better. As I mentioned in https://github.com/apache/spark/pull/31522#issuecomment-793672126, if we have the `LOAD TABLE` time in the SQL tab's node, it will be easier for us to explain to our users/customers.
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883302646


   **[Test build #141328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141328/testReport)** for PR 31522 at commit [`d3389c6`](https://github.com/apache/spark/commit/d3389c67298cb56a3e61bfea52a130365a5307dd).
    * This patch **fails Scala style tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882564182








[GitHub] [spark] SparkQA removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883372146


   **[Test build #141334 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141334/testReport)** for PR 31522 at commit [`a4890f2`](https://github.com/apache/spark/commit/a4890f2049050af537c6760fa8e94ba237b9e875).




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883529499


   **[Test build #141337 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141337/testReport)** for PR 31522 at commit [`6d91e25`](https://github.com/apache/spark/commit/6d91e2516c1b2ca2ecb1bf7aa101500dc4cc529e).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884495427


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141422/
   




[GitHub] [spark] cloud-fan closed pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #31522:
URL: https://github.com/apache/spark/pull/31522


   




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r582049509



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##########
@@ -221,7 +221,7 @@ object FileFormatWriter extends Logging {
       val (_, duration) = Utils.timeTakenMs { committer.commitJob(job, commitMsgs) }
       logInfo(s"Write Job ${description.uuid} committed. Elapsed time: $duration ms.")
 
-      processStats(description.statsTrackers, ret.map(_.summary.stats))
+      processStats(description.statsTrackers, ret.map(_.summary.stats), duration)

Review comment:
       Sometimes, after all tasks have completed, the job still takes a long time to finish. That time is usually spent on job commit and on metadata handling such as `externalCatalog.loadDynamicPartition()`.
    
   This duration information is important when a job is slow and we want to compare job performance, since file operations take longer when HDFS is unstable.
   
   I also want to add metrics for the metadata handling time after the job is committed.
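
The timing-and-forwarding pattern in the diff above can be sketched without Spark. This is a minimal illustration, not the PR's implementation: `timeTakenMs` below is a local stand-in modelled on the `Utils.timeTakenMs` call in the diff, and `StatsTracker`, `RecordingTracker` and `commitAndReport` are hypothetical simplifications introduced only for this example.

```scala
import java.util.concurrent.TimeUnit

object CommitTiming {
  // Local stand-in for the helper used in the diff: runs `body` and returns
  // its result together with the elapsed wall-clock time in milliseconds.
  def timeTakenMs[T](body: => T): (T, Long) = {
    val startNs = System.nanoTime()
    val result = body
    val elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs)
    (result, elapsedMs)
  }

  // Hypothetical tracker interface: receives task stats plus the commit duration.
  trait StatsTracker {
    def processStats(taskStats: Seq[Long], jobCommitMs: Long): Unit
  }

  // Hypothetical tracker that remembers the row total and the last commit duration.
  final class RecordingTracker extends StatsTracker {
    var lastCommitMs: Long = -1L
    var totalRows: Long = 0L
    override def processStats(taskStats: Seq[Long], jobCommitMs: Long): Unit = {
      totalRows = taskStats.sum
      lastCommitMs = jobCommitMs
    }
  }

  // Times the (hypothetical) commit step and forwards the duration to every tracker.
  def commitAndReport(
      commitJob: () => Unit,
      taskStats: Seq[Long],
      trackers: Seq[StatsTracker]): Long = {
    val (_, duration) = timeTakenMs { commitJob() }
    trackers.foreach(_.processStats(taskStats, duration))
    duration
  }
}
```

Calling `CommitTiming.commitAndReport(() => Thread.sleep(20), Seq(1L, 2L, 3L), Seq(tracker))` would leave the measured duration in `tracker.lastCommitMs`, which is the value a UI metric could then display.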






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784277157


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39956/
   




[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882546959


   retest this please




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884458991


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45944/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882534348


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45761/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-782591827


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135289/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-775885784


   **[Test build #135066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135066/testReport)** for PR 31522 at commit [`89f8201`](https://github.com/apache/spark/commit/89f8201ab6f2069d03aefb8d3184017556858453).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `trait UserDefinedExpression `
     * `case class SubqueryAdaptiveBroadcastExec(`
     * `case class PlanAdaptiveDynamicPruningFilters(`
     * `case class PlanAdaptiveSubqueries(`




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880376871


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45561/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880376860


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45561/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884900802


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141495/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r761590742



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
##########
@@ -788,6 +792,24 @@ class SQLMetricsSuite extends SharedSparkSession with SQLMetricsTestUtils
     }
   }
 
+  test("SPARK-34399: Add job commit duration metrics for DataWritingCommand") {

Review comment:
       I just happened to see this test fail intermittently:
   
   ```
   sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 0 was not greater than 0
   	at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
   	at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
   	at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
   	at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
   	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.$anonfun$new$87(SQLMetricsSuite.scala:810)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
   	at org.apache.spark.sql.test.SQLTestUtilsBase.withTable(SQLTestUtils.scala:305)
   	at org.apache.spark.sql.test.SQLTestUtilsBase.withTable$(SQLTestUtils.scala:303)
   	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.withTable(SQLMetricsSuite.scala:44)
   	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.$anonfun$new$86(SQLMetricsSuite.scala:800)
   	at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54)
   	at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38)
   	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(SQLMetricsSuite.scala:44)
   	at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:246)
   	at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:244)
   	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.withSQLConf(SQLMetricsSuite.scala:44)
   	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.$anonfun$new$85(SQLMetricsSuite.scala:800)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
   	at org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite.$anonfun$test$5(AdaptiveTestUtils.scala:65)
   	at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54)
   	at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38)
   	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(SQLMetricsSuite.scala:44)
   	at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:246)
   	at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:244)
   	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.withSQLConf(SQLMetricsSuite.scala:44)
   	at org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite.$anonfun$test$4(AdaptiveTestUtils.scala:65)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
   	at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
   	at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
   	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
   	at org.scalatest.Transformer.apply(Transformer.scala:22)
   	at org.scalatest.Transformer.apply(Transformer.scala:20)
   	at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
   	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190)
   	at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
   	at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
   	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
   	at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
   	at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
   	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62)
   	at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
   	at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
   	at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
   	at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
   ```
   
   Seems not super flaky though. I am noting it here in case other people see this failure more.
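
One plausible cause, stated as an assumption because the assertion at `SQLMetricsSuite.scala:810` is not shown here: if the test requires the commit duration to be strictly positive, a commit that completes in under one millisecond is reported as 0 ms and the check fails. The sketch below uses a hypothetical local `timeTakenMs` helper (not Spark's) to show the truncation and a check that tolerates it.

```scala
import java.util.concurrent.TimeUnit

object SubMillisecondTiming {
  // Hypothetical helper: elapsed time of `body`, truncated to whole milliseconds.
  def timeTakenMs[T](body: => T): (T, Long) = {
    val startNs = System.nanoTime()
    val result = body
    (result, TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs))
  }

  // Truncation maps any duration below 1,000,000 ns to 0 ms.
  def truncatedMs(nanos: Long): Long = TimeUnit.NANOSECONDS.toMillis(nanos)

  def main(args: Array[String]): Unit = {
    val (_, fastMs) = timeTakenMs { 1 + 1 } // near-instant stand-in for a commit
    // `fastMs > 0` would be timing-dependent here; a non-negative check is stable.
    assert(fastMs >= 0, s"duration must be non-negative, got $fastMs")
    println(s"fast body took $fastMs ms; 999999 ns truncates to ${truncatedMs(999999L)} ms")
  }
}
```

If this is indeed the cause, options include asserting `>= 0`, or making the committed work in the test slow enough to exceed one millisecond reliably. Which of these fits the suite is a judgment for the PR author.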






[GitHub] [spark] SparkQA removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784217619


   **[Test build #135375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135375/testReport)** for PR 31522 at commit [`6bf5a88`](https://github.com/apache/spark/commit/6bf5a880f6a91baf3e55440b271ca3d0008493fc).




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883747132


   **[Test build #141351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141351/testReport)** for PR 31522 at commit [`7b2bb06`](https://github.com/apache/spark/commit/7b2bb0639d2b9b7f6f9f16acd1df214b91e6c5ec).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class CustomFileCommitProtocol(`




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882567027


   **[Test build #141261 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141261/testReport)** for PR 31522 at commit [`cc42403`](https://github.com/apache/spark/commit/cc4240332bcdf1a22fb8b7137fe5f88c23ccde77).




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-879777218


   **[Test build #141010 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141010/testReport)** for PR 31522 at commit [`dc78903`](https://github.com/apache/spark/commit/dc789035296adc340683e48da6e20440692b8d22).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880437311


   **[Test build #141056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141056/testReport)** for PR 31522 at commit [`e69e279`](https://github.com/apache/spark/commit/e69e279f969758f442c41fec70189097514bb936).
    * This patch **fails Scala style tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-783854398


   > The file commit is a driver side thing, why do we need to update `BasicWriteJobStatsTracker`? I think we can follow `BroadcastExchangeExec` and simply call `SQLMetrics.postDriverMetricUpdates`
   
   We compute WritingCommand's metrics on the driver side, and all of those metrics are stored in `BasicWriteJobStatsTracker`, so I changed `BasicWriteJobStatsTracker`.




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881764671


   **[Test build #141178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141178/testReport)** for PR 31522 at commit [`559d766`](https://github.com/apache/spark/commit/559d7665ef58683bca3121294f552622626838fc).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class BasicWriteTaskStatsTracker(`




[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-775725455


   Gentle ping @dongjoon-hyun @HeartSaVioR @HyukjinKwon 




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784746641


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39978/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880466840


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45571/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-775071600


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39599/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-782557635


   **[Test build #135289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135289/testReport)** for PR 31522 at commit [`9ddd28c`](https://github.com/apache/spark/commit/9ddd28cabb5c5d44366a109b4b1c7c3ce88b45e1).




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883540183


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45855/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880370796


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45560/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-885172499


   **[Test build #141517 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141517/testReport)** for PR 31522 at commit [`b5c9d63`](https://github.com/apache/spark/commit/b5c9d6338c53c52724c0a21d61a093194745b98f).




[GitHub] [spark] SparkQA removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880444654


   **[Test build #141058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141058/testReport)** for PR 31522 at commit [`b92f0e4`](https://github.com/apache/spark/commit/b92f0e4b79b1bbfede3f22cddce25c1c122a19c9).




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784277157


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39956/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r671088197



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala
##########
@@ -103,5 +103,7 @@ trait WriteJobStatsTracker extends Serializable {
    * to the expected derived type when implementing this method in a derived class.
    * The framework will make sure to call this with the right arguments.
    */
-  def processStats(stats: Seq[WriteTaskStats]): Unit
+  def processStats(stats: Seq[WriteTaskStats], jobCommitDuration: Long): Unit
+
+  def updateTaskWriteAndCommitDuration(duration: Long): Unit

Review comment:
       e.g. `WriteTaskStatsTracker.getFinalStats(taskCommitTime: Long)`
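The suggestion above can be sketched roughly as follows. This is a minimal, standalone illustration, not the actual Spark code: `SketchWriteTaskStats`, `SketchTaskStats`, `SketchWriteTaskStatsTracker`, and `CountingTaskStatsTracker` are hypothetical stand-ins. The idea is that the task commit time is handed to the tracker when the final stats are requested, so the job-level tracker does not need a separate update method.

```scala
// Hypothetical, simplified stand-ins for Spark's write-stats traits.
trait SketchWriteTaskStats extends Serializable

// Stats produced by one task, carrying the commit time it was given.
final case class SketchTaskStats(numRows: Long, taskCommitTimeMs: Long)
  extends SketchWriteTaskStats

trait SketchWriteTaskStatsTracker {
  def newRow(): Unit

  // The commit duration arrives together with the request for final stats,
  // so no extra "update duration" method is required on any tracker.
  def getFinalStats(taskCommitTime: Long): SketchWriteTaskStats
}

// Counts rows and attaches the commit time when the stats are finalized.
final class CountingTaskStatsTracker extends SketchWriteTaskStatsTracker {
  private var numRows = 0L

  override def newRow(): Unit = numRows += 1

  override def getFinalStats(taskCommitTime: Long): SketchWriteTaskStats =
    SketchTaskStats(numRows, taskCommitTime)
}
```

With this shape, the caller measures the task commit and passes the measured value into `getFinalStats`, which keeps the duration bookkeeping on the task side.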






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784416184


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135375/
   




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r582831850



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##########
@@ -221,7 +221,7 @@ object FileFormatWriter extends Logging {
       val (_, duration) = Utils.timeTakenMs { committer.commitJob(job, commitMsgs) }
       logInfo(s"Write Job ${description.uuid} committed. Elapsed time: $duration ms.")
 
-      processStats(description.statsTrackers, ret.map(_.summary.stats))
+      processStats(description.statsTrackers, ret.map(_.summary.stats), duration)

Review comment:
       > if we show job commit duration in the UI, shall we also show hive load table duration?
   
   I originally planned to show the Hive load table duration in a follow-up to this PR. Should I add the Hive load table duration in this PR as well?
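For readers following the diff above, here is a minimal standalone sketch of the pattern it uses: time the job commit, then pass the measured duration to every stats tracker. All names in it (`CommitTimingSketch`, `JobStatsTracker`, `RecordingTracker`, `commitAndReport`) are hypothetical, and `timeTakenMs` is a local stand-in written for this example rather than Spark's internal `Utils.timeTakenMs`.

```scala
import java.util.concurrent.TimeUnit

object CommitTimingSketch {
  // Local stand-in for a "run the body and return (result, elapsed ms)" helper.
  def timeTakenMs[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)
    (result, math.max(elapsedMs, 0L))
  }

  // Hypothetical job-level tracker: receives per-task stats plus the
  // job commit duration in a single call.
  trait JobStatsTracker {
    def processStats(stats: Seq[Long], jobCommitDuration: Long): Unit
  }

  // Records what it was given so the flow can be inspected afterwards.
  final class RecordingTracker extends JobStatsTracker {
    var lastDuration: Option[Long] = None
    var totalRows: Long = 0L

    override def processStats(stats: Seq[Long], jobCommitDuration: Long): Unit = {
      totalRows = stats.sum
      lastDuration = Some(jobCommitDuration)
    }
  }

  // Times the commit action, then forwards the duration to all trackers.
  def commitAndReport(trackers: Seq[JobStatsTracker], rowsPerTask: Seq[Long])(
      commitJob: => Unit): Long = {
    val (_, duration) = timeTakenMs(commitJob)
    trackers.foreach(_.processStats(rowsPerTask, duration))
    duration
  }
}
```

A Hive load table duration could presumably be measured in the same way, by wrapping the load call in the timing helper, though that is an assumption and not something this diff shows.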






[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784245582


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39955/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-885312283


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141517/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881646025


   **[Test build #141178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141178/testReport)** for PR 31522 at commit [`559d766`](https://github.com/apache/spark/commit/559d7665ef58683bca3121294f552622626838fc).




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881803662


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45706/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884494008


   **[Test build #141422 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141422/testReport)** for PR 31522 at commit [`8107d20`](https://github.com/apache/spark/commit/8107d203073213f5104612db2e51859874b8a762).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884896556


   **[Test build #141495 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141495/testReport)** for PR 31522 at commit [`b5c9d63`](https://github.com/apache/spark/commit/b5c9d6338c53c52724c0a21d61a093194745b98f).




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880437328


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141056/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784725644


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135398/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-775898417


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135066/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881827705


   **[Test build #141198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141198/testReport)** for PR 31522 at commit [`5c41947`](https://github.com/apache/spark/commit/5c41947cb6cce62d581d109b9e69dc99454c29eb).




[GitHub] [spark] cloud-fan commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r674047852



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
##########
@@ -788,6 +792,28 @@ class SQLMetricsSuite extends SharedSparkSession with SQLMetricsTestUtils
     }
   }
 
+  test("SPARK-34399: Add job commit duration metrics for DataWritingCommand") {
+    withSQLConf(SQLConf.FILE_COMMIT_PROTOCOL_CLASS.key ->
+      "org.apache.spark.sql.execution.metric.CustomFileCommitProtocol") {
+      withTable("t", "t2") {
+        sql("CREATE TABLE t(id STRING) USING PARQUET")
+        sql("INSERT INTO TABLE t SELECT 'abc'")
+        sql("INSERT INTO TABLE t SELECT 'abc'")
+        sql("INSERT INTO TABLE t SELECT 'abc'")
+        sql("CREATE TABLE t2(id STRING) USING PARQUET")
+        val df = sql("INSERT INTO TABLE t2 SELECT * FROM  t")

Review comment:
       Can we simplify the test and just write `INSERT INTO TABLE t SELECT 'abc'`?






[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880346068


   **[Test build #141046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141046/testReport)** for PR 31522 at commit [`160e56d`](https://github.com/apache/spark/commit/160e56d91a0ad8379bce30a69a821d8a94418d74).
    * This patch **fails Scala style tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884447490


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45944/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-790678574


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40342/
   




[GitHub] [spark] AngersZhuuuu removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-775725455


   Gentle ping @dongjoon-hyun @HeartSaVioR @HyukjinKwon 




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881718515


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45690/
   




[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884917062


   retest this please




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883412595


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45848/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881772710


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141178/
   




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r673010505



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -48,7 +48,9 @@ case class BasicWriteTaskStats(
 /**
  * Simple [[WriteTaskStatsTracker]] implementation that produces [[BasicWriteTaskStats]].
  */
-class BasicWriteTaskStatsTracker(hadoopConf: Configuration)
+class BasicWriteTaskStatsTracker(
+    hadoopConf: Configuration,
+    taskCommitTimeMetrics: Option[SQLMetric] = None)

Review comment:
       > when this can be None? test?
   
   Yes, because some tests have no active SparkContext (`sc`), so no metric can be created there.
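The optional-metric pattern discussed above can be illustrated with a small standalone sketch. `SketchMetric` and `SketchTaskTracker` are hypothetical stand-ins for `SQLMetric` and `BasicWriteTaskStatsTracker`. The only point being shown is that a `None` default lets callers without a metric (for example, tests without a SparkContext) construct the tracker unchanged.

```scala
// Hypothetical stand-in for an accumulator-style metric.
final class SketchMetric {
  private var value = 0L
  def add(v: Long): Unit = value += v
  def get: Long = value
}

// The metric is optional: callers that cannot create one simply omit it.
final class SketchTaskTracker(taskCommitTimeMetric: Option[SketchMetric] = None) {
  // Records the commit time only when a metric was supplied, then returns it.
  def getFinalStats(taskCommitTime: Long): Long = {
    taskCommitTimeMetric.foreach(_.add(taskCommitTime))
    taskCommitTime
  }
}
```

Using `Option.foreach` avoids an explicit null or emptiness check at the call site. The trade-off is that a missing metric is silently ignored, which suits tests but could hide a wiring mistake in production code.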

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -221,16 +226,26 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  val TASK_COMMIT_DURATION = "taskCommitDuration"
+  private val JOB_COMMIT_DURATION = "jobCommitDuration"
   /** XAttr key of the data length header added in HADOOP-17414. */
   val FILE_LENGTH_XATTR = "header.x-hadoop-s3a-magic-data-length"
 
-  def metrics: Map[String, SQLMetric] = {
+  def driverSideMetrics: Map[String, SQLMetric] = {
     val sparkContext = SparkContext.getActive.get
     Map(
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      JOB_COMMIT_DURATION ->
+        SQLMetrics.createTimingMetric(sparkContext, "duration of job commit")
     )
   }
+
+  def taskCommitTimeMetric: Map[String, SQLMetric] = {
+    val sparkContext = SparkContext.getActive.get
+    Map(TASK_COMMIT_DURATION ->
+      SQLMetrics.createTimingMetric(sparkContext, "duration of task commit"))

Review comment:
       Done

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -221,16 +226,26 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  val TASK_COMMIT_DURATION = "taskCommitDuration"
+  private val JOB_COMMIT_DURATION = "jobCommitDuration"
   /** XAttr key of the data length header added in HADOOP-17414. */
   val FILE_LENGTH_XATTR = "header.x-hadoop-s3a-magic-data-length"
 
-  def metrics: Map[String, SQLMetric] = {
+  def driverSideMetrics: Map[String, SQLMetric] = {
     val sparkContext = SparkContext.getActive.get
     Map(
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      JOB_COMMIT_DURATION ->
+        SQLMetrics.createTimingMetric(sparkContext, "duration of job commit")

Review comment:
       Done

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -221,16 +226,26 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  val TASK_COMMIT_DURATION = "taskCommitDuration"
+  private val JOB_COMMIT_DURATION = "jobCommitDuration"

Review comment:
       Done

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala
##########
@@ -93,6 +94,7 @@ trait WriteJobStatsTracker extends Serializable {
    * Process the given collection of stats computed during this job.
    * E.g. aggregate them, write them to memory / disk, issue warnings, whatever.
    * @param stats One [[WriteTaskStats]] object from each successful write task.
+   * @param jobCommitDuration Duration of job commit.

Review comment:
       Done

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala
##########
@@ -66,10 +66,11 @@ trait WriteTaskStatsTracker {
 
   /**
    * Returns the final statistics computed so far.
+   * @param taskCommitTime The task commit duration.

Review comment:
       Done

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##########
@@ -302,13 +302,9 @@ object FileFormatWriter extends Logging {
       Utils.tryWithSafeFinallyAndFailureCallbacks(block = {
         // Execute the task to write rows out and commit the task.
         val taskAttemptID = taskAttemptContext.getTaskAttemptID
-        val (res, timeCost) = Utils.timeTakenMs {
-          logDebug("$taskAttemptID starts to write and commit.")
-          dataWriter.writeWithIterator(iterator)
-          dataWriter.commit()
-        }
-        logInfo(s"$taskAttemptID finished to write and commit. Elapsed time: $timeCost ms.")
-        res
+        logDebug(s"$taskAttemptID starts to write and commit.")

Review comment:
       Done

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriter.scala
##########
@@ -91,11 +91,14 @@ abstract class FileFormatDataWriter(
    * driver too and used to e.g. update the metrics in UI.
    */
   override def commit(): WriteTaskResult = {
-    releaseResources()
+    val (taskCommitMessage, taskCommitTime) = Utils.timeTakenMs {
+      releaseResources()

Review comment:
       yea, fixed
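
For readers following the hunk above, here is a minimal sketch of the timing pattern it relies on. It assumes a local `timeTakenMs` helper that stands in for Spark's internal `Utils.timeTakenMs`; `commitTask` and the object name are hypothetical and only illustrate the shape of the call.

```scala
import java.util.concurrent.TimeUnit

object CommitTimingSketch {
  // Stand-in for Spark's internal Utils.timeTakenMs (assumption): runs `body`
  // and returns its result together with the elapsed time in milliseconds.
  def timeTakenMs[T](body: => T): (T, Long) = {
    val startNs = System.nanoTime()
    val result = body
    val elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs)
    (result, math.max(elapsedMs, 0L))
  }

  // Hypothetical commit step; a real writer would release its resources and
  // ask the committer to commit the task here.
  def commitTask(): String = {
    Thread.sleep(20)
    "task-commit-message"
  }

  def main(args: Array[String]): Unit = {
    // Only the commit step is wrapped, so the measured time excludes row writing.
    val (message, taskCommitTime) = timeTakenMs { commitTask() }
    println(s"$message committed in $taskCommitTime ms")
  }
}
```

Wrapping only the commit call keeps the reported duration separate from the time spent writing rows, which is what the metric is meant to expose.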

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -221,16 +226,26 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  val TASK_COMMIT_DURATION = "taskCommitDuration"
+  private val JOB_COMMIT_DURATION = "jobCommitDuration"
   /** XAttr key of the data length header added in HADOOP-17414. */
   val FILE_LENGTH_XATTR = "header.x-hadoop-s3a-magic-data-length"
 
-  def metrics: Map[String, SQLMetric] = {
+  def driverSideMetrics: Map[String, SQLMetric] = {
     val sparkContext = SparkContext.getActive.get
     Map(
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      JOB_COMMIT_DURATION ->
+        SQLMetrics.createTimingMetric(sparkContext, "duration of job commit")
     )
   }
+
+  def taskCommitTimeMetric: Map[String, SQLMetric] = {

Review comment:
       Done






[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883584064


   **[Test build #141351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141351/testReport)** for PR 31522 at commit [`7b2bb06`](https://github.com/apache/spark/commit/7b2bb0639d2b9b7f6f9f16acd1df214b91e6c5ec).




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881868586


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141198/
   




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r581572822



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -232,7 +234,9 @@ object BasicWriteJobStatsTracker {
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      DURATION_JOB_COMMIT->
+        SQLMetrics.createTimingMetric(sparkContext, "duration of committing the job")

Review comment:
       > Can you add some tests somewhere, e.g., `SQLMetricsSuite`?
   
   Yea, UT added in `SQLMetricsSuite`






[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784712173


   **[Test build #135398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135398/testReport)** for PR 31522 at commit [`2cc84df`](https://github.com/apache/spark/commit/2cc84dfc1c91dcd7d78bd73127bd53f99744c383).




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-775754322


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39648/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883372146


   **[Test build #141334 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141334/testReport)** for PR 31522 at commit [`a4890f2`](https://github.com/apache/spark/commit/a4890f2049050af537c6760fa8e94ba237b9e875).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882564182








[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882650249


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45775/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884900802


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141495/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784421426


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135376/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884273426


   **[Test build #141422 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141422/testReport)** for PR 31522 at commit [`8107d20`](https://github.com/apache/spark/commit/8107d203073213f5104612db2e51859874b8a762).




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884341958


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45940/
   




[GitHub] [spark] AngersZhuuuu edited a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu edited a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-794964106


   > We can periodically log something in the built-in file commit protocol, but there is nothing we can do if people are using a custom file commit protocol.
   
   A separate thread that logs this periodically would work, but it looks odd. (We use that approach to show the Thrift server's progress.)
   
   > I checked other plan nodes and found that the file scan node has a "metadata time" metrics. I think it makes sense to have something similar in the write nodes, but we need to think about the naming and what to include (shall we include the hive LOAD TABLE time?).
   
   If possible, I think the more comprehensive the information, the better. As I mentioned in https://github.com/apache/spark/pull/31522#issuecomment-793672126, if we have the `LOAD TABLE` time in the SQL tab's node, it will be easier for us to explain it to our users/customers.
   If we can do this, we can show this duration for all similar commands.




[GitHub] [spark] cloud-fan commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-793765772


   Yea, I agree this duration info is important, but I'm not sure the SQL UI is the best place for it. cc @gengliangwang 




[GitHub] [spark] cloud-fan commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-793499270


   How often is that? We can also improve the log to make it easier to search for a certain job.




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-885198800


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46034/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880612233


   **[Test build #141058 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141058/testReport)** for PR 31522 at commit [`b92f0e4`](https://github.com/apache/spark/commit/b92f0e4b79b1bbfede3f22cddce25c1c122a19c9).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] gengliangwang commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r582067232



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##########
@@ -221,7 +221,7 @@ object FileFormatWriter extends Logging {
       val (_, duration) = Utils.timeTakenMs { committer.commitJob(job, commitMsgs) }
       logInfo(s"Write Job ${description.uuid} committed. Elapsed time: $duration ms.")
 
-      processStats(description.statsTrackers, ret.map(_.summary.stats))
+      processStats(description.statsTrackers, ret.map(_.summary.stats), duration)

Review comment:
       Normally, this should be fast. I don't think it is a good idea to put this in the SQL graph.
   Besides, the name "duration of committing the job" can be confusing to end-users.
   I have to leave -1 for this one.






[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784416184


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135375/
   




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r581037561



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -232,7 +234,8 @@ object BasicWriteJobStatsTracker {
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      DURATION_FILE_COMMIT-> SQLMetrics.createTimingMetric(sparkContext, "duration of commit files")

Review comment:
       > duration of committing the job
   
   Done

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -223,6 +224,7 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  private val DURATION_FILE_COMMIT = "durationCommit"

Review comment:
       > jobCommitDuration?
   
   Done

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -192,7 +192,7 @@ class BasicWriteJobStatsTracker(
     new BasicWriteTaskStatsTracker(serializableHadoopConf.value)
   }
 
-  override def processStats(stats: Seq[WriteTaskStats]): Unit = {
+  override def processStats(stats: Seq[WriteTaskStats], duration: Long): Unit = {

Review comment:
       > `duration` -> `jobCommitDuration` to make it clear.
   
   Done
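
The effect of the renamed parameter can be sketched roughly as follows. This is a simplified, self-contained illustration, not the PR's code: `TaskStats`, `Metric`, `JobStatsTracker` and the metric keys are hypothetical stand-ins for `WriteTaskStats`, `SQLMetric` and `BasicWriteJobStatsTracker`.

```scala
object JobCommitDurationSketch {
  // Simplified stand-ins (assumptions) for WriteTaskStats and SQLMetric.
  final case class TaskStats(numFiles: Int, numRows: Long)

  final class Metric {
    var value: Long = 0L
    def add(v: Long): Unit = value += v
  }

  class JobStatsTracker(val metrics: Map[String, Metric]) {
    // The explicit parameter name makes clear that the value is the time
    // spent in the job commit, not the duration of the whole write job.
    def processStats(stats: Seq[TaskStats], jobCommitDuration: Long): Unit = {
      metrics("numFiles").add(stats.map(_.numFiles.toLong).sum)
      metrics("numOutputRows").add(stats.map(_.numRows).sum)
      metrics("jobCommitDuration").add(jobCommitDuration)
    }
  }
}
```

A caller would measure the job commit on the driver and hand the elapsed milliseconds to `processStats` together with the per-task stats.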






[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882564185








[GitHub] [spark] cloud-fan commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r673100605



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -183,14 +186,15 @@ class BasicWriteTaskStatsTracker(hadoopConf: Configuration)
  */
 class BasicWriteJobStatsTracker(
     serializableHadoopConf: SerializableConfiguration,
-    @transient val metrics: Map[String, SQLMetric])
+    @transient val driverSideMetrics: Map[String, SQLMetric],
+    taskCommitTimeMetric: SQLMetric)
   extends WriteJobStatsTracker {

Review comment:
       nit: it seems we can simplify the caller side if we add a new constructor to this class:
   ```
   // Auxiliary constructors take no result type annotation.
   def this(
       serializableHadoopConf: SerializableConfiguration,
       metrics: Map[String, SQLMetric]) = {
     this(serializableHadoopConf, metrics - TASK_COMMIT_TIME, metrics(TASK_COMMIT_TIME))
   }
   ```
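
A self-contained sketch of this auxiliary-constructor idea, under stated assumptions: `Metric`, `JobStatsTracker` and the `TaskCommitTime` key are hypothetical stand-ins, and the Hadoop configuration argument is left out to keep the example small.

```scala
object AuxConstructorSketch {
  // Simplified stand-in (assumption) for SQLMetric.
  final class Metric(val name: String)

  // Hypothetical key under which the task commit metric is stored.
  val TaskCommitTime = "taskCommitTime"

  class JobStatsTracker(
      val driverSideMetrics: Map[String, Metric],
      val taskCommitTimeMetric: Metric) {

    // Auxiliary constructor: callers pass a single map, and the task commit
    // metric is split off from the driver-side metrics here.
    def this(metrics: Map[String, Metric]) =
      this(metrics - TaskCommitTime, metrics(TaskCommitTime))
  }
}
```

With this shape, callers keep passing one metrics map; the trade-off is that a map without the task commit key fails at construction time with a `NoSuchElementException`.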






[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880444654


   **[Test build #141058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141058/testReport)** for PR 31522 at commit [`b92f0e4`](https://github.com/apache/spark/commit/b92f0e4b79b1bbfede3f22cddce25c1c122a19c9).




[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-793484750


   In fact, the cluster environments of many companies are not that healthy, and slow nodes often make the commit and the Hive metadata load of a table/partition very slow. We can indeed see this in the log, but for a long-running service, especially the Spark Thrift Server, where a lot of SQL runs, we also need to go through the background log to find and confirm which SQL a log line belongs to. Usually we only look at these metrics when a SQL query has run for a long time or has hit a problem.




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880506609


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45573/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880436106


   **[Test build #141056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141056/testReport)** for PR 31522 at commit [`e69e279`](https://github.com/apache/spark/commit/e69e279f969758f442c41fec70189097514bb936).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-775791223


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39648/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882757678


   **[Test build #141261 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141261/testReport)** for PR 31522 at commit [`cc42403`](https://github.com/apache/spark/commit/cc4240332bcdf1a22fb8b7137fe5f88c23ccde77).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881813626


   **[Test build #141194 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141194/testReport)** for PR 31522 at commit [`ee2e3cf`](https://github.com/apache/spark/commit/ee2e3cfb5b8eb6cf207f752845d02e7d26b9ff37).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883300897


   **[Test build #141328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141328/testReport)** for PR 31522 at commit [`d3389c6`](https://github.com/apache/spark/commit/d3389c67298cb56a3e61bfea52a130365a5307dd).




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884416323


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45944/
   




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r673035862



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -221,16 +226,26 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  val TASK_COMMIT_DURATION = "taskCommitDuration"
+  private val JOB_COMMIT_DURATION = "jobCommitDuration"
   /** XAttr key of the data length header added in HADOOP-17414. */
   val FILE_LENGTH_XATTR = "header.x-hadoop-s3a-magic-data-length"
 
-  def metrics: Map[String, SQLMetric] = {
+  def driverSideMetrics: Map[String, SQLMetric] = {
     val sparkContext = SparkContext.getActive.get
     Map(
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      JOB_COMMIT_DURATION ->
+        SQLMetrics.createTimingMetric(sparkContext, "duration of job commit")
     )
   }
+
+  def taskCommitTimeMetric: Map[String, SQLMetric] = {
+    val sparkContext = SparkContext.getActive.get
+    Map(TASK_COMMIT_DURATION ->
+      SQLMetrics.createTimingMetric(sparkContext, "duration of task commit"))

Review comment:
       Done

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -221,16 +226,26 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  val TASK_COMMIT_DURATION = "taskCommitDuration"
+  private val JOB_COMMIT_DURATION = "jobCommitDuration"
   /** XAttr key of the data length header added in HADOOP-17414. */
   val FILE_LENGTH_XATTR = "header.x-hadoop-s3a-magic-data-length"
 
-  def metrics: Map[String, SQLMetric] = {
+  def driverSideMetrics: Map[String, SQLMetric] = {
     val sparkContext = SparkContext.getActive.get
     Map(
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      JOB_COMMIT_DURATION ->
+        SQLMetrics.createTimingMetric(sparkContext, "duration of job commit")

Review comment:
       Done

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -221,16 +226,26 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  val TASK_COMMIT_DURATION = "taskCommitDuration"
+  private val JOB_COMMIT_DURATION = "jobCommitDuration"

Review comment:
       Done

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala
##########
@@ -93,6 +94,7 @@ trait WriteJobStatsTracker extends Serializable {
    * Process the given collection of stats computed during this job.
    * E.g. aggregate them, write them to memory / disk, issue warnings, whatever.
    * @param stats One [[WriteTaskStats]] object from each successful write task.
+   * @param jobCommitDuration Duration of job commit.

Review comment:
       Done

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala
##########
@@ -66,10 +66,11 @@ trait WriteTaskStatsTracker {
 
   /**
    * Returns the final statistics computed so far.
+   * @param taskCommitTime The task commit duration.

Review comment:
       Done

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##########
@@ -302,13 +302,9 @@ object FileFormatWriter extends Logging {
       Utils.tryWithSafeFinallyAndFailureCallbacks(block = {
         // Execute the task to write rows out and commit the task.
         val taskAttemptID = taskAttemptContext.getTaskAttemptID
-        val (res, timeCost) = Utils.timeTakenMs {
-          logDebug("$taskAttemptID starts to write and commit.")
-          dataWriter.writeWithIterator(iterator)
-          dataWriter.commit()
-        }
-        logInfo(s"$taskAttemptID finished to write and commit. Elapsed time: $timeCost ms.")
-        res
+        logDebug(s"$taskAttemptID starts to write and commit.")

Review comment:
       Done






[GitHub] [spark] gengliangwang commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r674768282



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -230,7 +244,10 @@ object BasicWriteJobStatsTracker {
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      TASK_COMMIT_TIME ->
+        SQLMetrics.createTimingMetric(sparkContext, "time of committing the tasks"),

Review comment:
       Let's make it `task commit time` for short






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881826398


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141194/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883513311


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45855/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-885231030


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46034/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882700924


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45775/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-885219832


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46034/
   




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r671532257



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala
##########
@@ -69,7 +69,7 @@ trait WriteTaskStatsTracker {
    * @note This may only be called once. Further use of the object may lead to undefined behavior.
    * @return An object of subtype of [[WriteTaskStats]], to be sent to the driver.
    */
-  def getFinalStats(): WriteTaskStats
+  def getFinalStats(taskCommitTime: Long): WriteTaskStats

Review comment:
       Could you update the function description by adding `@param taskCommitTime`, please?
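
For illustration, the requested documentation could look roughly like the sketch below. The trait and stats type here are simplified stand-ins (assumptions) for Spark's `WriteTaskStatsTracker` and `WriteTaskStats`, and `SimpleStats` and `CountingTracker` are hypothetical. Only the `getFinalStats(taskCommitTime: Long)` signature and the doc wording are taken from the diffs quoted in this thread.

```scala
object GetFinalStatsSketch {
  /** Simplified stand-in for Spark's WriteTaskStats (assumption, not the real type). */
  trait WriteTaskStats extends Serializable

  /** Hypothetical stats payload carrying a row count and the commit time. */
  final case class SimpleStats(numRows: Long, taskCommitTime: Long) extends WriteTaskStats

  trait WriteTaskStatsTracker {
    /**
     * Returns the final statistics computed so far.
     * @param taskCommitTime The task commit duration.
     * @note This may only be called once. Further use of the object may lead to undefined behavior.
     * @return An object of subtype of [[WriteTaskStats]], to be sent to the driver.
     */
    def getFinalStats(taskCommitTime: Long): WriteTaskStats
  }

  /** Hypothetical tracker that counts rows and forwards the commit time unchanged. */
  final class CountingTracker extends WriteTaskStatsTracker {
    private var numRows = 0L

    def newRow(): Unit = numRows += 1

    override def getFinalStats(taskCommitTime: Long): WriteTaskStats =
      SimpleStats(numRows, taskCommitTime)
  }
}
```

The point of the extra parameter is that the tracker itself does not measure the commit; the caller measures it and hands the value in when asking for the final stats.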






[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880370943


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45560/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-880466658


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141045/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882564185








[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-885006166


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46011/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883405437


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45846/
   




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r674065629



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
##########
@@ -788,6 +792,28 @@ class SQLMetricsSuite extends SharedSparkSession with SQLMetricsTestUtils
     }
   }
 
+  test("SPARK-34399: Add job commit duration metrics for DataWritingCommand") {
+    withSQLConf(SQLConf.FILE_COMMIT_PROTOCOL_CLASS.key ->
+      "org.apache.spark.sql.execution.metric.CustomFileCommitProtocol") {
+      withTable("t", "t2") {
+        sql("CREATE TABLE t(id STRING) USING PARQUET")
+        sql("INSERT INTO TABLE t SELECT 'abc'")
+        sql("INSERT INTO TABLE t SELECT 'abc'")
+        sql("INSERT INTO TABLE t SELECT 'abc'")
+        sql("CREATE TABLE t2(id STRING) USING PARQUET")
+        val df = sql("INSERT INTO TABLE t2 SELECT * FROM  t")

Review comment:
       Done






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884495427


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141422/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31522:
URL: https://github.com/apache/spark/pull/31522#discussion_r672105036



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -183,14 +186,16 @@ class BasicWriteTaskStatsTracker(hadoopConf: Configuration)
  */
 class BasicWriteJobStatsTracker(
     serializableHadoopConf: SerializableConfiguration,
-    @transient val metrics: Map[String, SQLMetric])
+    val metrics: Map[String, SQLMetric])

Review comment:
       how about
   ```
   @transient val driverSideMetrics: Map...
   taskCommitTimeMetrics: SQLMetric
   ```
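
A rough sketch of what the `@transient` split in this suggestion would mean in practice, assuming the tracker is shipped to tasks via Java serialization. `JobTracker` and `roundTrip` are hypothetical names, and plain `Long` values stand in for `SQLMetric`; this is not the PR's code.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object TransientSketch {
  /**
   * Hypothetical stand-in for the job stats tracker; plain values replace SQLMetric.
   * driverSideMetrics is marked @transient, so Java serialization skips it,
   * while taskCommitTimeMetric travels with the serialized object.
   */
  class JobTracker(
      @transient val driverSideMetrics: Map[String, Long],
      val taskCommitTimeMetric: Long) extends Serializable

  /** Serializes and deserializes a value, roughly what happens when it is sent to a task. */
  def roundTrip[T <: java.io.Serializable](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

After a round trip the transient field comes back as `null`, which is why only a metric that tasks need to update should be left non-transient.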

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -48,7 +48,9 @@ case class BasicWriteTaskStats(
 /**
  * Simple [[WriteTaskStatsTracker]] implementation that produces [[BasicWriteTaskStats]].
  */
-class BasicWriteTaskStatsTracker(hadoopConf: Configuration)
+class BasicWriteTaskStatsTracker(
+    hadoopConf: Configuration,
+    taskCommitTimeMetrics: Option[SQLMetric] = None)

Review comment:
       When can this be None? Only in tests?

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -221,16 +226,26 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  val TASK_COMMIT_DURATION = "taskCommitDuration"
+  private val JOB_COMMIT_DURATION = "jobCommitDuration"
   /** XAttr key of the data length header added in HADOOP-17414. */
   val FILE_LENGTH_XATTR = "header.x-hadoop-s3a-magic-data-length"
 
-  def metrics: Map[String, SQLMetric] = {
+  def driverSideMetrics: Map[String, SQLMetric] = {
     val sparkContext = SparkContext.getActive.get
     Map(
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      JOB_COMMIT_DURATION ->
+        SQLMetrics.createTimingMetric(sparkContext, "duration of job commit")
     )
   }
+
+  def taskCommitTimeMetric: Map[String, SQLMetric] = {

Review comment:
       why Map? `def taskCommitTimeMetric: SQLMetric ...`

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -221,16 +226,26 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  val TASK_COMMIT_DURATION = "taskCommitDuration"
+  private val JOB_COMMIT_DURATION = "jobCommitDuration"
   /** XAttr key of the data length header added in HADOOP-17414. */
   val FILE_LENGTH_XATTR = "header.x-hadoop-s3a-magic-data-length"
 
-  def metrics: Map[String, SQLMetric] = {
+  def driverSideMetrics: Map[String, SQLMetric] = {
     val sparkContext = SparkContext.getActive.get
     Map(
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      JOB_COMMIT_DURATION ->
+        SQLMetrics.createTimingMetric(sparkContext, "duration of job commit")
     )
   }
+
+  def taskCommitTimeMetric: Map[String, SQLMetric] = {

Review comment:
       `def taskCommitTimeMetric: (String, SQLMetric)`
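
   For illustration only, a minimal sketch of the two return shapes discussed in the comments above. `FakeMetric` and `TaskCommitMetricSketch` are hypothetical stand-ins (not Spark classes) so the snippet runs without Spark on the classpath; only the key name and description string come from the diff.

```scala
// Hypothetical stand-in for Spark's SQLMetric so this sketch runs without Spark.
final case class FakeMetric(description: String)

object TaskCommitMetricSketch {
  val TASK_COMMIT_DURATION = "taskCommitDuration"

  // Shape in the diff: a Map that only ever holds one entry.
  def asMap: Map[String, FakeMetric] =
    Map(TASK_COMMIT_DURATION -> FakeMetric("duration of task commit"))

  // Shape suggested in review: a single key/metric pair.
  def asPair: (String, FakeMetric) =
    TASK_COMMIT_DURATION -> FakeMetric("duration of task commit")

  // A pair can still be added to an existing metrics map with `+`.
  def merged(base: Map[String, FakeMetric]): Map[String, FakeMetric] =
    base + asPair
}
```

   With the pair shape, callers that need a map can still write `driverMetrics + taskCommitPair`, so nothing is lost by dropping the single-entry Map.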

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriter.scala
##########
@@ -91,11 +91,14 @@ abstract class FileFormatDataWriter(
    * driver too and used to e.g. update the metrics in UI.
    */
   override def commit(): WriteTaskResult = {
-    releaseResources()
+    val (taskCommitMessage, taskCommitTime) = Utils.timeTakenMs {
+      releaseResources()

Review comment:
       shall we include `releaseResources` in the commit time?
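
   To make the question concrete, here is a rough sketch of the two timing options. Everything in it is an assumption for illustration: `timeTakenMs` is a local helper with the same call shape as the `Utils.timeTakenMs` used in the diff, and `releaseResources` / `commitTask` are hypothetical placeholders rather than the real writer methods.

```scala
import java.util.concurrent.TimeUnit.NANOSECONDS

object CommitTimingSketch {
  // Local helper: runs `body` and returns its result with the elapsed milliseconds.
  def timeTakenMs[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, math.max(NANOSECONDS.toMillis(System.nanoTime() - start), 0L))
  }

  // Hypothetical placeholders for the writer's cleanup and commit steps.
  def releaseResources(): Unit = Thread.sleep(20)
  def commitTask(): String = { Thread.sleep(20); "committed" }

  // Option A: cleanup runs inside the timed block, so it counts as commit time.
  def commitIncludingRelease(): (String, Long) = timeTakenMs {
    releaseResources()
    commitTask()
  }

  // Option B: cleanup runs first; only the commit itself is timed.
  def commitExcludingRelease(): (String, Long) = {
    releaseResources()
    timeTakenMs { commitTask() }
  }
}
```

   Option B keeps the reported metric limited to the committer's own work, at the cost of not accounting for time spent closing the writer.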

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##########
@@ -302,13 +302,9 @@ object FileFormatWriter extends Logging {
       Utils.tryWithSafeFinallyAndFailureCallbacks(block = {
         // Execute the task to write rows out and commit the task.
         val taskAttemptID = taskAttemptContext.getTaskAttemptID
-        val (res, timeCost) = Utils.timeTakenMs {
-          logDebug("$taskAttemptID starts to write and commit.")
-          dataWriter.writeWithIterator(iterator)
-          dataWriter.commit()
-        }
-        logInfo(s"$taskAttemptID finished to write and commit. Elapsed time: $timeCost ms.")
-        res
+        logDebug(s"$taskAttemptID starts to write and commit.")

Review comment:
       This log didn't exist before https://github.com/apache/spark/commit/f5a63322def87904ebfa95673a584c094e6062cc#diff-03e7ff19e93cb270d82aca907ac0a1f87463ba6eb5dce78a407bb169b840a6cb; shall we remove it?

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala
##########
@@ -66,10 +66,11 @@ trait WriteTaskStatsTracker {
 
   /**
    * Returns the final statistics computed so far.
+   * @param taskCommitTime The task commit duration.

Review comment:
       nit: `Time of committing the task`

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala
##########
@@ -93,6 +94,7 @@ trait WriteJobStatsTracker extends Serializable {
    * Process the given collection of stats computed during this job.
    * E.g. aggregate them, write them to memory / disk, issue warnings, whatever.
    * @param stats One [[WriteTaskStats]] object from each successful write task.
+   * @param jobCommitDuration Duration of job commit.

Review comment:
       ditto

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -221,16 +226,26 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  val TASK_COMMIT_DURATION = "taskCommitDuration"
+  private val JOB_COMMIT_DURATION = "jobCommitDuration"

Review comment:
       `taskCommitTime` and `jobCommitTime`?

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -221,16 +226,26 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  val TASK_COMMIT_DURATION = "taskCommitDuration"
+  private val JOB_COMMIT_DURATION = "jobCommitDuration"
   /** XAttr key of the data length header added in HADOOP-17414. */
   val FILE_LENGTH_XATTR = "header.x-hadoop-s3a-magic-data-length"
 
-  def metrics: Map[String, SQLMetric] = {
+  def driverSideMetrics: Map[String, SQLMetric] = {
     val sparkContext = SparkContext.getActive.get
     Map(
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      JOB_COMMIT_DURATION ->
+        SQLMetrics.createTimingMetric(sparkContext, "duration of job commit")

Review comment:
       `time of committing the job`

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##########
@@ -221,16 +226,26 @@ object BasicWriteJobStatsTracker {
   private val NUM_OUTPUT_BYTES_KEY = "numOutputBytes"
   private val NUM_OUTPUT_ROWS_KEY = "numOutputRows"
   private val NUM_PARTS_KEY = "numParts"
+  val TASK_COMMIT_DURATION = "taskCommitDuration"
+  private val JOB_COMMIT_DURATION = "jobCommitDuration"
   /** XAttr key of the data length header added in HADOOP-17414. */
   val FILE_LENGTH_XATTR = "header.x-hadoop-s3a-magic-data-length"
 
-  def metrics: Map[String, SQLMetric] = {
+  def driverSideMetrics: Map[String, SQLMetric] = {
     val sparkContext = SparkContext.getActive.get
     Map(
       NUM_FILES_KEY -> SQLMetrics.createMetric(sparkContext, "number of written files"),
       NUM_OUTPUT_BYTES_KEY -> SQLMetrics.createSizeMetric(sparkContext, "written output"),
       NUM_OUTPUT_ROWS_KEY -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
-      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+      NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
+      JOB_COMMIT_DURATION ->
+        SQLMetrics.createTimingMetric(sparkContext, "duration of job commit")
     )
   }
+
+  def taskCommitTimeMetric: Map[String, SQLMetric] = {
+    val sparkContext = SparkContext.getActive.get
+    Map(TASK_COMMIT_DURATION ->
+      SQLMetrics.createTimingMetric(sparkContext, "duration of task commit"))

Review comment:
       `time of committing tasks`






[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881843393


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45710/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-775180762


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135016/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-790645403


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135759/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-879705943


   **[Test build #141010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141010/testReport)** for PR 31522 at commit [`dc78903`](https://github.com/apache/spark/commit/dc789035296adc340683e48da6e20440692b8d22).




[GitHub] [spark] gengliangwang commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884891423


   @AngersZhuuuu Thanks. Please update the screenshot in the PR description as well.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883412595


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45848/
   




[GitHub] [spark] gengliangwang commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-879735838


   After offline discussion with @gatorsmile @maryannxue and @cloud-fan this morning, I am taking back my -1 on this PR.




[GitHub] [spark] AmplabJenkins commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784725644


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135398/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-879745399


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45524/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-775180762


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135016/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-790645403


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135759/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881868586


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141198/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784414993


   **[Test build #135375 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135375/testReport)** for PR 31522 at commit [`6bf5a88`](https://github.com/apache/spark/commit/6bf5a880f6a91baf3e55440b271ca3d0008493fc).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-775008712


   **[Test build #135016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135016/testReport)** for PR 31522 at commit [`d065bcd`](https://github.com/apache/spark/commit/d065bcda8cdb555cf409946dade11b35371ae6be).




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884968395


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46011/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-784322350


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39955/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881843393


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45710/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883757348


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141351/
   




[GitHub] [spark] SparkQA commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883412375


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45843/
   

