
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/29 07:37:14 UTC

[GitHub] [spark] uncleGen commented on a change in pull request #29269: [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations

uncleGen commented on a change in pull request #29269:
URL: https://github.com/apache/spark/pull/29269#discussion_r462069563



##########
File path: docs/web-ui.md
##########
@@ -426,9 +426,9 @@ queries. Currently, it contains the following metrics.
 * **Batch Duration.** The process duration of each batch. 
 * **Operation Duration.** The amount of time taken to perform various operations in milliseconds.
 The tracked operations are listed as follows.
-    * addBatch: Adds result data of the current batch to the sink.
-    * getBatch: Gets a new batch of data to process.
-    * latestOffset: Gets the latest offsets for sources. 
+    * addBatch: Time taken to read the micro-batch's input data from the sources, process it, and write the batch's output to the sink. This should take the bulk of the micro-batch's time.
+    * getBatch: Time taken to prepare the logical query to read the input of the current micro-batch from the sources.
+    * latestOffset: Time taken to query the sources whether they have new input data.

Review comment:
       +1

##########
File path: docs/web-ui.md
##########
@@ -426,9 +426,9 @@ queries. Currently, it contains the following metrics.
 * **Batch Duration.** The process duration of each batch. 
 * **Operation Duration.** The amount of time taken to perform various operations in milliseconds.
 The tracked operations are listed as follows.
-    * addBatch: Adds result data of the current batch to the sink.
-    * getBatch: Gets a new batch of data to process.
-    * latestOffset: Gets the latest offsets for sources. 
+    * addBatch: Time taken to read the micro-batch's input data from the sources, process it, and write the batch's output to the sink. This should take the bulk of the micro-batch's time.
+    * getBatch: Time taken to prepare the logical query to read the input of the current micro-batch from the sources.
+    * latestOffset: Time taken to query the sources whether they have new input data.
     * queryPlanning: Generates the execution plan.
     * walCommit: Writes the offsets to the metadata log.

Review comment:
       nit: do they also need to be explained in the pattern 'Time taken to ...'?
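
For context, the operations named in this hunk are the ones reported as per-batch durations in a streaming query's progress. The sketch below is illustrative only and is not part of the PR. It assumes a progress record shaped like a dict with a `durationMs` map (as PySpark's `query.lastProgress` is understood to return) that also carries a `triggerExecution` total. The sample values and the helper name `duration_shares` are hypothetical. It ranks each operation by its share of the batch time, which is one way to check the claim that addBatch should take the bulk of it.

```python
# Hypothetical progress record; the numbers are made up for illustration.
# In a real job this dict would come from something like query.lastProgress.
sample_progress = {
    "batchId": 7,
    "durationMs": {
        "addBatch": 820,
        "getBatch": 5,
        "latestOffset": 40,
        "queryPlanning": 35,
        "walCommit": 60,
        "triggerExecution": 1000,  # assumed to be the whole-batch total
    },
}


def duration_shares(progress):
    """Return (operation, share of batch time) pairs, largest share first."""
    durations = dict(progress.get("durationMs", {}))
    # Use the reported total if present; otherwise fall back to the sum
    # of the individual operations.
    total = durations.pop("triggerExecution", None)
    if not total:
        total = sum(durations.values())
    if total == 0:
        return []
    shares = [(name, ms / total) for name, ms in durations.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


for name, share in duration_shares(sample_progress):
    print(f"{name}: {share:.1%}")
```

With the sample values above, addBatch comes out first, consistent with the reworded description in the diff.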



