Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/17 17:11:10 UTC

[GitHub] [spark] warrenzhu25 opened a new pull request #31869: [SPARK-34777][UI] StagePage input size records not show when records greater than zero

warrenzhu25 opened a new pull request #31869:
URL: https://github.com/apache/spark/pull/31869


   ### What changes were proposed in this pull request?
   Determine whether to show input/output size and records based on whether either metric has a value, rather than on size alone.
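
   Below is a minimal sketch of the proposed check in plain Scala. It is an illustration, not the patch itself. `StageMetricsSummary` is a hypothetical stand-in for Spark's `StageData` that keeps only the four counters involved, and `ApiHelperSketch` is a hypothetical name. The `hasOutput` variant assumes the output check changes the same way as `hasInput`, as the PR title suggests.
   
   ```scala
   // Hypothetical stand-in for Spark's StageData: only the four counters used here.
   final case class StageMetricsSummary(
       inputBytes: Long,
       inputRecords: Long,
       outputBytes: Long,
       outputRecords: Long)
   
   object ApiHelperSketch {
     // Previous behavior: show the input columns only when bytes were recorded.
     def hasInputOld(s: StageMetricsSummary): Boolean = s.inputBytes > 0
   
     // Proposed behavior: show the columns if either bytes or records were recorded.
     def hasInput(s: StageMetricsSummary): Boolean =
       s.inputBytes > 0 || s.inputRecords > 0
   
     // Assumed symmetric change for the output columns.
     def hasOutput(s: StageMetricsSummary): Boolean =
       s.outputBytes > 0 || s.outputRecords > 0
   }
   
   object ApiHelperSketchDemo {
     def main(args: Array[String]): Unit = {
       // A stage that counted records but left bytes at 0.
       val recordsOnly = StageMetricsSummary(
         inputBytes = 0L, inputRecords = 1000L, outputBytes = 0L, outputRecords = 0L)
       println(ApiHelperSketch.hasInputOld(recordsOnly)) // false
       println(ApiHelperSketch.hasInput(recordsOnly))    // true
     }
   }
   ```
   
   A stage that reports records but zero bytes is hidden by the old check and shown by the new one. A stage with neither metric is still hidden.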
   
   ### Why are the changes needed?
   The Stage page UI does not show input/output size and records, even when the record count is greater than zero. This is common when a Spark streaming job reads from a Kafka source.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Manually
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] warrenzhu25 edited a comment on pull request #31869: [SPARK-34777][UI] StagePage input/output size records not show when records greater than zero

warrenzhu25 edited a comment on pull request #31869:
URL: https://github.com/apache/spark/pull/31869#issuecomment-813712363


   @HeartSaVioR, @Ngone51, could you please take a look?


[GitHub] [spark] AmplabJenkins commented on pull request #31869: [SPARK-34777][UI] StagePage input size records not show when records greater than zero

AmplabJenkins commented on pull request #31869:
URL: https://github.com/apache/spark/pull/31869#issuecomment-801285671


   Can one of the admins verify this patch?


[GitHub] [spark] warrenzhu25 commented on a change in pull request #31869: [SPARK-34777][UI] StagePage input size records not show when records greater than zero

warrenzhu25 commented on a change in pull request #31869:
URL: https://github.com/apache/spark/pull/31869#discussion_r598044400



##########
File path: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
##########
@@ -786,9 +786,13 @@ private[spark] object ApiHelper {
     stageData.accumulatorUpdates.exists { acc => acc.name != null && acc.value != null }
   }
 
-  def hasInput(stageData: StageData): Boolean = stageData.inputBytes > 0
+  def hasInput(stageData: StageData): Boolean = {
+    stageData.inputBytes > 0 || stageData.inputRecords > 0

Review comment:
       I think the root cause is in the following code, https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala#L96:
   ```scala
   private class MetricsHandler extends Logging with Serializable {
     private val inputMetrics = TaskContext.get().taskMetrics().inputMetrics
     private val startingBytesRead = inputMetrics.bytesRead
     private val getBytesRead = SparkHadoopUtil.get.getFSBytesReadOnThreadCallback()
   
     def updateMetrics(numRows: Int, force: Boolean = false): Unit = {
       inputMetrics.incRecordsRead(numRows)
       val shouldUpdateBytesRead =
         inputMetrics.recordsRead % SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS == 0
       if (shouldUpdateBytesRead || force) {
         inputMetrics.setBytesRead(startingBytesRead + getBytesRead())
       }
     }
   }
   ```
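   In `DataSourceRDD`, records read are always incremented, but bytes read are only updated from Hadoop FileSystem reads (via `getFSBytesReadOnThreadCallback`). A source that does not read through Hadoop FileSystem, such as Kafka, can therefore report records with zero bytes.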
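
   To make the quoted logic concrete, here is a simplified, hypothetical model of `MetricsHandler` in plain Scala with no Spark dependency. `SimpleInputMetrics` and `SketchMetricsHandler` are illustrative names. `fsBytesRead` stands in for `SparkHadoopUtil.get.getFSBytesReadOnThreadCallback()`, and the update interval of 1000 is chosen for illustration. The demo assumes a non-Hadoop source such as Kafka, whose reads never go through Hadoop FileSystem, so the callback keeps returning 0.
   
   ```scala
   // Hypothetical, simplified model of the MetricsHandler logic quoted above.
   final class SimpleInputMetrics {
     var recordsRead: Long = 0L
     var bytesRead: Long = 0L
   }
   
   // `fsBytesRead` stands in for SparkHadoopUtil.get.getFSBytesReadOnThreadCallback(),
   // which only sees bytes read through Hadoop FileSystem on the current thread.
   final class SketchMetricsHandler(
       metrics: SimpleInputMetrics,
       fsBytesRead: () => Long,
       updateInterval: Long) {
     private val startingBytesRead = metrics.bytesRead
   
     def updateMetrics(numRows: Int, force: Boolean = false): Unit = {
       // Records are incremented on every call, regardless of the source.
       metrics.recordsRead += numRows
       val shouldUpdateBytesRead = metrics.recordsRead % updateInterval == 0
       if (shouldUpdateBytesRead || force) {
         // Bytes are taken only from the Hadoop FileSystem callback.
         metrics.bytesRead = startingBytesRead + fsBytesRead()
       }
     }
   }
   
   object MetricsHandlerSketchDemo {
     def main(args: Array[String]): Unit = {
       val metrics = new SimpleInputMetrics
       // Assumed non-Hadoop source: the callback never observes any bytes.
       val handler = new SketchMetricsHandler(metrics, () => 0L, updateInterval = 1000L)
       (1 to 5).foreach(_ => handler.updateMetrics(numRows = 500))
       handler.updateMetrics(numRows = 0, force = true)
       println(s"records=${metrics.recordsRead} bytes=${metrics.bytesRead}") // records=2500 bytes=0
     }
   }
   ```
   
   Under the old `hasInput` check (`inputBytes > 0`), a stage like this would hide its input columns even though 2500 records were read.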




[GitHub] [spark] warrenzhu25 commented on pull request #31869: [SPARK-34777][UI] StagePage input/output size records not show when records greater than zero

warrenzhu25 commented on pull request #31869:
URL: https://github.com/apache/spark/pull/31869#issuecomment-1034591698


   @HeartSaVioR, @Ngone51, could you please take a look and remove the `stale` label?


[GitHub] [spark] github-actions[bot] commented on pull request #31869: [SPARK-34777][UI] StagePage input/output size records not show when records greater than zero

github-actions[bot] commented on pull request #31869:
URL: https://github.com/apache/spark/pull/31869#issuecomment-917298576


   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


[GitHub] [spark] warrenzhu25 commented on pull request #31869: [SPARK-34777][UI] StagePage input/output size records not show when records greater than zero

warrenzhu25 commented on pull request #31869:
URL: https://github.com/apache/spark/pull/31869#issuecomment-813712363


   @HeartSaVioR, could you please take a look?


[GitHub] [spark] github-actions[bot] closed pull request #31869: [SPARK-34777][UI] StagePage input/output size records not show when records greater than zero

github-actions[bot] closed pull request #31869:
URL: https://github.com/apache/spark/pull/31869


   


[GitHub] [spark] Ngone51 commented on a change in pull request #31869: [SPARK-34777][UI] StagePage input size records not show when records greater than zero

Ngone51 commented on a change in pull request #31869:
URL: https://github.com/apache/spark/pull/31869#discussion_r596858929



##########
File path: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
##########
@@ -786,9 +786,13 @@ private[spark] object ApiHelper {
     stageData.accumulatorUpdates.exists { acc => acc.name != null && acc.value != null }
   }
 
-  def hasInput(stageData: StageData): Boolean = stageData.inputBytes > 0
+  def hasInput(stageData: StageData): Boolean = {
+    stageData.inputBytes > 0 || stageData.inputRecords > 0

Review comment:
       Does this mean we aren't counting the metrics correctly somewhere?




[GitHub] [spark] warrenzhu25 commented on pull request #31869: [SPARK-34777][UI] StagePage input/output size records not show when records greater than zero

warrenzhu25 commented on pull request #31869:
URL: https://github.com/apache/spark/pull/31869#issuecomment-922034051


   @HeartSaVioR, @Ngone51, could you please take a look?


[GitHub] [spark] Ngone51 commented on a change in pull request #31869: [SPARK-34777][UI] StagePage input size records not show when records greater than zero

Ngone51 commented on a change in pull request #31869:
URL: https://github.com/apache/spark/pull/31869#discussion_r599630386



##########
File path: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
##########
@@ -786,9 +786,13 @@ private[spark] object ApiHelper {
     stageData.accumulatorUpdates.exists { acc => acc.name != null && acc.value != null }
   }
 
-  def hasInput(stageData: StageData): Boolean = stageData.inputBytes > 0
+  def hasInput(stageData: StageData): Boolean = {
+    stageData.inputBytes > 0 || stageData.inputRecords > 0

Review comment:
       I see. Is it possible to add a unit test (UT) for this change? If not, could you paste a screenshot of the UI change?




[GitHub] [spark] warrenzhu25 commented on a change in pull request #31869: [SPARK-34777][UI] StagePage input size records not show when records greater than zero

warrenzhu25 commented on a change in pull request #31869:
URL: https://github.com/apache/spark/pull/31869#discussion_r597417040



##########
File path: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
##########
@@ -786,9 +786,13 @@ private[spark] object ApiHelper {
     stageData.accumulatorUpdates.exists { acc => acc.name != null && acc.value != null }
   }
 
-  def hasInput(stageData: StageData): Boolean = stageData.inputBytes > 0
+  def hasInput(stageData: StageData): Boolean = {
+    stageData.inputBytes > 0 || stageData.inputRecords > 0

Review comment:
       Yes, I think so. Either way, it's better to decide based on either condition, since some input sources might report only one of the two metrics. I'll try to figure out why the Kafka source reports metrics this way in a separate PR.




[GitHub] [spark] Ngone51 commented on a change in pull request #31869: [SPARK-34777][UI] StagePage input size records not show when records greater than zero

Ngone51 commented on a change in pull request #31869:
URL: https://github.com/apache/spark/pull/31869#discussion_r597512159



##########
File path: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
##########
@@ -786,9 +786,13 @@ private[spark] object ApiHelper {
     stageData.accumulatorUpdates.exists { acc => acc.name != null && acc.value != null }
   }
 
-  def hasInput(stageData: StageData): Boolean = stageData.inputBytes > 0
+  def hasInput(stageData: StageData): Boolean = {
+    stageData.inputBytes > 0 || stageData.inputRecords > 0

Review comment:
       If we're sure this is because some sources do not provide certain metrics, rather than counting them incorrectly, then I think the current fix is fine. Otherwise, I think we should fix the metric counting instead (as that would be a bug).
   It would be good to figure out the root cause first.
   
   




[GitHub] [spark] warrenzhu25 commented on a change in pull request #31869: [SPARK-34777][UI] StagePage input/output size records not show when records greater than zero

warrenzhu25 commented on a change in pull request #31869:
URL: https://github.com/apache/spark/pull/31869#discussion_r606519888



##########
File path: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
##########
@@ -786,9 +786,13 @@ private[spark] object ApiHelper {
     stageData.accumulatorUpdates.exists { acc => acc.name != null && acc.value != null }
   }
 
-  def hasInput(stageData: StageData): Boolean = stageData.inputBytes > 0
+  def hasInput(stageData: StageData): Boolean = {
+    stageData.inputBytes > 0 || stageData.inputRecords > 0

Review comment:
       Added a screenshot.



