Posted to issues@spark.apache.org by "Baohe Zhang (Jira)" <ji...@apache.org> on 2021/03/23 21:44:00 UTC

[jira] [Created] (SPARK-34845) ProcfsMetricsGetter.computeAllMetrics shouldn't return partial metrics when some of child pids metrics are missing

Baohe Zhang created SPARK-34845:
-----------------------------------

             Summary: ProcfsMetricsGetter.computeAllMetrics shouldn't return partial metrics when some of child pids metrics are missing
                 Key: SPARK-34845
                 URL: https://issues.apache.org/jira/browse/SPARK-34845
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.1.1, 3.1.0, 3.0.2, 3.0.1, 3.0.0
            Reporter: Baohe Zhang


When the procfs metrics of some child pids are unavailable, ProcfsMetricsGetter.computeAllMetrics() returns partial metrics (the sum over a subset of the child pids) instead of an all-zero result. This can be misleading, and it contradicts the intent stated in the current code comments in [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/ProcfsMetricsGetter.scala#L214].

 

Also, a side effect of it is that it can lead to verbose warning logs if many pids' stat files are missing.
{noformat}
e.g.
2021-03-21 16:58:25,422 [pool-26-thread-8] WARN  org.apache.spark.executor.ProcfsMetricsGetter  - There was a problem with reading the stat file of the process.
java.io.FileNotFoundException: /proc/742/stat (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at org.apache.spark.executor.ProcfsMetricsGetter.openReader$1(ProcfsMetricsGetter.scala:203)
    at org.apache.spark.executor.ProcfsMetricsGetter.$anonfun$addProcfsMetricsFromOneProcess$1(ProcfsMetricsGetter.scala:205)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2540)
    at org.apache.spark.executor.ProcfsMetricsGetter.addProcfsMetricsFromOneProcess(ProcfsMetricsGetter.scala:205)
    at org.apache.spark.executor.ProcfsMetricsGetter.$anonfun$computeAllMetrics$1(ProcfsMetricsGetter.scala:297)
{noformat}
The issue can be fixed by setting the isAvailable flag to false when the procfs metrics of any child pid are unavailable. The other methods (computePid, computePageSize, and getChildPids) already behave this way.
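
The proposed behavior can be illustrated with a minimal, self-contained sketch. It is not Spark's actual code: Metrics, ProcfsSketch, and the readStat hook are hypothetical names introduced here, and only the isAvailable flag and the method name computeAllMetrics are taken from this report. The sketch assumes a missing stat file surfaces as an IOException, as the FileNotFoundException in the log above suggests.

```scala
import java.io.IOException

// Hypothetical, simplified stand-in for the per-process metrics that get summed.
final case class Metrics(vmem: Long, rss: Long) {
  def +(other: Metrics): Metrics = Metrics(vmem + other.vmem, rss + other.rss)
}

object Metrics {
  val Zero: Metrics = Metrics(0L, 0L)
}

// `readStat` is an assumed hook standing in for parsing /proc/<pid>/stat.
final class ProcfsSketch(readStat: Int => Metrics) {
  // Plays the role of the isAvailable flag described in this report.
  var isAvailable: Boolean = true

  def computeAllMetrics(pids: Seq[Int]): Metrics = {
    val it = pids.iterator
    var total = Metrics.Zero
    // Stop at the first unreadable pid instead of summing the remainder.
    while (isAvailable && it.hasNext) {
      try {
        total = total + readStat(it.next())
      } catch {
        case _: IOException =>
          // One missing stat file invalidates the whole result.
          isAvailable = false
      }
    }
    // Report either the complete sum or all zeros, never a partial sum.
    if (isAvailable) total else Metrics.Zero
  }
}
```

With a reader that knows only pids 1 and 2, computeAllMetrics(Seq(1, 2)) returns the full sum, while computeAllMetrics(Seq(1, 742, 2)) returns Metrics.Zero and leaves isAvailable set to false. Because the loop stops at the first failure, at most one read error is hit per call in this sketch, which would also limit the repeated warnings described above.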



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
