Posted to issues@spark.apache.org by "Baohe Zhang (Jira)" <ji...@apache.org> on 2020/12/24 20:16:00 UTC

[jira] [Created] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

Baohe Zhang created SPARK-33906:
-----------------------------------

             Summary: SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset
                 Key: SPARK-33906
                 URL: https://issues.apache.org/jira/browse/SPARK-33906
             Project: Spark
          Issue Type: Bug
          Components: Web UI
    Affects Versions: 3.2.0
            Reporter: Baohe Zhang


How to reproduce it?

In macOS standalone mode, open a spark-shell:
{noformat}
$SPARK_HOME/bin/spark-shell --master spark://localhost:7077
{noformat}
and run:
{code:scala}
val x = sc.makeRDD(1 to 100000, 5)
x.count()
{code}
Then open the app UI in the browser and click the Executors page; it gets stuck loading:

!image-2020-12-24-14-12-22-983.png!

Also, the JSON returned by the REST API endpoint http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors is missing "peakMemoryMetrics" for the executor (it is present for the driver):
{noformat}
[ {
  "id" : "driver",
  "hostPort" : "192.168.1.241:50042",
  "isActive" : true,
  "rddBlocks" : 0,
  "memoryUsed" : 0,
  "diskUsed" : 0,
  "totalCores" : 0,
  "maxTasks" : 0,
  "activeTasks" : 0,
  "failedTasks" : 0,
  "completedTasks" : 0,
  "totalTasks" : 0,
  "totalDuration" : 0,
  "totalGCTime" : 0,
  "totalInputBytes" : 0,
  "totalShuffleRead" : 0,
  "totalShuffleWrite" : 0,
  "isBlacklisted" : false,
  "maxMemory" : 455501414,
  "addTime" : "2020-12-24T19:44:18.033GMT",
  "executorLogs" : { },
  "memoryMetrics" : {
    "usedOnHeapStorageMemory" : 0,
    "usedOffHeapStorageMemory" : 0,
    "totalOnHeapStorageMemory" : 455501414,
    "totalOffHeapStorageMemory" : 0
  },
  "blacklistedInStages" : [ ],
  "peakMemoryMetrics" : {
    "JVMHeapMemory" : 135021152,
    "JVMOffHeapMemory" : 149558576,
    "OnHeapExecutionMemory" : 0,
    "OffHeapExecutionMemory" : 0,
    "OnHeapStorageMemory" : 3301,
    "OffHeapStorageMemory" : 0,
    "OnHeapUnifiedMemory" : 3301,
    "OffHeapUnifiedMemory" : 0,
    "DirectPoolMemory" : 67963178,
    "MappedPoolMemory" : 0,
    "ProcessTreeJVMVMemory" : 0,
    "ProcessTreeJVMRSSMemory" : 0,
    "ProcessTreePythonVMemory" : 0,
    "ProcessTreePythonRSSMemory" : 0,
    "ProcessTreeOtherVMemory" : 0,
    "ProcessTreeOtherRSSMemory" : 0,
    "MinorGCCount" : 15,
    "MinorGCTime" : 101,
    "MajorGCCount" : 0,
    "MajorGCTime" : 0
  },
  "attributes" : { },
  "resources" : { },
  "resourceProfileId" : 0,
  "isExcluded" : false,
  "excludedInStages" : [ ]
}, {
  "id" : "0",
  "hostPort" : "192.168.1.241:50054",
  "isActive" : true,
  "rddBlocks" : 0,
  "memoryUsed" : 0,
  "diskUsed" : 0,
  "totalCores" : 12,
  "maxTasks" : 12,
  "activeTasks" : 0,
  "failedTasks" : 0,
  "completedTasks" : 5,
  "totalTasks" : 5,
  "totalDuration" : 2107,
  "totalGCTime" : 25,
  "totalInputBytes" : 0,
  "totalShuffleRead" : 0,
  "totalShuffleWrite" : 0,
  "isBlacklisted" : false,
  "maxMemory" : 455501414,
  "addTime" : "2020-12-24T19:44:20.335GMT",
  "executorLogs" : {
    "stdout" : "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003&executorId=0&logType=stdout",
    "stderr" : "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003&executorId=0&logType=stderr"
  },
  "memoryMetrics" : {
    "usedOnHeapStorageMemory" : 0,
    "usedOffHeapStorageMemory" : 0,
    "totalOnHeapStorageMemory" : 455501414,
    "totalOffHeapStorageMemory" : 0
  },
  "blacklistedInStages" : [ ],
  "attributes" : { },
  "resources" : { },
  "resourceProfileId" : 0,
  "isExcluded" : false,
  "excludedInStages" : [ ]
} ]
{noformat}

I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates returns an empty map, which causes peakExecutorMetrics to be set to None in https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345. A possible reason for the empty map is that the stage completes in less time than the heartbeat interval, so the stage entry in stageTCMP has already been removed before reportHeartbeat is called.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
