Posted to user@spark.apache.org by zjzzjz <ji...@gmail.com> on 2019/03/10 01:14:29 UTC

How to know if a machine in a Spark cluster participates in a job

I wanted to know when it is safe to remove a machine from a cluster.

My assumption is that it is safe to remove a machine if it does not run any
containers and does not store any data that is still needed.

Using the ResourceManager REST API documented at
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html,
we can call

 GET http://<rm http address:port>/ws/v1/cluster/nodes

to get information about each node, e.g.:

<node>
    <rack>/default-rack</rack>
    <state>RUNNING</state>
    <id>host1.domain.com:54158</id>
    <nodeHostName>host1.domain.com</nodeHostName>
    <nodeHTTPAddress>host1.domain.com:8042</nodeHTTPAddress>
    <lastHealthUpdate>1476995346399</lastHealthUpdate>
    <version>3.0.0-SNAPSHOT</version>
    <healthReport></healthReport>
    <numContainers>0</numContainers>
    <usedMemoryMB>0</usedMemoryMB>
    <availMemoryMB>8192</availMemoryMB>
    <usedVirtualCores>0</usedVirtualCores>
    <availableVirtualCores>8</availableVirtualCores>
    <resourceUtilization>
        <nodePhysicalMemoryMB>1027</nodePhysicalMemoryMB>
        <nodeVirtualMemoryMB>1027</nodeVirtualMemoryMB>
        <nodeCPUUsage>0.006664445623755455</nodeCPUUsage>
        <aggregatedContainersPhysicalMemoryMB>0</aggregatedContainersPhysicalMemoryMB>
        <aggregatedContainersVirtualMemoryMB>0</aggregatedContainersVirtualMemoryMB>
        <containersCPUUsage>0.0</containersCPUUsage>
    </resourceUtilization>
  </node>
If numContainers is 0, I assume the node is not running any containers.
However, can it still store data on disk (e.g. shuffle files) that downstream
tasks will read?
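The numContainers check above can be sketched in a few lines. This is a
minimal illustration, assuming the ResourceManager returns the XML shown
earlier; the SAMPLE_NODE document below is a trimmed, made-up example, not
a real RM response.

```python
# Sketch: decide from a YARN node's XML whether it reports zero running
# containers. SAMPLE_NODE is a hypothetical, trimmed RM response fragment.
import xml.etree.ElementTree as ET

SAMPLE_NODE = """
<node>
    <id>host1.domain.com:54158</id>
    <state>RUNNING</state>
    <numContainers>0</numContainers>
</node>
"""

def node_is_container_free(node_xml: str) -> bool:
    """True if the node element reports <numContainers>0</numContainers>."""
    node = ET.fromstring(node_xml)
    return int(node.findtext("numContainers", "0")) == 0

print(node_is_container_free(SAMPLE_NODE))  # True for the sample above
```

In a real script you would fetch the XML from
http://<rm http address:port>/ws/v1/cluster/nodes first and apply the check
to each <node> element; as the question notes, numContainers == 0 alone says
nothing about data still sitting on the node's disks.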

I could not find whether Spark exposes this information. I assume that if a
machine still stores data useful for a running job, it maintains a heartbeat
with the Spark driver or some other central controller. Could we check this
by scanning TCP or UDP connections?
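Rather than scanning connections, one option is the Spark monitoring REST
API: the driver serves executor summaries under
/api/v1/applications/<app-id>/allexecutors. The sketch below only shows the
shape of the check; the SAMPLE_EXECUTORS data and the hostnames are invented
for illustration, and the exact fields available depend on your Spark
version.

```python
# Sketch: check whether a given host runs an active executor for an
# application, using JSON shaped like the driver's
# /api/v1/applications/<app-id>/allexecutors response.
# SAMPLE_EXECUTORS is hypothetical sample data, not real output.
import json

SAMPLE_EXECUTORS = json.loads("""
[
  {"id": "driver", "hostPort": "host0.domain.com:40001", "isActive": true},
  {"id": "1",      "hostPort": "host1.domain.com:40123", "isActive": true},
  {"id": "2",      "hostPort": "host2.domain.com:40456", "isActive": false}
]
""")

def host_has_active_executor(executors, hostname: str) -> bool:
    """True if any active executor in the list runs on `hostname`."""
    return any(
        e.get("isActive", False) and e["hostPort"].split(":")[0] == hostname
        for e in executors
    )

print(host_has_active_executor(SAMPLE_EXECUTORS, "host1.domain.com"))  # True
print(host_has_active_executor(SAMPLE_EXECUTORS, "host2.domain.com"))  # False
```

A host with no active executor for any running application is a candidate
for removal, though cached shuffle or block data may still make decommission
unsafe without draining first.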

Is there any other way to check whether a machine in a Spark cluster
participates in a job?




---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org