Posted to issues@spark.apache.org by "Arsenii Venherak (Jira)" <ji...@apache.org> on 2020/03/13 15:28:00 UTC

[jira] [Created] (SPARK-31149) PySpark job not killing Spark Daemon processes after the executor is killed due to OOM

Arsenii Venherak created SPARK-31149:
----------------------------------------

             Summary: PySpark job not killing Spark Daemon processes after the executor is killed due to OOM
                 Key: SPARK-31149
                 URL: https://issues.apache.org/jira/browse/SPARK-31149
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.4.5
            Reporter: Arsenii Venherak
             Fix For: 2.4.5


2020-03-10 10:15:00,257 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 1.9 GB of 2 GB physical memory used; 39.5 GB of 4.2 GB virtual memory used
2020-03-10 10:15:05,135 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used
2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_e25_1583485217113_0347_01_000042 has processes older than 1 iteration running over the configured limit. Limit=2147483648, current usage = 3915513856
2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=327523,containerID=container_e25_1583485217113_0347_01_000042] is running beyond physical memory limits. Current usage: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_e25_1583485217113_0347_01_000042 :
|- 327535 327523 327523 327523 (java) 1611 111 4044427264 172306 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64/jre/bin/java -server -Xmx1024m -Djava.io.tmpdir=/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/tmp -Dspark.ssl.trustStore=/opt/mapr/conf/ssl_truststore -Dspark.authenticate.enableSaslEncryption=true -Dspark.driver.port=40653 -Dspark.network.timeout=7200 -Dspark.ssl.keyStore=/opt/mapr/conf/ssl_keystore -Dspark.network.sasl.serverAlwaysEncrypt=true -Dspark.ssl.enabled=true -Dspark.ssl.protocol=TLSv1.2 -Dspark.ssl.fs.enabled=true -Dspark.ssl.ui.enabled=false -Dspark.authenticate=true -Dspark.yarn.app.container.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042 -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@bd02slse0201.wellsfargo.com:40653 --executor-id 40 --hostname bd02slsc0519.wellsfargo.com --cores 1 --app-id application_1583485217113_0347 --user-class-path file:/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/__app__.jar
 



After YARN kills the container, many pyspark.daemon processes are left behind on the node, e.g.:
/apps/anaconda3-5.3.0/bin/python -m pyspark.daemon
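One way to confirm the leak described above is to look for pyspark.daemon workers whose parent executor JVM is gone (they get reparented to PID 1). This is a hypothetical diagnostic sketch, not part of the report; the `pyspark.daemon` command-line pattern is taken from the example above.

```shell
# List orphaned pyspark.daemon workers: PPID 1 means the parent
# CoarseGrainedExecutorBackend JVM has already been killed.
ps -eo pid,ppid,args | awk '$2 == 1 && /pyspark\.daemon/ {print $1}'
```

Any PIDs printed by this pipeline are candidates for manual cleanup (e.g. with `kill`), since nothing will reap them once the container is gone.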



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org