Posted to issues@spark.apache.org by "Arsenii Venherak (Jira)" <ji...@apache.org> on 2020/03/13 15:36:00 UTC

[jira] [Commented] (SPARK-31149) PySpark job not killing Spark Daemon processes after the executor is killed due to OOM

    [ https://issues.apache.org/jira/browse/SPARK-31149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058853#comment-17058853 ] 

Arsenii Venherak commented on SPARK-31149:
------------------------------------------

Created PR https://github.com/apache/spark/pull/27903

> PySpark job not killing Spark Daemon processes after the executor is killed due to OOM
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-31149
>                 URL: https://issues.apache.org/jira/browse/SPARK-31149
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.5
>            Reporter: Arsenii Venherak
>            Priority: Major
>             Fix For: 2.4.5
>
>
> {code:java}
> 2020-03-10 10:15:00,257 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 1.9 GB of 2 GB physical memory used; 39.5 GB of 4.2 GB virtual memory used
> 2020-03-10 10:15:05,135 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used
> 2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_e25_1583485217113_0347_01_000042 has processes older than 1 iteration running over the configured limit. Limit=2147483648, current usage = 3915513856
> 2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=327523,containerID=container_e25_1583485217113_0347_01_000042] is running beyond physical memory limits. Current usage: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used. Killing container.
> Dump of the process-tree for container_e25_1583485217113_0347_01_000042 :
>         |- 327535 327523 327523 327523 (java) 1611 111 4044427264 172306 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64/jre/bin/java -server -Xmx1024m -Djava.io.tmpdir=/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/tmp -Dspark.ssl.trustStore=/opt/mapr/conf/ssl_truststore -Dspark.authenticate.enableSaslEncryption=true -Dspark.driver.port=40653 -Dspark.network.timeout=7200 -Dspark.ssl.keyStore=/opt/mapr/conf/ssl_keystore -Dspark.network.sasl.serverAlwaysEncrypt=true -Dspark.ssl.enabled=true -Dspark.ssl.protocol=TLSv1.2 -Dspark.ssl.fs.enabled=true -Dspark.ssl.ui.enabled=false -Dspark.authenticate=true -Dspark.yarn.app.container.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042 -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@bd02slse0201.wellsfargo.com:40653 --executor-id 40 --hostname bd02slsc0519.wellsfargo.com --cores 1 --app-id application_1583485217113_0347 --user-class-path file:/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/__app__.jar
> {code}
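>  
> For context, the container is killed because YARN's ContainersMonitorImpl accounts for the whole process tree (the executor JVM plus its pyspark.daemon/worker children), and that tree exceeded the 2 GB physical limit. A minimal mitigation sketch for the OOM itself (not for the daemon leak this issue is about) is to reserve extra off-heap headroom for the Python workers; the sizes below are illustrative assumptions, not tested settings:
> {code:python}
> # Sketch: leave off-heap headroom for the pyspark.daemon/worker processes so
> # the container's process tree stays under YARN's physical memory limit.
> # The sizes here are placeholders; tune them to the actual workload.
> from pyspark import SparkConf, SparkContext
> 
> conf = (SparkConf()
>         .set("spark.executor.memory", "2g")           # JVM heap (-Xmx)
>         .set("spark.executor.memoryOverhead", "2g"))  # off-heap room for Python workers
> sc = SparkContext(conf=conf)
> {code}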
>  
>  
> After that, many pyspark.daemon processes are left behind, e.g.:
>  /apps/anaconda3-5.3.0/bin/python -m pyspark.daemon
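>  
> One way to confirm the leak (a hypothetical diagnostic sketch, not part of Spark; it assumes the orphaned daemons are re-parented to init, PID 1, once the executor JVM is killed):
> {code:python}
> # List pyspark.daemon processes whose parent has exited. After YARN kills the
> # executor container, leftover daemons are typically re-parented to PID 1.
> import subprocess
> 
> def find_orphaned_pyspark_daemons():
>     ps = subprocess.run(["ps", "-eo", "pid,ppid,args"],
>                         capture_output=True, text=True, check=True).stdout
>     orphans = []
>     for line in ps.splitlines()[1:]:      # skip the header row
>         pid, ppid, args = line.split(None, 2)
>         if "pyspark.daemon" in args and ppid == "1":
>             orphans.append((int(pid), args))
>     return orphans
> 
> if __name__ == "__main__":
>     for pid, cmd in find_orphaned_pyspark_daemons():
>         print("orphaned pyspark.daemon pid=%d: %s" % (pid, cmd))
> {code}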



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org