Posted to issues@spark.apache.org by "Arsenii Venherak (Jira)" <ji...@apache.org> on 2020/03/13 15:36:00 UTC
[jira] [Commented] (SPARK-31149) PySpark job not killing Spark Daemon processes after the executor is killed due to OOM
[ https://issues.apache.org/jira/browse/SPARK-31149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058853#comment-17058853 ]
Arsenii Venherak commented on SPARK-31149:
------------------------------------------
Created PR https://github.com/apache/spark/pull/27903
> PySpark job not killing Spark Daemon processes after the executor is killed due to OOM
> --------------------------------------------------------------------------------------
>
> Key: SPARK-31149
> URL: https://issues.apache.org/jira/browse/SPARK-31149
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.4.5
> Reporter: Arsenii Venherak
> Priority: Major
> Fix For: 2.4.5
>
>
> {code:java}
> 2020-03-10 10:15:00,257 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 1.9 GB of 2 GB physical memory used; 39.5 GB of 4.2 GB virtual memory used
> 2020-03-10 10:15:05,135 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used
> 2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_e25_1583485217113_0347_01_000042 has processes older than 1 iteration running over the configured limit. Limit=2147483648, current usage = 3915513856
> 2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=327523,containerID=container_e25_1583485217113_0347_01_000042] is running beyond physical memory limits. Current usage: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used. Killing container.
> Dump of the process-tree for container_e25_1583485217113_0347_01_000042 :
> |- 327535 327523 327523 327523 (java) 1611 111 4044427264 172306 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64/jre/bin/java -server -Xmx1024m -Djava.io.tmpdir=/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/tmp -Dspark.ssl.trustStore=/opt/mapr/conf/ssl_truststore -Dspark.authenticate.enableSaslEncryption=true -Dspark.driver.port=40653 -Dspark.network.timeout=7200 -Dspark.ssl.keyStore=/opt/mapr/conf/ssl_keystore -Dspark.network.sasl.serverAlwaysEncrypt=true -Dspark.ssl.enabled=true -Dspark.ssl.protocol=TLSv1.2 -Dspark.ssl.fs.enabled=true -Dspark.ssl.ui.enabled=false -Dspark.authenticate=true -Dspark.yarn.app.container.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042 -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@bd02slse0201.wellsfargo.com:40653 --executor-id 40 --hostname bd02slsc0519.wellsfargo.com --cores 1 --app-id application_1583485217113_0347 --user-class-path file:/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/__app__.jar
> {code}
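Redoing the unit conversions helps in reading the numbers in the log above. The sketch below is illustrative only and rests on assumptions that the log does not state. It assumes that YARN's "GB" means GiB (2^30 bytes), that the RSS column of the process-tree dump counts 4096-byte pages, and that the dump columns are PID, PPID, PGRPID, SESSID, CMD_NAME, user time, system time, VMEM (bytes), RSS (pages).

```python
# Redo the unit conversions for the ContainersMonitorImpl log above.
# Assumptions (not stated in the log): "GB" means GiB (2**30 bytes), and the
# RSS column of the process-tree dump is counted in 4096-byte pages.

GIB = 2 ** 30
PAGE_SIZE = 4096  # assumed x86_64 default page size

limit_bytes = 2147483648   # "Limit=" from the first WARN line
usage_bytes = 3915513856   # "current usage =" from the same line

# The java line of the dump, truncated before the full command line.
# Assumed column order: PID PPID PGRPID SESSID CMD_NAME USER_TIME SYS_TIME
# VMEM(BYTES) RSS(PAGES)
dump_line = "|- 327535 327523 327523 327523 (java) 1611 111 4044427264 172306"
fields = dump_line.split()[1:]  # drop the leading "|-"
pid = int(fields[0])
vmem_bytes = int(fields[7])
rss_pages = int(fields[8])


def gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


print(f"limit      {gib(limit_bytes):.2f} GiB")
print(f"tree usage {gib(usage_bytes):.2f} GiB")
print(f"over limit {gib(usage_bytes - limit_bytes):.2f} GiB")
print(f"java RSS   {gib(rss_pages * PAGE_SIZE):.2f} GiB (pid {pid})")
print(f"java VMEM  {gib(vmem_bytes):.2f} GiB")
```

Under those assumptions, the process tree was about 1.65 GiB over its 2 GiB limit. The java process listed in the dump had an RSS of only about 0.66 GiB, and its heap is capped at 1 GiB by -Xmx1024m. Most of the measured usage would therefore have come from other processes in the container's tree. The dump as quoted shows only the java line, so this is a plausible reading rather than something the log proves.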
>
>
> After that, many pyspark.daemon processes are left running.
> For example:
> /apps/anaconda3-5.3.0/bin/python -m pyspark.daemon
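On a NodeManager host, you can spot the leftover processes by looking for pyspark.daemon processes whose parent is no longer an executor JVM. The following is a hypothetical diagnostic. It is not part of Spark or of the PR above. It assumes Linux /proc, and the helper names read_proc_table and find_orphaned_daemons are made up for illustration. pyspark.daemon forks its workers, so the sketch treats a process whose parent is itself a pyspark.daemon as a worker and skips it.

```python
# Hypothetical diagnostic (not part of Spark): find pyspark.daemon processes
# whose parent is no longer an executor JVM. Assumes a Linux /proc filesystem.
import os
from typing import Iterable, List, Tuple

Proc = Tuple[int, int, str]  # (pid, ppid, cmdline)

EXECUTOR_MARKER = "CoarseGrainedExecutorBackend"  # heuristic match on the JVM cmdline
DAEMON_MARKER = "pyspark.daemon"


def read_proc_table() -> List[Proc]:
    """Snapshot (pid, ppid, cmdline) for every process visible in /proc."""
    table: List[Proc] = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # Field 4 of /proc/PID/stat is the ppid; split after the ")" that
        # closes the command name, since the name itself may contain spaces.
        ppid = int(stat.rsplit(")", 1)[1].split()[1])
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        table.append((int(entry), ppid, cmdline))
    return table


def find_orphaned_daemons(procs: Iterable[Proc]) -> List[int]:
    """Return PIDs of top-level pyspark.daemon processes with no executor parent."""
    snapshot = list(procs)
    cmd_by_pid = {pid: cmd for pid, _, cmd in snapshot}
    orphans = []
    for pid, ppid, cmd in snapshot:
        if DAEMON_MARKER not in cmd:
            continue
        parent_cmd = cmd_by_pid.get(ppid, "")
        if DAEMON_MARKER in parent_cmd:
            continue  # a worker forked by a daemon; judge the daemon instead
        if EXECUTOR_MARKER not in parent_cmd:
            orphans.append(pid)
    return sorted(orphans)
```

On a node where the bug has occurred, find_orphaned_daemons(read_proc_table()) would return the PIDs of the top-level daemons left behind. Matching on the CoarseGrainedExecutorBackend class name is a heuristic and may need adjusting for other deployments.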
--
This message was sent by Atlassian Jira
(v8.3.4#803005)