Posted to user@spark.apache.org by Mike Sukmanowsky <mi...@gmail.com> on 2016/05/27 14:11:43 UTC

Is Python memory included in YARN-monitored memory?

Hi everyone,

This is more of a YARN/OS question than a Spark one, but it would be good to
clarify it in the docs somewhere once I get an answer.

We use PySpark for all our Spark applications running on EMR. Like many
users, we're accustomed to seeing the occasional ExecutorLostFailure after
YARN kills a container for using more memory than it was allocated.
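
For context, my understanding (mostly from the Spark docs, so treat the 10% /
384 MB defaults below as an assumption on my part) is that the container YARN
allocates is the executor heap plus the memory overhead, roughly:

    # Rough sketch of the container sizing as I understand it from the docs.
    # The 0.10 / 384 MB defaults are an assumption, not something I've
    # verified against the YARN logs.
    executor_memory_mb = 4096                               # spark.executor.memory
    overhead_mb = max(384, int(0.10 * executor_memory_mb))  # memoryOverhead default
    container_mb = executor_memory_mb + overhead_mb
    print(container_mb)  # 4505 MB requested from YARN for the container

If the Python workers are counted against that limit, the overhead has to cover
them as well as the JVM's off-heap allocations, which would change how we size it.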

We're beginning to tune spark.yarn.executor.memoryOverhead, but before messing
around with it I wanted to check: does YARN monitor the memory usage of both
the executor JVM and the spawned pyspark.daemon processes, or just the JVM?
Inspecting one of the YARN nodes suggests the Python side is not counted, since
the spawned daemon gets its own process ID and process group, but I wanted to
confirm because the answer makes a big difference for PySpark users trying to
tune memory (rough sketches of what we're doing are below).
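
For reference, here's roughly how we'd bump the overhead from a PySpark app
(the 1024 MB value is purely illustrative):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("memory-overhead-test")
            .set("spark.executor.memory", "4g")
            .set("spark.yarn.executor.memoryOverhead", "1024"))  # MB, illustrative
    sc = SparkContext(conf=conf)

And this is the kind of inspection I did on a node (assuming Linux ps, so the
exact flags may differ elsewhere): the executor JVM (CoarseGrainedExecutorBackend)
and the pyspark.daemon workers show up with different PIDs and process groups.

    import subprocess

    # List PID, process group, parent PID, resident set size and command line
    # for the executor JVM and any pyspark.daemon processes on this node.
    ps = subprocess.check_output(["ps", "-eo", "pid,pgid,ppid,rss,args"])
    for line in ps.decode("utf-8").splitlines():
        if "pyspark.daemon" in line or "CoarseGrainedExecutorBackend" in line:
            print(line)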

Thanks,
Mike