Posted to dev@spark.apache.org by Imran Rashid <ir...@cloudera.com.INVALID> on 2018/07/26 19:31:14 UTC

offheap memory usage & netty configuration

I’ve been looking at where untracked memory is getting used in spark,
especially offheap memory, and I’ve discovered some things I’d like to
share with the community.  Most of what I’ve learned has been about the
way spark is using netty -- I’ll go into some more detail about that
below.  I’m also planning on opening up a number of jiras.  I don’t think
it really makes sense to put them in an Epic or anything, since they’re
mostly independent, just somewhat related, so I’m going to put a
“memory-analysis” label on some of the new (and existing) jiras.  I
thought it would be useful to share a slightly broader view with the
community here.

Aside from memory use by netty, two other high level points (rough code
sketches for both follow this list):

1. It would be really nice if spark had an “executor-plugin” api
   [https://issues.apache.org/jira/browse/SPARK-24918].  It’s nice to
   have instrumentation which can exist entirely outside of the spark
   codebase [https://github.com/squito/spark-memory], but with dynamic
   allocation you can’t easily get something to execute every time an
   executor starts with the current apis.  (For users with memory issues
   -- you might be interested in trying out my tool with a patched
   version of spark.)

2. Metaspace uses about 200 MB on the driver, and it’s all offheap in
   java 8.  It’s not a ton, but it’s something I really hadn’t considered
   before when looking at offheap memory problems, and it’s big enough
   that it can matter.
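For point 1, the kind of hook I have in mind would look roughly like the
sketch below.  To be clear, this is a hypothetical illustration of
SPARK-24918, not an api that exists in spark today; the interface name,
methods, and the “spark.executor.plugins” config key are made up for the
example.

// Hypothetical sketch for SPARK-24918: a hook the executor would call at
// startup (and shutdown), even when executors come and go with dynamic
// allocation.
public interface ExecutorPlugin {

  /** Called once when the executor starts, before it runs any tasks. */
  default void init() {}

  /** Called once when the executor is shutting down. */
  default void shutdown() {}
}

// A user-supplied implementation (e.g. a memory-polling tool) would be
// named via a config such as "spark.executor.plugins" (illustrative name)
// and instantiated reflectively by the executor.
class MemoryMonitorPlugin implements ExecutorPlugin {
  private Thread poller;

  @Override
  public void init() {
    poller = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        // poll offheap / netty / metaspace metrics here, then log or export
        try {
          Thread.sleep(10_000);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }, "memory-monitor");
    poller.setDaemon(true);
    poller.start();
  }

  @Override
  public void shutdown() {
    if (poller != null) {
      poller.interrupt();
    }
  }
}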
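For point 2, you can check the metaspace number yourself with the standard
JMX memory pool beans (plain jvm apis, nothing spark-specific):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class MetaspaceCheck {
  public static void main(String[] args) {
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      // On java 8 the "Metaspace" pool lives entirely outside the heap.
      if (pool.getName().contains("Metaspace")) {
        System.out.println(pool.getName() + " used = "
            + pool.getUsage().getUsed() / (1024 * 1024) + " MB");
      }
    }
  }
}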
More details about Netty:

Netty maintains its own pools of memory
[https://netty.io/wiki/reference-counted-objects.html] for performance.
By default, spark asks netty to use offheap memory for this (configurable
with “spark.shuffle.io.preferDirectBufs”).  Furthermore, netty ties the
configuration of the pool to the number of IO threads by default, to
minimize thread contention.
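To make the thread/pool coupling concrete, here is a rough sketch of how a
pooled allocator gets sized from a preferDirect flag and a number of io
threads.  This is illustrative netty code, not spark’s actual setup code;
the arena count and chunk geometry shown are netty 4.1 defaults.

import io.netty.buffer.PooledByteBufAllocator;

public class AllocatorSketch {
  public static void main(String[] args) {
    boolean preferDirect = true;  // roughly what spark.shuffle.io.preferDirectBufs controls
    int numIoThreads = 8;         // spark caps each service at 8 io threads

    // One arena per io thread avoids contention, but each arena grows in
    // its own 16 MB chunks (pageSize 8192, maxOrder 11 with netty 4.1 defaults).
    PooledByteBufAllocator alloc = new PooledByteBufAllocator(
        preferDirect,
        preferDirect ? 0 : numIoThreads,   // heap arenas
        preferDirect ? numIoThreads : 0,   // direct arenas
        8192,                              // pageSize
        11);                               // maxOrder -> 8192 << 11 = 16 MB chunks

    System.out.println("direct arenas: " + (preferDirect ? numIoThreads : 0));
  }
}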
While this is meant to save the time of allocating memory, in practice,
with Spark’s configuration, this seems to result in a lot of wasted
memory.  First, spark creates multiple independent pools.  Depending on
configuration & whether it’s the executor or driver, there are different
pools for:

1. RPC Client
2. RPC Server
3. BlockTransfer Client
4. BlockTransfer Server
5. ExternalShuffle Client

Memory use can spike very quickly for an incoming burst of messages.  As
the pools grow in 16 MB chunks, the growth isn’t necessarily related to
the amount of data received.  A burst of tiny messages, processed by 8 io
threads, will lead to a minimum of 128 MB in the respective pool (though
the pool is almost entirely empty)
[https://issues.apache.org/jira/browse/SPARK-24920].

Spark limits each of these to have no more than 8 io threads, but given 4
different “services” are active, the total number of io threads is often
32 threads.
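The arithmetic behind those numbers, using netty 4.1’s default chunk
geometry (illustrative figures, not measurements from a particular
workload):

public class PoolFootprint {
  public static void main(String[] args) {
    // netty 4.1 defaults: 8 KB pages, maxOrder 11 -> 16 MB chunks
    int chunkSize = 8192 << 11;                       // 16 MB
    int arenasPerPool = 8;                            // one per io thread
    int minBytesPerPool = arenasPerPool * chunkSize;  // 128 MB once every arena holds a chunk

    int services = 4;                                 // e.g. RPC + BlockTransfer, client + server
    int totalIoThreads = services * 8;                // often 32 io threads in total

    System.out.printf("chunk=%d MB, per-pool floor=%d MB, io threads=%d%n",
        chunkSize >> 20, minBytesPerPool >> 20, totalIoThreads);
  }
}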
Even when Spark configures netty to use offheap memory, we still see the
pools using a significant portion of memory onheap -- it may just be from
the message encoder choosing onheap buffers
[https://issues.apache.org/jira/browse/SPARK-24938].
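As a generic illustration of how that can happen (this is plain netty, not
the spark encoder itself): even with a direct-preferring pooled allocator,
any code path that asks for a heap buffer explicitly still lands onheap.

import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public class HeapVsDirect {
  public static void main(String[] args) {
    PooledByteBufAllocator alloc = new PooledByteBufAllocator(true /* preferDirect */);

    ByteBuf preferred = alloc.buffer(1024);  // direct, when the platform allows it
    ByteBuf heap = alloc.heapBuffer(1024);   // explicitly onheap, regardless of preferDirect

    System.out.println("buffer():     direct=" + preferred.isDirect());
    System.out.println("heapBuffer(): direct=" + heap.isDirect());

    preferred.release();
    heap.release();
  }
}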
Netty exposes the memory use of its pools, which you can poll
[https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/PoolArenaMetric.java].
However, polling is imprecise -- if you see a largely unused pool, is that
because it was nearly full and then freed right away?  Or did it really
grow much more than it ever needed to?  Similarly, there are many more
aspects of netty state which are hard to monitor (a minimal polling
sketch follows this list), like:

- The “quantile” of each chunk in the chunk lists
  [https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/PoolArena.java#L55-L60]
- The source of pool usage -- inbound data that is read off the socket?
  Or messages that spark is encoding?
- Getting notifications when high water marks are passed
- How many messages are sitting in ChannelOutboundBuffer waiting to be
  processed [related: https://issues.apache.org/jira/browse/SPARK-24801]
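Here is roughly what polling looks like against netty 4.1’s metric api
(this polls a standalone allocator; wiring it up to the allocators spark
actually creates is where a patch, or the plugin api above, comes in):

import io.netty.buffer.PoolArenaMetric;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocatorMetric;

public class PollPoolMetrics {
  public static void main(String[] args) {
    PooledByteBufAllocatorMetric metric = PooledByteBufAllocator.DEFAULT.metric();

    System.out.println("used direct memory: " + metric.usedDirectMemory());
    System.out.println("used heap memory:   " + metric.usedHeapMemory());

    // Per-arena detail: active bytes, allocation counts, and the chunk
    // lists (the usage "quantile" buckets mentioned above).
    for (PoolArenaMetric arena : metric.directArenas()) {
      System.out.println("arena activeBytes=" + arena.numActiveBytes()
          + " allocations=" + arena.numAllocations()
          + " chunkLists=" + arena.chunkLists().size());
    }
  }
}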
Note that spark can hold onto the buffers from netty beyond just receiving
and decoding data, the steps we think of as associated w/ the network
layer.  For example, when fetching shuffle blocks, after the bytes have
been fetched, we create a ByteBufInputStream from the bytes and feed them
into the shuffle logic; the buffers are only returned to the pool when
they are fully read.
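A simplified picture of that pattern (generic netty code, not the actual
shuffle path): the pooled buffer stays checked out of the pool until the
consuming stream has read it all and been closed.

import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufInputStream;
import io.netty.buffer.PooledByteBufAllocator;

import java.io.IOException;
import java.io.InputStream;

public class HoldBufferUntilRead {
  public static void main(String[] args) throws IOException {
    ByteBuf fetched = PooledByteBufAllocator.DEFAULT.directBuffer(4096);
    fetched.writeBytes(new byte[4096]);  // stand-in for fetched shuffle bytes

    // releaseOnClose=true: the pooled memory stays attributed to this fetch
    // until the downstream consumer finishes reading and closes the stream.
    try (InputStream in = new ByteBufInputStream(fetched, true)) {
      byte[] scratch = new byte[1024];
      while (in.read(scratch) != -1) {
        // ... downstream logic consumes the data here ...
      }
    }  // buffer is released back to the pool only at this point
  }
}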
There are some more minor details; I won’t go into those here as this is
already long enough.  I’ll just open up specific jiras with the
“memory-analysis” label.

I’m filing a bunch of issues, and I think they’re mostly up for grabs
unless they have specific comments.  It is worth noting, though, that for
a lot of them the code changes are very small -- the real work is trying
the changes out with actual workloads and reporting on the results, so
they might be hard for folks without the ability to run experiments on a
cluster.
thanks,
Imran