Posted to dev@spark.apache.org by Reynold Xin <rx...@databricks.com> on 2017/09/20 17:21:20 UTC

Re: Total memory tracking: request for comments

Thanks. This is an important direction to explore, and my apologies for the
late reply.

One thing that is really hard about this is that, with different layers of
abstraction, we often use other libraries that might allocate large amounts
of memory (e.g. the Snappy library, or Parquet itself), which makes it very
difficult to track. That's where I see most of the OOMs and crashes happen.
How do you propose solving those?



On Tue, Jun 20, 2017 at 4:15 PM, Jose Soltren <jo...@cloudera.com> wrote:

> https://issues.apache.org/jira/browse/SPARK-21157
>
> Hi - oftentimes, Spark applications are killed by YARN, Mesos, or the OS
> for overrunning available memory. In SPARK-21157, I propose a design for
> capturing and reporting "total memory" usage for Spark executors - that is,
> memory usage as visible from the OS, including on-heap and off-heap memory
> used by Spark and by third-party libraries. This builds on many ideas from
> SPARK-9103.
>
> I'd really welcome review of and feedback on this design proposal.
> I think it could be a helpful feature for Spark users who are trying to
> triage memory usage issues. In the future I'd like to think about reporting
> memory usage from third-party libraries such as Netty, as was originally
> proposed in SPARK-9103.
>
> Cheers,
> --José
>

Re: Total memory tracking: request for comments

Posted by Vadim Semenov <va...@datadoghq.com>.
I read the design doc in https://issues.apache.org/jira/browse/SPARK-21157,
and it turns out I had already described essentially what you proposed (see
my earlier message in this thread).


> One thing that is really hard about this is that, with different layers of
> abstraction, we often use other libraries that might allocate large amounts
> of memory (e.g. the Snappy library, or Parquet itself), which makes it very
> difficult to track. That's where I see most of the OOMs and crashes happen.
> How do you propose solving those?

Since those libraries execute within the JVM process, we don't need to
account for them separately: we're already looking at the memory stats of
the whole process.
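
To make the point concrete, here is a hypothetical snippet (not from the
actual proposal) showing that native and direct allocations made inside the
JVM process do show up in the process-level RSS, even though the JVM heap
counters never see them. It assumes Linux, where /proc/self/status reports
VmRSS in kB:

    import java.nio.ByteBuffer
    import scala.io.Source

    object RssDemo {
      // Read VmRSS (in kB) for the current process; Linux-only.
      private def rssKb(): Long =
        Source.fromFile("/proc/self/status")
          .getLines()
          .collectFirst { case l if l.startsWith("VmRSS:") =>
            l.split("\\s+")(1).toLong
          }
          .get

      def main(args: Array[String]): Unit = {
        val before = rssKb()
        // 64 MB allocated outside the JVM heap, as a native library would.
        val buf = ByteBuffer.allocateDirect(64 << 20)
        // Touch every page so the memory actually becomes resident.
        (0 until buf.capacity by 4096).foreach(i => buf.put(i, 1.toByte))
        println(s"RSS grew by ~${(rssKb() - before) / 1024} MB")
      }
    }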


Re: Total memory tracking: request for comments

Posted by Vadim Semenov <va...@datadoghq.com>.
I'll just share what we did at Datadog for tracking the total memory of each
executor; maybe someone will find it useful.

We get the PID of the Java process that runs an executor, read the Resident
Set Size (RSS) from the system's `/proc/<pid>/stat`, and then send that value
along with the YARN container ID.
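
As a rough sketch (not the profiler's actual code, and assuming proc(5)'s
stat format, where field 24 is rss in pages, plus a 4 KiB page size), the
parsing looks something like:

    import scala.io.Source

    object RssReader {
      // Assumed page size; check `getconf PAGESIZE` on your system.
      val PageSizeBytes = 4096L

      def rssBytes(pid: Long): Long = {
        val stat = Source.fromFile(s"/proc/$pid/stat").mkString
        // The comm field (field 2) may contain spaces or parentheses,
        // so parse the space-separated fields after the last ')'.
        val fields = stat.substring(stat.lastIndexOf(')') + 2).split(" ")
        // fields(0) is field 3 (state), so field 24 (rss) is index 21.
        fields(21).toLong * PageSizeBytes
      }
    }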

We forked Etsy's `statsd-jvm-profiler` and added this in
https://github.com/DataDog/spark-jvm-profiler/pull/1
and then added the Java agent to the executors' JVM options like so:

`--conf "spark.executor.extraJavaOptions=…
-javaagent:spark-jvm-profiler.jar=server=localhost,port=8125,profilers=MemoryProfiler"`
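
For anyone who wants to try this, a hypothetical end-to-end invocation would
look something like the following (the paths, master, and application jar are
placeholders; on YARN, `--files` ships the agent jar into each executor's
working directory, which is why the relative path in `-javaagent` resolves):

    spark-submit \
      --master yarn \
      --files /local/path/to/spark-jvm-profiler.jar \
      --conf "spark.executor.extraJavaOptions=-javaagent:spark-jvm-profiler.jar=server=localhost,port=8125,profilers=MemoryProfiler" \
      your-app.jar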

and the metrics then go to any StatsD backend, in our case the datadog-agent.
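
(For context, plain StatsD gauges are single UDP lines of the form
`name:value|g`; a hypothetical RSS metric might look like
`spark.executor.rss:2147483648|g`, with prefixes or tags added depending on
the backend.)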

And then we get the metrics in our UI and can see the actual total process
memory alongside heap total/used/avg/max.
