Posted to user@spark.apache.org by Adam Binford <ad...@gmail.com> on 2022/11/30 16:55:03 UTC

Re: [SPARK STRUCTURED STREAMING] : Rocks DB uses off-heap usage

We started hitting this as well, seeing 90+ GB resident memory on a 25 GB
heap executor. After a lot of manual testing of candidate fixes, I finally
figured out the root problem: https://issues.apache.org/jira/browse/SPARK-41339

Starting to work on a PR now to fix.
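For anyone reproducing this, the RocksDB state store is opted into (and its
memory partially tuned) through Spark SQL configs. A minimal sketch in
spark-defaults.conf style; the provider class is as documented for Spark
3.2+, but the cache/block sizes below are illustrative assumptions, not
recommendations, and option availability varies by Spark version:

```
# Use RocksDB instead of the default HDFS-backed state store (Spark 3.2+)
spark.sql.streaming.stateStore.providerClass  org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider

# RocksDB native-memory knobs from the Structured Streaming docs
# (values here are examples only)
spark.sql.streaming.stateStore.rocksdb.blockCacheSizeMB  64
spark.sql.streaming.stateStore.rocksdb.blockSizeKB       4
```

Note that these knobs bound the block cache, not total native usage; memtables
and per-instance overhead still sit outside the JVM heap.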

On Mon, Sep 12, 2022 at 10:46 AM Artemis User <ar...@dtechspace.com>
wrote:

> The off-heap memory isn't subject to GC.  So the likely reason is that
> you have too many states to maintain in your streaming app, the GC
> couldn't keep up, and the executor ran out of resources and died.  Are you
> using continuous processing or microbatch in structured streaming?  You
> may want to lower your incoming data rate and/or increase your microbatch
> size so as to lower the number of states to be persisted/maintained...
>
> On 9/11/22 10:59 AM, akshit marwah wrote:
>
> Hi Team,
>
> We are trying to shift from the HDFS state manager to the RocksDB state
> manager, but while doing a POC we realised it is using much more off-heap
> space than expected. Because of this, the executors get killed with an
> out-of-physical-memory exception.
>
> Could you please help in understanding, why is there a massive increase in
> off-heap space, and what can we do about it?
>
> We are using Spark 3.2.1 with 1 executor and 1 executor core, to
> understand the memory requirements -
> 1. RocksDB run - took 3.5 GB heap and 11.5 GB resident memory
> 2. HDFS state manager run - took 5 GB heap and 10 GB resident memory.
>
> Thanks,
> Akshit
>
>
>
>
>
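The numbers quoted above make the off-heap gap easy to see: resident memory
minus JVM heap gives a rough native-memory figure for each run. A quick
sanity check of that arithmetic (figures are the ones reported in this
thread; RSS minus heap is only a rough proxy for state-store native usage,
since RSS also includes general JVM overhead):

```python
# Rough off-heap estimate per run: resident set size minus JVM heap.
# Figures (GB) are the ones reported earlier in this thread.
runs = {
    "RocksDB state store": {"heap_gb": 3.5, "rss_gb": 11.5},
    "HDFS state store":    {"heap_gb": 5.0, "rss_gb": 10.0},
}

for name, m in runs.items():
    off_heap = m["rss_gb"] - m["heap_gb"]
    print(f"{name}: ~{off_heap:.1f} GB off-heap")
# -> RocksDB state store: ~8.0 GB off-heap
# -> HDFS state store: ~5.0 GB off-heap
```

So the RocksDB run used roughly 3 GB more native memory than the HDFS run
for the same workload, consistent with the leak tracked in SPARK-41339.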

-- 
Adam Binford