
Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2022/12/01 06:51:00 UTC

[jira] [Resolved] (SPARK-41339) RocksDB state store WriteBatch doesn't clean up native memory

     [ https://issues.apache.org/jira/browse/SPARK-41339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim resolved SPARK-41339.
----------------------------------
    Fix Version/s: 3.3.2
                   3.4.0
       Resolution: Fixed

Issue resolved by pull request 38853
[https://github.com/apache/spark/pull/38853]

> RocksDB state store WriteBatch doesn't clean up native memory
> -------------------------------------------------------------
>
>                 Key: SPARK-41339
>                 URL: https://issues.apache.org/jira/browse/SPARK-41339
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Structured Streaming
>    Affects Versions: 3.3.1
>            Reporter: Adam Binford
>            Assignee: Adam Binford
>            Priority: Major
>             Fix For: 3.3.2, 3.4.0
>
>
> The RocksDB state store uses a WriteBatch to hold updates that are written in a single transaction on commit. Somewhat indirectly, abort is called after a successful task, which calls writeBatch.clear(). However, the data for a WriteBatch is stored in a std::string in the native code (it is not clear why a string is used, but it is). [rocksdb/write_batch.h at main · facebook/rocksdb · GitHub|https://github.com/facebook/rocksdb/blob/main/include/rocksdb/write_batch.h#L491]
> writeBatch.clear() simply calls rep_.clear() and rep_.resize() ([rocksdb/write_batch.cc at main · facebook/rocksdb · GitHub|https://github.com/facebook/rocksdb/blob/main/db/write_batch.cc#L246-L247]), neither of which actually releases the memory accumulated by a std::string instance. The only way to actually release this memory is to delete the WriteBatch object itself.
> Currently, the memory taken by all write batches remains allocated until the RocksDB state store instance is closed, which never happens during the normal course of operation because all partitions remain loaded on an executor after a task completes.
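
The std::string behavior described above can be illustrated with a small, self-contained C++ sketch. ToyBatch, kHeader, CapacityAfterClear and CapacityAfterRecreate are hypothetical names made up for this illustration; they are not the RocksDB WriteBatch class or the code from the pull request. The sketch assumes a common standard library implementation (such as libstdc++ or libc++), where clear() and a shrinking resize() keep the allocated buffer; the C++ standard itself does not strictly promise this.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for a write batch backed by a std::string.
// This is NOT the RocksDB class, only a model of the pattern in the report.
class ToyBatch {
 public:
  // Assumed header size, chosen only for illustration.
  static constexpr std::size_t kHeader = 12;

  ToyBatch() { rep_.resize(kHeader); }

  // Appends the raw key and value bytes to the internal buffer.
  void Put(const std::string& key, const std::string& value) {
    rep_.append(key);
    rep_.append(value);
  }

  // Mirrors the clear-then-resize pattern described in the report.
  void Clear() {
    rep_.clear();
    rep_.resize(kHeader);
  }

  std::size_t Size() const { return rep_.size(); }
  std::size_t Capacity() const { return rep_.capacity(); }

 private:
  std::string rep_;
};

// Fills a batch with roughly `bytes` of payload, clears it, and returns the
// capacity the buffer still holds afterwards.
std::size_t CapacityAfterClear(std::size_t bytes) {
  ToyBatch batch;
  batch.Put("k", std::string(bytes, 'x'));
  batch.Clear();
  assert(batch.Size() == ToyBatch::kHeader);  // logically empty again
  return batch.Capacity();
}

// Fills a batch, then replaces the whole object with a fresh one and returns
// the capacity of the replacement.
std::size_t CapacityAfterRecreate(std::size_t bytes) {
  auto batch = std::make_unique<ToyBatch>();
  batch->Put("k", std::string(bytes, 'x'));
  batch = std::make_unique<ToyBatch>();  // old batch and its buffer are destroyed here
  return batch->Capacity();
}
```

On such implementations, CapacityAfterClear(1 << 20) is expected to report at least the 1 MiB that was written, while CapacityAfterRecreate(1 << 20) reports only the small capacity of a fresh buffer. This matches the report's point that the memory is released only when the batch object itself is deleted.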


