Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2023/02/06 08:16:00 UTC

[jira] [Commented] (SPARK-42353) Cleanup orphan sst and log files in RocksDB checkpoint directory

    [ https://issues.apache.org/jira/browse/SPARK-42353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684501#comment-17684501 ] 

Apache Spark commented on SPARK-42353:
--------------------------------------

User 'chaoqin-li1123' has created a pull request for this issue:
https://github.com/apache/spark/pull/39897

> Cleanup orphan sst and log files in RocksDB checkpoint directory
> ----------------------------------------------------------------
>
>                 Key: SPARK-42353
>                 URL: https://issues.apache.org/jira/browse/SPARK-42353
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 3.2.3
>            Reporter: Chaoqin Li
>            Priority: Major
>
> When a RocksDB version.zip file gets overwritten (e.g. by concurrent task execution or task/stage/batch reattempts), or when the zip file does not get uploaded successfully, the associated sst and log files are never garbage collected (https://github.com/databricks/runtime/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala#L305-L309). These files consume storage. We can clean up these sst files during periodic state store maintenance. The major concern is that sst files for an ongoing version also appear to be "orphans", because they are uploaded before the zip file, so we have to be careful not to delete them.
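
The cleanup described in the issue can be sketched roughly as follows. This is only an illustration in Python, not the code from the pull request or from RocksDBFileManager: the names StoredFile, find_orphan_files, files_by_version and grace_period_ms are hypothetical, and using a modification-time grace period to protect files of an ongoing version is an assumption about one possible safeguard.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class StoredFile:
    """Hypothetical record for one sst/log file in the checkpoint directory."""
    name: str
    modified_ms: int  # last-modified time, milliseconds since epoch


def find_orphan_files(
    stored_files: Iterable[StoredFile],
    files_by_version: Dict[int, Set[str]],
    now_ms: int,
    grace_period_ms: int,
) -> List[str]:
    """Return names of files that no retained version references.

    files_by_version maps each version whose zip file still exists to the
    file names that zip lists (hypothetical input; how it is read is out of
    scope here). Files modified within the last grace_period_ms are skipped:
    they may belong to an ongoing version whose zip has not been uploaded
    yet, which is the risk the issue warns about.
    """
    # Union of every file name referenced by any surviving version zip.
    referenced: Set[str] = set().union(*files_by_version.values())
    cutoff_ms = now_ms - grace_period_ms
    return sorted(
        f.name
        for f in stored_files
        if f.name not in referenced and f.modified_ms < cutoff_ms
    )


if __name__ == "__main__":
    files = [
        StoredFile("a.sst", 100),  # referenced by versions 1 and 2
        StoredFile("b.sst", 100),  # old and unreferenced
        StoredFile("c.sst", 950),  # unreferenced but recent, so it is kept
        StoredFile("d.log", 50),   # old and unreferenced
    ]
    versions = {1: {"a.sst"}, 2: {"a.sst"}}
    print(find_orphan_files(files, versions, now_ms=1000, grace_period_ms=200))
```

In this sketch the grace period is the only thing protecting files of an in-progress commit, so it would have to be longer than the longest expected gap between uploading the sst files and uploading the matching zip file.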



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
