Posted to issues@spark.apache.org by "Chaoqin Li (Jira)" <ji...@apache.org> on 2023/02/06 07:55:00 UTC

[jira] [Created] (SPARK-42353) Cleanup orphan sst and log files in RocksDB checkpoint directory

Chaoqin Li created SPARK-42353:
----------------------------------

             Summary: Cleanup orphan sst and log files in RocksDB checkpoint directory
                 Key: SPARK-42353
                 URL: https://issues.apache.org/jira/browse/SPARK-42353
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 3.2.3
            Reporter: Chaoqin Li


When a RocksDB version.zip file gets overwritten (e.g. by concurrent task execution or task/stage/batch reattempts), or when the zip file is not uploaded successfully, the associated SST and log files are never garbage collected (see https://github.com/databricks/runtime/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala#L305-L309). These orphan files consume storage. We can clean them up during periodic state store maintenance. The major concern is that SST files for an ongoing version also appear to be "orphans", because they are uploaded before the zip file, so we have to be careful not to delete them.
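
One possible way to tell real orphans from in-flight files is sketched below. This is only an illustration, not the RocksDBFileManager implementation: the RemoteFile type, the find_orphan_files function, and the rule "only delete unreferenced files older than the oldest retained zip" are all assumptions made for this example. It also assumes the storage layer reports reasonably consistent modification times.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class RemoteFile:
    """A file in the checkpoint directory (hypothetical model, not Spark's)."""
    name: str
    modified_ms: int


def find_orphan_files(
    data_files: Iterable[RemoteFile],
    files_by_version: Dict[int, Set[str]],
    zip_modified_ms: Dict[int, int],
) -> List[str]:
    """Return names of SST/log files that look safe to delete.

    data_files:       every SST/log file currently in the checkpoint directory
    files_by_version: version -> file names listed in that version's zip
    zip_modified_ms:  version -> modification time of that version's zip
    """
    if not zip_modified_ms:
        # No committed version yet: everything may belong to an ongoing
        # upload, so delete nothing.
        return []

    # Files referenced by at least one retained version zip are live.
    referenced = set().union(*files_by_version.values())

    # Assumed safety rule: files for an ongoing version are uploaded before
    # their zip, so they are newer than the oldest retained zip. Only
    # unreferenced files strictly older than that zip count as orphans.
    cutoff_ms = min(zip_modified_ms.values())

    return sorted(
        f.name
        for f in data_files
        if f.name not in referenced and f.modified_ms < cutoff_ms
    )


files = [
    RemoteFile("001.sst", 100),  # referenced by versions 1 and 2
    RemoteFile("002.sst", 50),   # left behind by an overwritten zip
    RemoteFile("003.sst", 500),  # uploaded for an ongoing version
    RemoteFile("004.log", 60),   # left behind by a failed upload
]
orphans = find_orphan_files(
    files,
    files_by_version={1: {"001.sst"}, 2: {"001.sst"}},
    zip_modified_ms={1: 200, 2: 300},
)
print(orphans)
```

Using the oldest retained zip as the cutoff is deliberately conservative: it may leave some orphans behind for a later maintenance run, but it should not remove files that a pending zip is about to reference.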



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
