Posted to issues@spark.apache.org by "Chaoqin Li (Jira)" <ji...@apache.org> on 2023/02/06 07:55:00 UTC
[jira] [Created] (SPARK-42353) Cleanup orphan sst and log files in RocksDB checkpoint directory
Chaoqin Li created SPARK-42353:
----------------------------------
Summary: Cleanup orphan sst and log files in RocksDB checkpoint directory
Key: SPARK-42353
URL: https://issues.apache.org/jira/browse/SPARK-42353
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 3.2.3
Reporter: Chaoqin Li
When a RocksDB version.zip file is overwritten (e.g. by concurrent task execution or task/stage/batch reattempts), or when the zip file is not uploaded successfully, the associated SST and log files are never garbage collected (see https://github.com/databricks/runtime/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala#L305-L309). These files consume storage. We can clean up these SST files during periodic state store maintenance. The major concern is that the SST files of an ongoing version also appear to be "orphans", because they are uploaded before the zip file, so we have to be careful not to delete them.
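To make the concern concrete, here is a minimal sketch of the selection logic in Python. It is not the code in RocksDBFileManager.scala and not the proposed fix. The names StoredFile, find_orphan_files and min_age_ms are hypothetical, and the age threshold is only one assumed way to avoid deleting files that belong to a version whose zip file has not been uploaded yet.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class StoredFile:
    """Hypothetical record for one sst/log file in the checkpoint directory."""
    name: str
    modified_ms: int  # last-modified time in milliseconds


def find_orphan_files(
    stored_files: Iterable[StoredFile],
    referenced_names: Iterable[str],
    now_ms: int,
    min_age_ms: int,
) -> List[str]:
    """Return names of files that no version zip references and that are old
    enough that they are unlikely to belong to an in-progress upload."""
    referenced = set(referenced_names)
    orphans = []
    for f in stored_files:
        if f.name in referenced:
            continue  # still tracked by some version zip file
        if now_ms - f.modified_ms < min_age_ms:
            continue  # too recent: may belong to an ongoing version
        orphans.append(f.name)
    return sorted(orphans)


# Example with made-up file names and timestamps.
files = [
    StoredFile("001.sst", 1_000),  # referenced by a version zip
    StoredFile("002.sst", 1_000),  # unreferenced and old
    StoredFile("003.sst", 9_500),  # unreferenced but just uploaded
    StoredFile("004.log", 2_000),  # unreferenced and old
]
orphans = find_orphan_files(files, {"001.sst"}, now_ms=10_000, min_age_ms=5_000)
print(orphans)  # ['002.sst', '004.log']
```

In this sketch 003.sst is kept even though no zip file references it, because it is newer than the assumed threshold. A real implementation would have to choose that guard based on how uploads and maintenance are actually ordered.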
--
This message was sent by Atlassian Jira
(v8.20.10#820010)