Posted to user@spark.apache.org by Isabelle Phan <nl...@gmail.com> on 2019/10/23 18:43:28 UTC

Delete checkpointed data for a single dataset?

Hello,

In a non-streaming application, I am using the checkpoint feature to
truncate the lineage of complex datasets. At the end of the job, the
checkpointed data, which is stored in HDFS, is deleted.
I am looking for a way to delete the unused checkpointed data earlier than
the end of the job. If I know that one dataset won't be used anymore, is
there a way to delete its checkpointed data in the middle of the
application?
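
For context, the pattern in question looks roughly like the sketch below.
This is only an illustration under assumptions: the checkpoint directory,
the application name, the local master, and the example dataset are all
hypothetical, and a real job would point the checkpoint directory at an
HDFS path.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session, used here only to keep the sketch self-contained.
    val spark = SparkSession.builder()
      .appName("checkpoint-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical directory; in the setup described above this would be on HDFS.
    spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

    // Stand-in for a dataset with a long, complex lineage.
    val complex = spark.range(0, 1000).withColumn("doubled", col("id") * 2)

    // Dataset.checkpoint() is eager by default: it writes the data under the
    // checkpoint directory and returns a dataset with truncated lineage.
    val checkpointed = complex.checkpoint()

    // Later stages use the checkpointed dataset instead of the original.
    println(checkpointed.count())

    spark.stop()
  }
}
```

The question is about what to do once `checkpointed` is no longer needed,
while the application is still running.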

Thank you,

Isabelle