Posted to issues@spark.apache.org by "Fengyu Cao (JIRA)" <ji...@apache.org> on 2019/01/04 04:04:00 UTC

[jira] [Comment Edited] (SPARK-26389) temp checkpoint folder at executor should be deleted on graceful shutdown

    [ https://issues.apache.org/jira/browse/SPARK-26389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733815#comment-16733815 ] 

Fengyu Cao edited comment on SPARK-26389 at 1/4/19 4:03 AM:
------------------------------------------------------------

{quote}Temp checkpoint can be used in one-node scenario and deleted only if the query didn't fail.
{quote}
Yes, and there are no logs or error messages saying that we *must* set a non-temp checkpoint when running a framework non-locally.

And if we do this (run non-locally with a temp checkpoint), the checkpoint dir on the executor consumes a lot of space and is not deleted if the query fails, and this checkpoint can't be used for recovery, as I mentioned above.

I just think that Spark should either prohibit users from using temp checkpoints when their frameworks run non-locally, or take responsibility for cleaning up this useless checkpoint directory even if the query fails.

 

 


> temp checkpoint folder at executor should be deleted on graceful shutdown
> -------------------------------------------------------------------------
>
>                 Key: SPARK-26389
>                 URL: https://issues.apache.org/jira/browse/SPARK-26389
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: Fengyu Cao
>            Priority: Major
>
> {{spark-submit --master mesos://<mesos> --conf spark.streaming.stopGracefullyOnShutdown=true <structured streaming framework>}}
> CTRL-C, framework shutdown
> {{18/12/18 10:27:36 ERROR MicroBatchExecution: Query [id = f512e17a-df88-4414-a5cd-a23550cf1e7f, runId = 24d99723-8d61-48c0-beab-af432f7a19d3] terminated with error org.apache.spark.SparkException: Writing job aborted.}}
> {{/tmp/temporary-<uuid> on the executor is not deleted due to org.apache.spark.SparkException: Writing job aborted., and this temp checkpoint can't be used for recovery.}}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org