Posted to issues@spark.apache.org by "Imran Rashid (JIRA)" <ji...@apache.org> on 2019/06/25 16:59:00 UTC

[jira] [Created] (SPARK-28165) SHS does not delete old inprogress files until cleaner.maxAge after SHS start time

Imran Rashid created SPARK-28165:
------------------------------------

             Summary: SHS does not delete old inprogress files until cleaner.maxAge after SHS start time
                 Key: SPARK-28165
                 URL: https://issues.apache.org/jira/browse/SPARK-28165
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.3, 2.3.3
            Reporter: Imran Rashid


The SHS will not delete inprogress files until {{spark.history.fs.cleaner.maxAge}} (7 days by default) has elapsed since the SHS started, regardless of when the file was last modified.  This is particularly problematic if the SHS is restarted regularly (more often than maxAge), because then old files never get deleted.
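A toy model of the behaviour above, in Python. It is a simplification and not Spark's FsHistoryProvider code. It assumes that the age of an in-progress log is counted from the first time the running SHS process saw it, and that this bookkeeping lives only in memory and is lost on restart. `InMemoryCleaner` and `simulate` are hypothetical names.

```python
from datetime import datetime, timedelta

# Simplified model (NOT Spark's actual code): the age of an in-progress log is
# measured from when this SHS process first saw it, since the file's
# modification time is not trusted.
MAX_AGE = timedelta(days=7)  # default of spark.history.fs.cleaner.maxAge


class InMemoryCleaner:
    """Hypothetical cleaner whose bookkeeping lives only in memory."""

    def __init__(self):
        self.first_seen = {}  # log path -> time this process first saw it

    def scan(self, logs, now):
        """Record unseen logs, then return the ones old enough to delete."""
        for path in logs:
            self.first_seen.setdefault(path, now)
        return [p for p in logs if now - self.first_seen[p] > MAX_AGE]


def simulate(restart_every_days, total_days):
    """Run one scan per day; restart the SHS (losing state) periodically."""
    start = datetime(2019, 1, 1)
    logs = ["app-1.inprogress"]  # assume it was last modified long before `start`
    cleaner = InMemoryCleaner()
    for day in range(total_days):
        if day > 0 and day % restart_every_days == 0:
            cleaner = InMemoryCleaner()  # restart: in-memory state is lost
        if cleaner.scan(logs, start + timedelta(days=day)):
            return day  # first day the log is eligible for deletion
    return None  # never deleted within the simulated window


print(simulate(restart_every_days=30, total_days=60))  # 8
print(simulate(restart_every_days=5, total_days=60))   # None
```

With restarts every 30 days the log becomes eligible on day 8. With restarts every 5 days the clock is reset before maxAge is ever reached, so the log is never cleaned.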

There might not be much we can do about this -- we can't really trust the modification time of the file, as that isn't always updated reliably.

We could take the timestamp of the last event in the file instead, but then we'd have to turn off the optimization from SPARK-6951, which avoids reading the entire file just to build the listing.
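A rough Python sketch of the "last event time" alternative, to show why it conflicts with SPARK-6951: every line of the log has to be read. It assumes an uncompressed event log with one JSON event per line, where only some events carry a "Timestamp" field in epoch milliseconds. `last_event_time_ms` is a hypothetical helper, not an existing Spark API.

```python
import json
from typing import Iterable, Optional


def last_event_time_ms(lines: Iterable[str]) -> Optional[int]:
    """Return the largest "Timestamp" value found in an event log, or None.

    Assumes one JSON event per line. Every line is read, which is exactly the
    full-file scan that the SPARK-6951 listing optimization avoids.
    """
    latest = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # an in-progress log may end with a partially written line
            continue
        if not isinstance(event, dict):
            continue
        ts = event.get("Timestamp")
        if isinstance(ts, int) and (latest is None or ts > latest):
            latest = ts
    return latest


log = [
    '{"Event":"SparkListenerApplicationStart","Timestamp":1561400000000}',
    '{"Event":"SparkListenerExecutorAdded","Timestamp":1561400005000}',
    '{"Event":"SparkListenerJobStart","Job ID":0',  # truncated last line
]
print(last_event_time_ms(log))  # 1561400005000
```

Skipping undecodable lines matters here because an in-progress log is, by definition, still being written.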

*WORKAROUND*: have the SHS persist its state to local disk across restarts by setting {{spark.history.store.path}}.  It will still take 7 days (the default maxAge) from when you add that config before any cleaning happens, but from then on cleaning should happen reliably.
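Extending the same toy model shows why the workaround helps. This is a Python sketch, not Spark's implementation: the hypothetical `PersistentCleaner` writes its first-seen times to a local JSON file, loosely standing in for the listing data that the SHS keeps under {{spark.history.store.path}} (for example, a line like `spark.history.store.path /var/lib/spark-history` in spark-defaults.conf; that path is only illustrative).

```python
import json
import os
import tempfile
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=7)  # default of spark.history.fs.cleaner.maxAge


class PersistentCleaner:
    """Hypothetical cleaner that keeps first-seen times in a local file."""

    def __init__(self, state_file):
        self.state_file = state_file
        self.first_seen = {}
        if os.path.exists(state_file):  # reload state saved before a restart
            with open(state_file) as f:
                saved = json.load(f)
            self.first_seen = {p: datetime.fromisoformat(t) for p, t in saved.items()}

    def scan(self, logs, now):
        """Record unseen logs, persist state, return logs old enough to delete."""
        for path in logs:
            self.first_seen.setdefault(path, now)
        with open(self.state_file, "w") as f:
            json.dump({p: t.isoformat() for p, t in self.first_seen.items()}, f)
        return [p for p in logs if now - self.first_seen[p] > MAX_AGE]


def simulate(state_file, restart_every_days, total_days):
    """One scan per day; each restart rebuilds the cleaner from the state file."""
    start = datetime(2019, 1, 1)
    logs = ["app-1.inprogress"]
    cleaner = PersistentCleaner(state_file)
    for day in range(total_days):
        if day > 0 and day % restart_every_days == 0:
            cleaner = PersistentCleaner(state_file)  # restart keeps saved state
        if cleaner.scan(logs, start + timedelta(days=day)):
            return day
    return None


with tempfile.TemporaryDirectory() as d:
    print(simulate(os.path.join(d, "state.json"), restart_every_days=5, total_days=60))  # 8
```

Even with restarts every 5 days, the log becomes eligible on day 8, i.e. roughly maxAge after the state was first recorded. This matches the caveat that the first cleanup still waits about 7 days after the config is added.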




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
