Posted to issues@spark.apache.org by "Imran Rashid (JIRA)" <ji...@apache.org> on 2019/06/26 19:31:00 UTC

[jira] [Commented] (SPARK-28165) SHS does not delete old inprogress files until cleaner.maxAge after SHS start time

    [ https://issues.apache.org/jira/browse/SPARK-28165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873614#comment-16873614 ] 

Imran Rashid commented on SPARK-28165:
--------------------------------------

btw if anybody wants to investigate this more, here's a simple test case (though, as discussed above, we can't just use the modtime, as it's not totally trustworthy):

{code}
  test("log cleaner for inprogress files before SHS startup") {
    val firstFileModifiedTime = TimeUnit.SECONDS.toMillis(10)
    val secondFileModifiedTime = TimeUnit.SECONDS.toMillis(100)
    val maxAge = TimeUnit.SECONDS.toMillis(40)
    val clock = new ManualClock(0)

    val log1 = newLogFile("inProgressApp1", None, inProgress = true)
    writeFile(log1, true, None,
      SparkListenerApplicationStart(
        "inProgressApp1", Some("inProgressApp1"), 3L, "test", Some("attempt1"))
    )
    log1.setLastModified(firstFileModifiedTime)

    val log2 = newLogFile("inProgressApp2", None, inProgress = true)
    writeFile(log2, true, None,
      SparkListenerApplicationStart(
        "inProgressApp2", Some("inProgressApp2"), 23L, "test2", Some("attempt2"))
    )
    log2.setLastModified(secondFileModifiedTime)

    // advance the clock so the first log is expired, but second log is still recent
    clock.setTime(secondFileModifiedTime)
    assert(clock.getTimeMillis() > firstFileModifiedTime + maxAge)

    // start up the SHS
    val provider = new FsHistoryProvider(
      createTestConf().set("spark.history.fs.cleaner.maxAge", s"${maxAge}ms"), clock)

    provider.checkForLogs()

    // We should clean up one log immediately
    updateAndCheck(provider) { list =>
      assert(list.size === 1)
    }
    assert(!log1.exists())
    assert(log2.exists())
  }
{code}
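As an aside, the expiry rule the test above exercises boils down to a plain modtime comparison. A minimal sketch (`shouldClean` is a hypothetical helper for illustration, not Spark's actual FsHistoryProvider code):

{code}
// Hedged sketch of the expiry rule the test above expects: a log is
// eligible for deletion once (now - lastModified) exceeds cleaner.maxAge.
// `shouldClean` is a hypothetical helper, not Spark's actual API.
object CleanerSketch {
  def shouldClean(lastModifiedMs: Long, nowMs: Long, maxAgeMs: Long): Boolean =
    nowMs - lastModifiedMs > maxAgeMs

  def main(args: Array[String]): Unit = {
    val maxAge = 40000L  // 40s, matching the test's maxAge
    val now = 100000L    // clock advanced to the second file's mtime
    assert(shouldClean(10000L, now, maxAge))   // log1 is 90s old -> expired
    assert(!shouldClean(100000L, now, maxAge)) // log2 is brand new -> kept
  }
}
{code}

The bug described in this issue is that, for inprogress files, the SHS in effect substitutes its own start time for the file's modtime, so this check can't fire until maxAge after startup.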

> SHS does not delete old inprogress files until cleaner.maxAge after SHS start time
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-28165
>                 URL: https://issues.apache.org/jira/browse/SPARK-28165
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.3, 2.4.3
>            Reporter: Imran Rashid
>            Priority: Major
>
> The SHS will not delete inprogress files until {{spark.history.fs.cleaner.maxAge}} time after it has started (7 days by default), regardless of when the last modification to the file was.  This is particularly problematic if the SHS gets restarted regularly, as then you'll end up never deleting old files.
> There might not be much we can do about this -- we can't really trust the modification time of the file, as that isn't always updated reliably.
> We could take the last time of any event from the file, but then we'd have to turn off the optimization of SPARK-6951, to avoid reading the entire file just for the listing.
> *WORKAROUND*: have the SHS save state across restarts to local disk by specifying a path in {{spark.history.store.path}}.  It'll still take 7 days from when you add that config for the cleaning to happen, but going forward the cleaning should happen reliably.
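> For example (the store path below is purely illustrative):
> {code}
> spark.history.store.path           /var/spark/shs-store
> spark.history.fs.cleaner.enabled   true
> spark.history.fs.cleaner.maxAge    7d
> {code}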



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org