You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by viper-kun <gi...@git.apache.org> on 2014/09/15 07:54:14 UTC

[GitHub] spark pull request: cycle of deleting history log

GitHub user viper-kun opened a pull request:

    https://github.com/apache/spark/pull/2391

    cycle of deleting history log

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viper-kun/spark xk2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2391.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2391
    
----
commit 79f492811dc939d1a50e6d8279e20042df4f95fe
Author: xukun 00228947 <xu...@huawei.com>
Date:   2014-09-13T08:50:18Z

    cycle of deleting history log

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: cycle of deleting history log

Posted by viper-kun <gi...@git.apache.org>.
Github user viper-kun commented on the pull request:

    https://github.com/apache/spark/pull/2391#issuecomment-55688517
  
    @srowen If we run spark application frequently, it will write many spark event log into spark.eventLog.dir. After a long time later, there will be many spark event log that we do not concern in the spark.eventLog.dir. So we need delete some logs that exceeding the time limit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: cycle of deleting history log

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/2391#issuecomment-55571126
  
    Can you explain this patch? What problem does it solve and why? There is no JIRA here either.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2391#discussion_r17816209
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -214,6 +245,27 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
         }
       }
     
    +  /**
    +   *  Deleting apps if setting cleaner.
    +   */
    +  private def cleanLogs() = {
    +    lastLogCleanTimeMs = System.currentTimeMillis()
    +    logDebug("Cleaning logs. Time is now %d.".format(lastLogCleanTimeMs))
    +    try {
    +      val logStatus = fs.listStatus(new Path(resolvedLogDir))
    +      val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]()
    +      val maxAge = conf.getLong("spark.history.fs.maxAge", 604800) * 1000
    +      logDirs.foreach{
    +        case dir: FileStatus =>
    +          if(System.currentTimeMillis() - getModificationTime(dir) > maxAge) {
    +            fs.delete(dir.getPath, true)
    +          }
    +        }
    --- End diff --
    
    nit: this line, and the 2 following it, should be indented 2 less spaces.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2391#discussion_r17816145
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -38,6 +38,9 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
       private val UPDATE_INTERVAL_MS = conf.getInt("spark.history.fs.updateInterval",
         conf.getInt("spark.history.updateInterval", 10)) * 1000
     
    +  // Interval between each clean for event logs
    +  private val CLEAN_INTERVAL_MS = conf.getLong("spark.history.fs.cleanInterval", 86400) * 1000
    --- End diff --
    
    What is 86400? Could you use `Duration` instead so that it's clearer what the default value means?
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2391#issuecomment-56110631
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20556/consoleFull) for   PR 2391 at commit [`79f4928`](https://github.com/apache/spark/commit/79f492811dc939d1a50e6d8279e20042df4f95fe).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2391#discussion_r17816047
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -83,6 +89,25 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
         }
       }
     
    +  /**
    +   * A background thread that periodically clean event log on disk.
    --- End diff --
    
    nit: cleans, logs


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2391#discussion_r17762145
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -100,6 +125,12 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
         checkForLogs()
         logCheckingThread.setDaemon(true)
         logCheckingThread.start()
    +
    +    if(conf.getBoolean("spark.history.cleaner.enable", false)) {
    +      cleanLogs()
    +      logCleaningThread.setDaemon(true)
    +      logCleaningThread.start()
    +    }
    --- End diff --
    
    Pretty sure you don't want to clean logs on initialization. Otherwise your history server will always display nothing!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2391#discussion_r17816156
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -214,6 +245,27 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
         }
       }
     
    +  /**
    +   *  Deleting apps if setting cleaner.
    +   */
    +  private def cleanLogs() = {
    +    lastLogCleanTimeMs = System.currentTimeMillis()
    +    logDebug("Cleaning logs. Time is now %d.".format(lastLogCleanTimeMs))
    +    try {
    +      val logStatus = fs.listStatus(new Path(resolvedLogDir))
    +      val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]()
    +      val maxAge = conf.getLong("spark.history.fs.maxAge", 604800) * 1000
    --- End diff --
    
    Similarly, could you use a `Duration` instead of 604800?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2391#discussion_r17816104
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -100,6 +125,12 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
         checkForLogs()
         logCheckingThread.setDaemon(true)
         logCheckingThread.start()
    +
    +    if(conf.getBoolean("spark.history.cleaner.enable", false)) {
    --- End diff --
    
    nit: `if (`
    
    Also, conf option should start with `spark.history.fs.`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2391#discussion_r17816189
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -214,6 +245,27 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
         }
       }
     
    +  /**
    +   *  Deleting apps if setting cleaner.
    +   */
    +  private def cleanLogs() = {
    +    lastLogCleanTimeMs = System.currentTimeMillis()
    +    logDebug("Cleaning logs. Time is now %d.".format(lastLogCleanTimeMs))
    +    try {
    +      val logStatus = fs.listStatus(new Path(resolvedLogDir))
    +      val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]()
    +      val maxAge = conf.getLong("spark.history.fs.maxAge", 604800) * 1000
    +      logDirs.foreach{
    --- End diff --
    
    nit: `foreach {`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2391#discussion_r17816266
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -214,6 +245,27 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
         }
       }
     
    +  /**
    +   *  Deleting apps if setting cleaner.
    +   */
    +  private def cleanLogs() = {
    +    lastLogCleanTimeMs = System.currentTimeMillis()
    +    logDebug("Cleaning logs. Time is now %d.".format(lastLogCleanTimeMs))
    +    try {
    +      val logStatus = fs.listStatus(new Path(resolvedLogDir))
    +      val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]()
    +      val maxAge = conf.getLong("spark.history.fs.maxAge", 604800) * 1000
    +      logDirs.foreach{
    +        case dir: FileStatus =>
    +          if(System.currentTimeMillis() - getModificationTime(dir) > maxAge) {
    +            fs.delete(dir.getPath, true)
    --- End diff --
    
    I wonder if it would be better to try each delete separately; if someone writes a directory with wrong permissions to the log dir, they may cause the cleaner thread to never clean up anything.
    
    (That would be better if there were a more descriptive exception for those errors, but I think all `FileSystem` will throw are `IOException`s and `InterruptedException`s.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2391#issuecomment-57004970
  
    @viper-kun Since there is now a new PR, would you mind closing this one? It's confusing to have two PRs that do the same thing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: cycle of deleting history log

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2391#issuecomment-55554965
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: cycle of deleting history log

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2391#issuecomment-55814250
  
    Hey @viper-kun can you file a JIRA and put it in the title of this PR? See how other Spark PRs are formatted.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by viper-kun <gi...@git.apache.org>.
Github user viper-kun commented on the pull request:

    https://github.com/apache/spark/pull/2391#issuecomment-56261135
  
    @vanzin , @andrewor14 .Thanks for your opinions. Because the source branch had been deleted by me, i can change it in this commit. i submit another commit[#2471] and change the source according your opinions. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2391#issuecomment-56116643
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20556/consoleFull) for   PR 2391 at commit [`79f4928`](https://github.com/apache/spark/commit/79f492811dc939d1a50e6d8279e20042df4f95fe).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by viper-kun <gi...@git.apache.org>.
Github user viper-kun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2391#discussion_r17818744
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -214,6 +245,27 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
         }
       }
     
    +  /**
    +   *  Deleting apps if setting cleaner.
    +   */
    +  private def cleanLogs() = {
    +    lastLogCleanTimeMs = System.currentTimeMillis()
    +    logDebug("Cleaning logs. Time is now %d.".format(lastLogCleanTimeMs))
    +    try {
    +      val logStatus = fs.listStatus(new Path(resolvedLogDir))
    +      val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]()
    +      val maxAge = conf.getLong("spark.history.fs.maxAge", 604800) * 1000
    +      logDirs.foreach{
    +        case dir: FileStatus =>
    +          if(System.currentTimeMillis() - getModificationTime(dir) > maxAge) {
    +            fs.delete(dir.getPath, true)
    --- End diff --
    
    FileSystem.listStatus() will throw FileNotFoundException and IOException; FileSystem.delete will throw IOException. And I will change it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2391#discussion_r17816167
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -214,6 +245,27 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
         }
       }
     
    +  /**
    +   *  Deleting apps if setting cleaner.
    +   */
    +  private def cleanLogs() = {
    +    lastLogCleanTimeMs = System.currentTimeMillis()
    +    logDebug("Cleaning logs. Time is now %d.".format(lastLogCleanTimeMs))
    +    try {
    +      val logStatus = fs.listStatus(new Path(resolvedLogDir))
    +      val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]()
    +      val maxAge = conf.getLong("spark.history.fs.maxAge", 604800) * 1000
    +      logDirs.foreach{
    +        case dir: FileStatus =>
    +          if(System.currentTimeMillis() - getModificationTime(dir) > maxAge) {
    --- End diff --
    
    nit: `if (`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2391


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2391#discussion_r17816237
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -214,6 +245,27 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
         }
       }
     
    +  /**
    +   *  Deleting apps if setting cleaner.
    +   */
    +  private def cleanLogs() = {
    +    lastLogCleanTimeMs = System.currentTimeMillis()
    +    logDebug("Cleaning logs. Time is now %d.".format(lastLogCleanTimeMs))
    +    try {
    +      val logStatus = fs.listStatus(new Path(resolvedLogDir))
    +      val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]()
    +      val maxAge = conf.getLong("spark.history.fs.maxAge", 604800) * 1000
    +      logDirs.foreach{
    --- End diff --
    
    Actually, usually we use this style for this kind of code:
    
        logDirs.foreach { dir =>
          blah
        }


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2391#issuecomment-56109954
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2391#issuecomment-56251530
  
    Looks ok to me, mostly a few nits.
    
    Could you also update `docs/monitoring.md` and add the new conf parameters?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by viper-kun <gi...@git.apache.org>.
Github user viper-kun commented on the pull request:

    https://github.com/apache/spark/pull/2391#issuecomment-56129613
  
    @andrewor14  i have checked Hadoop's JobHistoryServer. it is JobHistoryServer's responsibility to delete the application logs. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2391#issuecomment-56118558
  
    @viper-kun I'm not sure if it's the HistoryServer's responsibility to delete the application logs. The HistoryServer is intended to be a serving daemon; it should not modify application output, since it did not create any output in the first place. If I'm not wrong, Hadoop's JobHistoryServer has the same semantics (i.e. it doesn't randomly delete your logs). Perhaps @srowen can confirm this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

Posted by viper-kun <gi...@git.apache.org>.
Github user viper-kun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2391#discussion_r17766911
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -100,6 +125,12 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
         checkForLogs()
         logCheckingThread.setDaemon(true)
         logCheckingThread.start()
    +
    +    if(conf.getBoolean("spark.history.cleaner.enable", false)) {
    +      cleanLogs()
    +      logCleaningThread.setDaemon(true)
    +      logCleaningThread.start()
    +    }
    --- End diff --
    
    yes,you are right. it is no need to clean it on initialization.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org