Posted to commits@spark.apache.org by do...@apache.org on 2020/08/05 18:01:13 UTC
[spark] branch branch-3.0 updated: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.0 by this push:
new ab5034f [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change
ab5034f is described below
commit ab5034fd181406037dd3273dc8b3ef3af8e0c63a
Author: Yan Xiaole <xi...@gmail.com>
AuthorDate: Wed Aug 5 10:57:11 2020 -0700
[SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change
### What changes were proposed in this pull request?
This PR wraps the code that adds a new entry to the history server application listing in a try/catch block for `FileNotFoundException`, so that non-existing paths are skipped.
### Why are the changes needed?
If the log dir contains a large number (>100k) of application logs, listing the log dir takes a few seconds. By the time the path list has been obtained, some applications may already have finished, and their filenames change from `foo.inprogress` to `foo`.
This causes a problem when adding an entry to the listing: querying the file status, e.g. `fileSizeForLastIndex`, throws a `FileNotFoundException` if the application has finished. The exception aborts the current scan loop, so on a busy cluster the history server can neither list nor load any application logs.
```
20/08/03 15:17:23 ERROR FsHistoryProvider: Exception in checking for event log updates
java.io.FileNotFoundException: File does not exist: hdfs://xx/logs/spark/application_11111111111111.lz4.inprogress
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1527)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1520)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1520)
at org.apache.spark.deploy.history.SingleFileEventLogFileReader.status$lzycompute(EventLogFileReaders.scala:170)
```
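The fix above can be sketched in isolation. The following is a minimal, self-contained Scala sketch of the pattern (not the actual `FsHistoryProvider` code): the `fileSize` helper and the in-memory `Map` stand in for the real file-status lookup, which can throw `FileNotFoundException` when a log file is renamed between the directory listing and the status probe. A path that vanished is simply skipped (`false`) instead of aborting the whole scan.

```scala
import java.io.FileNotFoundException

object ScanSketch {
  // Hypothetical stand-in for the file-status lookup: throws if the path
  // no longer exists, mimicking DistributedFileSystem.getFileStatus.
  def fileSize(files: Map[String, Long], path: String): Long =
    files.getOrElse(path, throw new FileNotFoundException(path))

  // Mirrors the patched filter predicate: a vanished file yields false
  // (skip this entry) rather than propagating and aborting the scan loop.
  def shouldProcess(files: Map[String, Long], path: String): Boolean =
    try {
      fileSize(files, path) > 0 // cf. reader.fileSizeForLastIndex > 0
    } catch {
      case _: FileNotFoundException => false
    }
}
```

With this shape, one renamed log among 100k entries costs only its own entry; the rest of the listing proceeds.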
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
1. Set up a script that keeps changing the filenames of applications under the history log dir.
2. Launch the history server.
3. Check that the `File does not exist` error log is gone.
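The renaming script in step 1 is not included in the patch; a minimal Scala sketch of one such helper is below (the directory and application names are assumptions, and `ATOMIC_MOVE` support depends on the filesystem). It flips a log file between its `.inprogress` and finished names, so running it in a loop while the history server scans reproduces the race.

```scala
import java.nio.file.{Files, Paths, StandardCopyOption}

object RenameChurn {
  // Flip "<app>.inprogress" <-> "<app>" inside dir once; calling this
  // repeatedly while the SHS scans the dir exercises the race in SPARK-32529.
  def flipOnce(dir: String, app: String): Unit = {
    val inProgress = Paths.get(dir, s"$app.inprogress")
    val finished   = Paths.get(dir, app)
    if (Files.exists(inProgress)) {
      Files.move(inProgress, finished, StandardCopyOption.ATOMIC_MOVE)
    } else if (Files.exists(finished)) {
      Files.move(finished, inProgress, StandardCopyOption.ATOMIC_MOVE)
    }
  }
}
```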
Closes #29350 from yanxiaole/SPARK-32529.
Authored-by: Yan Xiaole <xi...@gmail.com>
Signed-off-by: Dongjoon Hyun <do...@apache.org>
(cherry picked from commit c1d17df826541580162c9db8ebfbc408ec0c9922)
Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
.../apache/spark/deploy/history/FsHistoryProvider.scala | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
index 99d3ece..c262152 100644
--- a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
@@ -519,10 +519,17 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
           // If the file is currently not being tracked by the SHS, add an entry for it and try
           // to parse it. This will allow the cleaner code to detect the file as stale later on
           // if it was not possible to parse it.
-          listing.write(LogInfo(reader.rootPath.toString(), newLastScanTime, LogType.EventLogs,
-            None, None, reader.fileSizeForLastIndex, reader.lastIndex, None,
-            reader.completed))
-          reader.fileSizeForLastIndex > 0
+          try {
+            listing.write(LogInfo(reader.rootPath.toString(), newLastScanTime,
+              LogType.EventLogs, None, None, reader.fileSizeForLastIndex, reader.lastIndex,
+              None, reader.completed))
+            reader.fileSizeForLastIndex > 0
+          } catch {
+            case _: FileNotFoundException => false
+          }
+
+      case _: FileNotFoundException =>
+        false
       }
     }
     .sortWith { case (entry1, entry2) =>