Posted to commits@spark.apache.org by do...@apache.org on 2020/08/05 18:01:13 UTC
[spark] branch branch-3.0 updated: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.0 by this push:
new ab5034f [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change
ab5034f is described below
commit ab5034fd181406037dd3273dc8b3ef3af8e0c63a
Author: Yan Xiaole <xi...@gmail.com>
AuthorDate: Wed Aug 5 10:57:11 2020 -0700
[SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change
### What changes were proposed in this pull request?
This PR wraps the code that adds a new entry to the history server application listing in a try/catch block for `FileNotFoundException`, so that non-existing paths are skipped.
### Why are the changes needed?
If the log dir contains a large number (>100k) of application logs, listing the log dir takes a few seconds. By the time the path list has been obtained, some applications may already have finished, and their filenames change from `foo.inprogress` to `foo`.
This causes a problem when adding an entry to the listing: querying the file status, e.g. `fileSizeForLastIndex`, throws a `FileNotFoundException` if the application has finished. The exception aborts the current scan loop, so on a busy cluster the history server can neither list nor load any application logs.
```
20/08/03 15:17:23 ERROR FsHistoryProvider: Exception in checking for event log updates
java.io.FileNotFoundException: File does not exist: hdfs://xx/logs/spark/application_11111111111111.lz4.inprogress
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1527)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1520)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1520)
at org.apache.spark.deploy.history.SingleFileEventLogFileReader.status$lzycompute(EventLogFileReaders.scala:170)
```
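The fix above can be sketched in isolation. The following is a minimal, self-contained Scala sketch of the pattern (not the actual `FsHistoryProvider` code): the `fileSize` helper and the in-memory `Map` stand in for the real file-status lookup, which can throw `FileNotFoundException` when a log file is renamed between the directory listing and the status probe. A path that vanished is simply skipped (`false`) instead of aborting the whole scan.

```scala
import java.io.FileNotFoundException

object ScanSketch {
  // Hypothetical stand-in for the file-status lookup: throws if the path
  // no longer exists, mimicking DistributedFileSystem.getFileStatus.
  def fileSize(files: Map[String, Long], path: String): Long =
    files.getOrElse(path, throw new FileNotFoundException(path))

  // Mirrors the patched filter predicate: a vanished file yields false
  // (skip this entry) rather than propagating and aborting the scan loop.
  def shouldProcess(files: Map[String, Long], path: String): Boolean =
    try {
      fileSize(files, path) > 0 // cf. reader.fileSizeForLastIndex > 0
    } catch {
      case _: FileNotFoundException => false
    }
}
```

With this shape, one renamed log among 100k entries costs only its own entry; the rest of the listing proceeds.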
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
1. Set up a script that keeps changing the filenames of applications under the history log dir.
2. Launch the history server.
3. Check that the `File does not exist` error log is gone.
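The renaming script in step 1 is not included in the patch; a minimal Scala sketch of one such helper is below (the directory and application names are assumptions, and `ATOMIC_MOVE` support depends on the filesystem). It flips a log file between its `.inprogress` and finished names, so running it in a loop while the history server scans reproduces the race.

```scala
import java.nio.file.{Files, Paths, StandardCopyOption}

object RenameChurn {
  // Flip "<app>.inprogress" <-> "<app>" inside dir once; calling this
  // repeatedly while the SHS scans the dir exercises the race in SPARK-32529.
  def flipOnce(dir: String, app: String): Unit = {
    val inProgress = Paths.get(dir, s"$app.inprogress")
    val finished   = Paths.get(dir, app)
    if (Files.exists(inProgress)) {
      Files.move(inProgress, finished, StandardCopyOption.ATOMIC_MOVE)
    } else if (Files.exists(finished)) {
      Files.move(finished, inProgress, StandardCopyOption.ATOMIC_MOVE)
    }
  }
}
```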
Closes #29350 from yanxiaole/SPARK-32529.
Authored-by: Yan Xiaole <xi...@gmail.com>
Signed-off-by: Dongjoon Hyun <do...@apache.org>
(cherry picked from commit c1d17df826541580162c9db8ebfbc408ec0c9922)
Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
.../apache/spark/deploy/history/FsHistoryProvider.scala | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
index 99d3ece..c262152 100644
--- a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
@@ -519,10 +519,17 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
           // If the file is currently not being tracked by the SHS, add an entry for it and try
           // to parse it. This will allow the cleaner code to detect the file as stale later on
           // if it was not possible to parse it.
-          listing.write(LogInfo(reader.rootPath.toString(), newLastScanTime, LogType.EventLogs,
-            None, None, reader.fileSizeForLastIndex, reader.lastIndex, None,
-            reader.completed))
-          reader.fileSizeForLastIndex > 0
+          try {
+            listing.write(LogInfo(reader.rootPath.toString(), newLastScanTime,
+              LogType.EventLogs, None, None, reader.fileSizeForLastIndex, reader.lastIndex,
+              None, reader.completed))
+            reader.fileSizeForLastIndex > 0
+          } catch {
+            case _: FileNotFoundException => false
+          }
+
+      case _: FileNotFoundException =>
+        false
       }
     }
     .sortWith { case (entry1, entry2) =>