Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/10 11:43:21 UTC

[GitHub] [spark] pan3793 opened a new pull request, #36832: [SPARK-39439][SHS] Suppress error log for in-progress event log file not found

pan3793 opened a new pull request, #36832:
URL: https://github.com/apache/spark/pull/36832

   
   ### What changes were proposed in this pull request?
   Suppress error log for in-progress event log file not found.
   
   ### Why are the changes needed?
   
   We see many of the following errors in the SHS log on a busy cluster. This is not actually an error: the application simply completed while SHS was processing its event log.
   
   ```
   java.io.FileNotFoundException: File does not exist: /spark2-history/application_1651280650063_4556105_1.lz4.inprogress
           at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:72)
           at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:62)
           at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:170)
           at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1860)
           at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:697)
           at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
           at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
           at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
           at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
           at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
           at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
           at java.security.AccessController.doPrivileged(Native Method)
           at javax.security.auth.Subject.doAs(Subject.java:422)
           at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
           at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
   
           at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
           at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
           at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
           at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
           at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
           at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
           at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:854)
           at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:841)
           at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:830)
           at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1069)
           at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:303)
           at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:299)
           at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
           at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:311)
           at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:914)
           at org.apache.spark.deploy.history.EventLogFileReader$.openEventLog(EventLogFileReaders.scala:133)
           at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$2(FsHistoryProvider.scala:1131)
           at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2625)
           at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$1(FsHistoryProvider.scala:1131)
           at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$1$adapted(FsHistoryProvider.scala:1129)
           at scala.collection.immutable.List.foreach(List.scala:392)
           at org.apache.spark.deploy.history.FsHistoryProvider.parseAppEventLogs(FsHistoryProvider.scala:1129)
           at org.apache.spark.deploy.history.FsHistoryProvider.doMergeApplicationListing(FsHistoryProvider.scala:778)
           at org.apache.spark.deploy.history.FsHistoryProvider.mergeApplicationListing(FsHistoryProvider.scala:715)
           at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$15(FsHistoryProvider.scala:581)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /spark2-history/application_1651280650063_4556105_1.lz4.inprogress
           at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:72)
           at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:62)
           at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:170)
           at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1860)
           at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:697)
           at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
           at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
           at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
           at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
           at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
           at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
           at java.security.AccessController.doPrivileged(Native Method)
           at javax.security.auth.Subject.doAs(Subject.java:422)
           at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
           at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
   
           at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1507)
           at org.apache.hadoop.ipc.Client.call(Client.java:1453)
           at org.apache.hadoop.ipc.Client.call(Client.java:1363)
           at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
           at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
           at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
           at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:259)
           at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
           at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
           at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
           at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
           at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
           at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source)
           at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:852)
           ... 23 more
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Manually tested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36832: [SPARK-39439][SHS] Suppress error log for in-progress event log file not found

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #36832:
URL: https://github.com/apache/spark/pull/36832#discussion_r894723935


##########
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:
##########
@@ -747,6 +747,10 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
         listing.synchronized {
           listing.delete(classOf[LogInfo], rootPath.toString)
         }
+      case _: FileNotFoundException
+          if reader.rootPath.getName.endsWith(EventLogFileWriter.IN_PROGRESS) =>
+        logWarning(s"In-progress file does not exist: ${reader.rootPath}. The application may be " +
+          s"completed during processing.")

Review Comment:
   nit. `s"` -> `"`?





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36832: [SPARK-39439][SHS] Check final file if in-progress event log file does not exist

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #36832:
URL: https://github.com/apache/spark/pull/36832#discussion_r895397461


##########
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:
##########
@@ -715,7 +715,7 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
     }
   }
 
-  private def mergeApplicationListing(
+  private[history] def mergeApplicationListing(

Review Comment:
   Instead of changing visibility, could you use `org.scalatest.PrivateMethodTester` and `PrivateMethod` in the test suite?
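
A minimal sketch of the suggestion above, assuming ScalaTest 3.x is on the test classpath. `ListingProvider`, `mergeListing`, and `ListingProviderSuite` are hypothetical stand-ins for illustration, not the real `FsHistoryProvider` or its test suite; the point is only that the method can stay `private` while a test still invokes it.

```scala
import org.scalatest.PrivateMethodTester
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical class standing in for a provider with a private method.
class ListingProvider {
  private def mergeListing(name: String): String = s"merged:$name"
}

class ListingProviderSuite extends AnyFunSuite with PrivateMethodTester {
  test("private method can be invoked without widening its visibility") {
    // Describe the private method by name and result type.
    val mergeListing = PrivateMethod[String](Symbol("mergeListing"))
    val provider = new ListingProvider
    // invokePrivate looks the method up reflectively and calls it.
    val result = provider invokePrivate mergeListing("app-1")
    assert(result == "merged:app-1")
  }
}
```

The trade-off is that the method name becomes a string-like symbol checked only at run time, so a rename breaks the test at execution rather than at compile time.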





[GitHub] [spark] pan3793 commented on a diff in pull request #36832: [SPARK-39439][SHS] Suppress error log for in-progress event log file not found

Posted by GitBox <gi...@apache.org>.
pan3793 commented on code in PR #36832:
URL: https://github.com/apache/spark/pull/36832#discussion_r894539333


##########
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:
##########
@@ -747,6 +747,10 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
         listing.synchronized {
           listing.delete(classOf[LogInfo], rootPath.toString)
         }
+      case _: FileNotFoundException
+        if reader.rootPath.getName.endsWith(EventLogFileWriter.IN_PROGRESS) =>

Review Comment:
   Thanks, updated





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36832: [SPARK-39439][SHS] Suppress error log for in-progress event log file not found

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #36832:
URL: https://github.com/apache/spark/pull/36832#discussion_r894801616


##########
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:
##########
@@ -747,6 +747,10 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
         listing.synchronized {
           listing.delete(classOf[LogInfo], rootPath.toString)
         }
+      case _: FileNotFoundException
+          if reader.rootPath.getName.endsWith(EventLogFileWriter.IN_PROGRESS) =>
+        logInfo(s"In-progress file does not exist: ${reader.rootPath}. The application may be " +
+          "completed during processing.")

Review Comment:
   Yes, simply (1).





[GitHub] [spark] dongjoon-hyun closed pull request #36832: [SPARK-39439][CORE] Check final file if in-progress event log file does not exist

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #36832: [SPARK-39439][CORE] Check final file if in-progress event log file does not exist
URL: https://github.com/apache/spark/pull/36832




[GitHub] [spark] huaxingao commented on pull request #36832: [SPARK-39439][SHS] Suppress error log for in-progress event log file not found

Posted by GitBox <gi...@apache.org>.
huaxingao commented on PR #36832:
URL: https://github.com/apache/spark/pull/36832#issuecomment-1152649051

   @dongjoon-hyun Thanks for pinging me! The suggested approach looks good to me. 




[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36832: [SPARK-39439][SHS] Suppress error log for in-progress event log file not found

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #36832:
URL: https://github.com/apache/spark/pull/36832#discussion_r894750316


##########
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:
##########
@@ -747,6 +747,10 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
         listing.synchronized {
           listing.delete(classOf[LogInfo], rootPath.toString)
         }
+      case _: FileNotFoundException
+          if reader.rootPath.getName.endsWith(EventLogFileWriter.IN_PROGRESS) =>
+        logInfo(s"In-progress file does not exist: ${reader.rootPath}. The application may be " +
+          "completed during processing.")

Review Comment:
   Just an idea. Please let me know if it is difficult.





[GitHub] [spark] pan3793 commented on a diff in pull request #36832: [SPARK-39439][SHS] Check final file if in-progress event log file does not exist

Posted by GitBox <gi...@apache.org>.
pan3793 commented on code in PR #36832:
URL: https://github.com/apache/spark/pull/36832#discussion_r895303204


##########
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:
##########
@@ -747,6 +747,15 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
         listing.synchronized {
           listing.delete(classOf[LogInfo], rootPath.toString)
         }
+      case _: FileNotFoundException
+          if reader.rootPath.getName.endsWith(EventLogFileWriter.IN_PROGRESS) =>
+        val finalFileName = reader.rootPath.getName.stripSuffix(EventLogFileWriter.IN_PROGRESS)
+        if (fs.exists(new Path(reader.rootPath.getParent, finalFileName))) {
+          // Do nothing, the application completed during processing, the final event log file
+          // will be processed in the next round.
+        } else {
+          logError(s"File does not exist: ${reader.rootPath}")

Review Comment:
   Thanks, updated





[GitHub] [spark] pan3793 commented on pull request #36832: [SPARK-39439][SHS] Suppress error log for in-progress event log file not found

Posted by GitBox <gi...@apache.org>.
pan3793 commented on PR #36832:
URL: https://github.com/apache/spark/pull/36832#issuecomment-1152571604

   Thanks @dongjoon-hyun, changed to `logInfo` as suggested.






[GitHub] [spark] AmplabJenkins commented on pull request #36832: [SPARK-39439][SHS] Suppress error log for in-progress event log file not found

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #36832:
URL: https://github.com/apache/spark/pull/36832#issuecomment-1152668105

   Can one of the admins verify this patch?




[GitHub] [spark] dongjoon-hyun commented on pull request #36832: [SPARK-39439][SHS] Check final file if in-progress event log file does not exist

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #36832:
URL: https://github.com/apache/spark/pull/36832#issuecomment-1153354775

   Thank you for your update, @pan3793 .




[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36832: [SPARK-39439][SHS] Suppress error log for in-progress event log file not found

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #36832:
URL: https://github.com/apache/spark/pull/36832#discussion_r894749928


##########
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:
##########
@@ -747,6 +747,10 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
         listing.synchronized {
           listing.delete(classOf[LogInfo], rootPath.toString)
         }
+      case _: FileNotFoundException
+          if reader.rootPath.getName.endsWith(EventLogFileWriter.IN_PROGRESS) =>
+        logInfo(s"In-progress file does not exist: ${reader.rootPath}. The application may be " +
+          "completed during processing.")

Review Comment:
   If Spark checks the final location, we can remove this warning.
   > ... The application may be completed during processing.
   
   And if Spark cannot find the log file at the final location either, the file may have been removed by users rather than the application having `completed`.
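
The check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: it uses `java.nio.file` on a local filesystem instead of the Hadoop `FileSystem` API, and `InProgressCheck`, `classifyMissing`, and the `Outcome` cases are hypothetical names. The `.inprogress` suffix matches the file name in the stack trace and is what the diff refers to as `EventLogFileWriter.IN_PROGRESS`.

```scala
import java.nio.file.{Files, Path}

object InProgressCheck {
  // Suffix assumed from the file name shown in the reported stack trace.
  val InProgressSuffix = ".inprogress"

  sealed trait Outcome
  // Final file exists: the application completed mid-scan, so a later scan picks it up.
  case object RetryNextScan extends Outcome
  // Neither file exists: the log may have been deleted, so this is worth reporting.
  case object ReportMissing extends Outcome
  // Not an in-progress log, or the in-progress file is still present.
  case object NotApplicable extends Outcome

  def classifyMissing(inProgress: Path): Outcome = {
    // getFileName returns null for a root path, hence the Option wrapper.
    val name = Option(inProgress.getFileName).map(_.toString).getOrElse("")
    if (!name.endsWith(InProgressSuffix) || Files.exists(inProgress)) {
      NotApplicable
    } else {
      // The final log lives next to the in-progress one, minus the suffix.
      val finalPath = inProgress.resolveSibling(name.stripSuffix(InProgressSuffix))
      if (Files.exists(finalPath)) RetryNextScan else ReportMissing
    }
  }
}
```

Only the `ReportMissing` case would warrant an error-level log line; the `RetryNextScan` case can stay silent because nothing is lost.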





[GitHub] [spark] pan3793 commented on a diff in pull request #36832: [SPARK-39439][SHS] Suppress error log for in-progress event log file not found

Posted by GitBox <gi...@apache.org>.
pan3793 commented on code in PR #36832:
URL: https://github.com/apache/spark/pull/36832#discussion_r894728167


##########
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:
##########
@@ -747,6 +747,10 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
         listing.synchronized {
           listing.delete(classOf[LogInfo], rootPath.toString)
         }
+      case _: FileNotFoundException
+          if reader.rootPath.getName.endsWith(EventLogFileWriter.IN_PROGRESS) =>
+        logWarning(s"In-progress file does not exist: ${reader.rootPath}. The application may be " +

Review Comment:
   The change also makes it print a one-line log instead of the whole stack trace.





[GitHub] [spark] dongjoon-hyun commented on pull request #36832: [SPARK-39439][SHS] Suppress error log for in-progress event log file not found

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #36832:
URL: https://github.com/apache/spark/pull/36832#issuecomment-1152559037

   cc @huaxingao 




[GitHub] [spark] pan3793 commented on a diff in pull request #36832: [SPARK-39439][SHS] Suppress error log for in-progress event log file not found

Posted by GitBox <gi...@apache.org>.
pan3793 commented on code in PR #36832:
URL: https://github.com/apache/spark/pull/36832#discussion_r894755308


##########
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:
##########
@@ -747,6 +747,10 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
         listing.synchronized {
           listing.delete(classOf[LogInfo], rootPath.toString)
         }
+      case _: FileNotFoundException
+          if reader.rootPath.getName.endsWith(EventLogFileWriter.IN_PROGRESS) =>
+        logInfo(s"In-progress file does not exist: ${reader.rootPath}. The application may be " +
+          "completed during processing.")

Review Comment:
   The idea sounds good. If the final file exists, we can 1) do nothing and let it be processed by the next round, or 2) recover the process. I think you mean option 1?





[GitHub] [spark] huaxingao commented on a diff in pull request #36832: [SPARK-39439][SHS] Suppress error log for in-progress event log file not found

Posted by GitBox <gi...@apache.org>.
huaxingao commented on code in PR #36832:
URL: https://github.com/apache/spark/pull/36832#discussion_r894817992


##########
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:
##########
@@ -747,6 +747,10 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
         listing.synchronized {
           listing.delete(classOf[LogInfo], rootPath.toString)
         }
+      case _: FileNotFoundException
+          if reader.rootPath.getName.endsWith(EventLogFileWriter.IN_PROGRESS) =>
+        logInfo(s"In-progress file does not exist: ${reader.rootPath}. The application may be " +
+          "completed during processing.")

Review Comment:
   +1





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36832: [SPARK-39439][SHS] Check final file if in-progress event log file does not exist

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #36832:
URL: https://github.com/apache/spark/pull/36832#discussion_r895261688


##########
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:
##########
@@ -747,6 +747,15 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
         listing.synchronized {
           listing.delete(classOf[LogInfo], rootPath.toString)
         }
+      case _: FileNotFoundException
+          if reader.rootPath.getName.endsWith(EventLogFileWriter.IN_PROGRESS) =>
+        val finalFileName = reader.rootPath.getName.stripSuffix(EventLogFileWriter.IN_PROGRESS)
+        if (fs.exists(new Path(reader.rootPath.getParent, finalFileName))) {
+          // Do nothing, the application completed during processing, the final event log file
+          // will be processed by next around.
+        } else {
+          logError(s"File does not exist: ${reader.rootPath}")

Review Comment:
   Could you revise this message to include both `IN_PROGRESS` and the final location?
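
   One possible shape for such a message, as a hedged sketch only. The prefix matches the string the test in this PR filters on; the rest of the wording, the `MissingLogMessage` helper, and the `InProgressSuffix` value are assumptions for illustration, and `java.nio` paths stand in for Hadoop's `Path`.

   ```scala
   import java.nio.file.Path

   object MissingLogMessage {
     // Assumed value; stands in for EventLogFileWriter.IN_PROGRESS.
     val InProgressSuffix = ".inprogress"

     // Hypothetical wording: names both the in-progress path and the derived final path.
     def warning(inProgress: Path): String = {
       val finalPath = inProgress.resolveSibling(
         inProgress.getFileName.toString.stripSuffix(InProgressSuffix))
       s"In-progress event log file does not exist: $inProgress, " +
         s"neither does the final event log file: $finalPath."
     }
   }
   ```

   Reporting both locations lets an operator see at a glance which two paths were checked before the warning was emitted.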





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36832: [SPARK-39439][SHS] Check final file if in-progress event log file does not exist

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #36832:
URL: https://github.com/apache/spark/pull/36832#discussion_r895388483


##########
core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala:
##########
@@ -221,6 +222,48 @@ abstract class FsHistoryProviderSuite extends SparkFunSuite with Matchers with L
     }
   }
 
+  test("SPARK-39439: Check final file if in-progress event log file does not exist") {
+    withTempDir { dir =>
+      val conf = createTestConf()
+      conf.set(HISTORY_LOG_DIR, dir.getAbsolutePath)
+      conf.set(EVENT_LOG_ROLLING_MAX_FILES_TO_RETAIN, 1)
+      conf.set(EVENT_LOG_COMPACTION_SCORE_THRESHOLD, 0.0d)
+      val hadoopConf = SparkHadoopUtil.newConfiguration(conf)
+      val fs = new Path(dir.getAbsolutePath).getFileSystem(hadoopConf)
+      val provider = new FsHistoryProvider(conf)
+
+      val inProgressFile = newLogFile("app1", None, inProgress = true)
+      val logAppender1 = new LogAppender("in-progress and final event log files does not exist")
+      withLogAppender(logAppender1, level = Some(Level.WARN)) {
+        provider.mergeApplicationListing(
+          EventLogFileReader(fs, new Path(inProgressFile.toURI), None),
+          System.currentTimeMillis,
+          true
+        )
+      }
+      val logs1 = logAppender1.loggingEvents.map(_.getMessage.getFormattedMessage)
+        .filter(_.contains(s"In-progress event log file does not exist: "))

Review Comment:
   nit. `s"` -> `"`





[GitHub] [spark] pan3793 commented on a diff in pull request #36832: [SPARK-39439][SHS] Check final file if in-progress event log file does not exist

Posted by GitBox <gi...@apache.org>.
pan3793 commented on code in PR #36832:
URL: https://github.com/apache/spark/pull/36832#discussion_r895699684


##########
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:
##########
@@ -715,7 +715,7 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
     }
   }
 
-  private def mergeApplicationListing(
+  private[history] def mergeApplicationListing(

Review Comment:
   Yea, updated.





[GitHub] [spark] wangyum commented on a diff in pull request #36832: [SPARK-39439][SHS] Suppress error log for in-progress event log file not found

Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #36832:
URL: https://github.com/apache/spark/pull/36832#discussion_r894526608


##########
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:
##########
@@ -747,6 +747,10 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
         listing.synchronized {
           listing.delete(classOf[LogInfo], rootPath.toString)
         }
+      case _: FileNotFoundException
+        if reader.rootPath.getName.endsWith(EventLogFileWriter.IN_PROGRESS) =>

Review Comment:
   Indent this line by two more spaces.





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36832: [SPARK-39439][SHS] Suppress error log for in-progress event log file not found

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #36832:
URL: https://github.com/apache/spark/pull/36832#discussion_r894723434


##########
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:
##########
@@ -747,6 +747,10 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
         listing.synchronized {
           listing.delete(classOf[LogInfo], rootPath.toString)
         }
+      case _: FileNotFoundException
+          if reader.rootPath.getName.endsWith(EventLogFileWriter.IN_PROGRESS) =>
+        logWarning(s"In-progress file does not exist: ${reader.rootPath}. The application may be " +

Review Comment:
   The PR title `Suppress error log for in-progress event log file not found` is misleading.
   
   The intention looks like switching log level from ERROR to WARN, doesn't it?





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36832: [SPARK-39439][SHS] Check final file if in-progress event log file does not exist

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #36832:
URL: https://github.com/apache/spark/pull/36832#discussion_r895262820


##########
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:
##########
@@ -747,6 +747,15 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
         listing.synchronized {
           listing.delete(classOf[LogInfo], rootPath.toString)
         }
+      case _: FileNotFoundException
+          if reader.rootPath.getName.endsWith(EventLogFileWriter.IN_PROGRESS) =>
+        val finalFileName = reader.rootPath.getName.stripSuffix(EventLogFileWriter.IN_PROGRESS)
+        if (fs.exists(new Path(reader.rootPath.getParent, finalFileName))) {
+          // Do nothing, the application completed during processing, the final event log file
+          // will be processed by next around.
+        } else {
+          logError(s"File does not exist: ${reader.rootPath}")

Review Comment:
   In addition, this is not a critical error in SHS operation. `logWarning` could be enough in this case. Sorry for asking you to switch the log level back and forth.





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36832: [SPARK-39439][SHS] Check final file if in-progress event log file does not exist

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #36832:
URL: https://github.com/apache/spark/pull/36832#discussion_r895262820


##########
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:
##########
@@ -747,6 +747,15 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
         listing.synchronized {
           listing.delete(classOf[LogInfo], rootPath.toString)
         }
+      case _: FileNotFoundException
+          if reader.rootPath.getName.endsWith(EventLogFileWriter.IN_PROGRESS) =>
+        val finalFileName = reader.rootPath.getName.stripSuffix(EventLogFileWriter.IN_PROGRESS)
+        if (fs.exists(new Path(reader.rootPath.getParent, finalFileName))) {
+          // Do nothing, the application completed during processing, the final event log file
+          // will be processed by next around.
+        } else {
+          logError(s"File does not exist: ${reader.rootPath}")

Review Comment:
   In addition, this is not a critical error in SHS. `logWarning` could be enough in this case.





[GitHub] [spark] dongjoon-hyun commented on pull request #36832: [SPARK-39439][CORE] Check final file if in-progress event log file does not exist

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #36832:
URL: https://github.com/apache/spark/pull/36832#issuecomment-1154339726

   Merged to master.




[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36832: [SPARK-39439][SHS] Check final file if in-progress event log file does not exist

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #36832:
URL: https://github.com/apache/spark/pull/36832#discussion_r895389373


##########
core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala:
##########
@@ -221,6 +222,48 @@ abstract class FsHistoryProviderSuite extends SparkFunSuite with Matchers with L
     }
   }
 
+  test("SPARK-39439: Check final file if in-progress event log file does not exist") {
+    withTempDir { dir =>
+      val conf = createTestConf()
+      conf.set(HISTORY_LOG_DIR, dir.getAbsolutePath)
+      conf.set(EVENT_LOG_ROLLING_MAX_FILES_TO_RETAIN, 1)
+      conf.set(EVENT_LOG_COMPACTION_SCORE_THRESHOLD, 0.0d)
+      val hadoopConf = SparkHadoopUtil.newConfiguration(conf)
+      val fs = new Path(dir.getAbsolutePath).getFileSystem(hadoopConf)
+      val provider = new FsHistoryProvider(conf)
+
+      val inProgressFile = newLogFile("app1", None, inProgress = true)
+      val logAppender1 = new LogAppender("in-progress and final event log files does not exist")
+      withLogAppender(logAppender1, level = Some(Level.WARN)) {
+        provider.mergeApplicationListing(
+          EventLogFileReader(fs, new Path(inProgressFile.toURI), None),
+          System.currentTimeMillis,
+          true
+        )
+      }
+      val logs1 = logAppender1.loggingEvents.map(_.getMessage.getFormattedMessage)
+        .filter(_.contains(s"In-progress event log file does not exist: "))
+      assert(logs1.size === 1)
+
+      writeFile(inProgressFile, None,
+        SparkListenerApplicationStart("app1", Some("app1"), 1L, "test", None),
+        SparkListenerApplicationEnd(2L))
+      val finalFile = newLogFile("app1", None, inProgress = false)
+      inProgressFile.renameTo(finalFile)
+      val logAppender2 = new LogAppender("in-progress event log file has been renamed to final")
+      withLogAppender(logAppender2, level = Some(Level.WARN)) {
+        provider.mergeApplicationListing(
+          EventLogFileReader(fs, new Path(inProgressFile.toURI), None),
+          System.currentTimeMillis,
+          true
+        )
+      }
+      val logs2 = logAppender2.loggingEvents.map(_.getMessage.getFormattedMessage)
+        .filter(_.contains(s"In-progress event log file does not exist: "))

Review Comment:
   nit. `s"` -> `"`


