Posted to commits@spark.apache.org by gu...@apache.org on 2019/07/09 06:49:57 UTC
[spark] branch master updated: [SPARK-28302][CORE] Make sure to generate unique output file for SparkLauncher on Windows

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 925f620  [SPARK-28302][CORE] Make sure to generate unique output file for SparkLauncher on Windows
925f620 is described below

commit 925f620570a022ff8229bfde076e7dde6bf242df
Author: wuyi <ng...@163.com>
AuthorDate: Tue Jul 9 15:49:31 2019 +0900

    [SPARK-28302][CORE] Make sure to generate unique output file for SparkLauncher on Windows
    
    ## What changes were proposed in this pull request?
    
    When using SparkLauncher to submit applications **concurrently** from multiple threads on **Windows**, some apps fail with "The process cannot access the file because it is being used by another process" and remain in the LOST state at the end. The issue can be reproduced with this [demo](https://issues.apache.org/jira/secure/attachment/12973920/Main.scala).
    
    After digging into the code, I found that the Windows cmd `%RANDOM%` variable returns the same number if it is read again shortly (e.g. < 500ms) after the previous call. As a result, SparkLauncher gets the same output file (spark-class-launcher-output-%RANDOM%.txt) for several apps. A later app then hits the issue when it tries to write to a file that another app has already opened for writing.
    
    We should make sure to generate a unique output file for SparkLauncher on Windows to avoid this issue.
    
    ## How was this patch tested?
    
    Tested manually on Windows.
    
    Closes #25076 from Ngone51/SPARK-28302.
    
    Authored-by: wuyi <ng...@163.com>
    Signed-off-by: HyukjinKwon <gu...@apache.org>
---
 bin/spark-class2.cmd | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/bin/spark-class2.cmd b/bin/spark-class2.cmd
index 5da7d7a..34d04c9 100644
--- a/bin/spark-class2.cmd
+++ b/bin/spark-class2.cmd
@@ -63,7 +63,12 @@ if not "x%JAVA_HOME%"=="x" (
 
 rem The launcher library prints the command to be executed in a single line suitable for being
 rem executed by the batch interpreter. So read all the output of the launcher into a variable.
+:gen
 set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
+rem SPARK-28302: %RANDOM% returns the same number if it is read again right after the last call,
+rem so regenerate the name until the file does not exist yet, to keep concurrent launchers from
+rem writing into the same file.
+if exist %LAUNCHER_OUTPUT% goto :gen
 "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main %* > %LAUNCHER_OUTPUT%
 for /f "tokens=*" %%i in (%LAUNCHER_OUTPUT%) do (
   set SPARK_CMD=%%i
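
The retry loop added by this patch (`:gen` ... `if exist %LAUNCHER_OUTPUT% goto :gen`) can be sketched outside of batch as follows. This is only an illustration in Python, not code from the commit: `unique_output_path` and `repeating_random` are hypothetical names, and `repeating_random` is a made-up stand-in for `%RANDOM%` that replays a fixed sequence so the duplicate draw is reproducible.

```python
import os
import tempfile
from typing import Callable, Iterable


def unique_output_path(temp_dir: str, next_random: Callable[[], int]) -> str:
    """Return a launcher output path that does not exist yet in temp_dir.

    Mirrors the `:gen` / `if exist ... goto :gen` loop: draw a number, build
    the file name, and draw again while a file with that name already exists.
    """
    while True:
        candidate = os.path.join(
            temp_dir, f"spark-class-launcher-output-{next_random()}.txt"
        )
        if not os.path.exists(candidate):
            return candidate


def repeating_random(values: Iterable[int]) -> Callable[[], int]:
    """Hypothetical stand-in for %RANDOM% that replays a fixed sequence."""
    it = iter(values)
    return lambda: next(it)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as temp_dir:
        # The first launcher draws 4242 and creates its output file.
        first = unique_output_path(temp_dir, repeating_random([4242]))
        open(first, "w").close()

        # The second launcher draws 4242 twice, sees the file, and retries.
        second = unique_output_path(temp_dir, repeating_random([4242, 4242, 17]))
        print(os.path.basename(first))   # spark-class-launcher-output-4242.txt
        print(os.path.basename(second))  # spark-class-launcher-output-17.txt
```

As in the batch version, the existence check and the later write are separate steps, so the sketch only avoids names whose file is already present when the check runs.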

