Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/10/26 10:45:08 UTC

[GitHub] [flink] pnowojski commented on a change in pull request #13718: [FLINK-18811] Pick another tmpDir if an IOException occurs when creating spill file

pnowojski commented on a change in pull request #13718:
URL: https://github.com/apache/flink/pull/13718#discussion_r511867252



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/serialization/SpanningWrapper.java
##########
@@ -286,11 +292,21 @@ private FileChannel createSpillingChannel() throws IOException {
 		// try to find a unique file name for the spilling channel
 		int maxAttempts = 10;
 		for (int attempt = 0; attempt < maxAttempts; attempt++) {
-			String directory = tempDirs[rnd.nextInt(tempDirs.length)];
+			int dirIndex = rnd.nextInt(tempDirs.length);

Review comment:
       Maybe instead of randomly picking a directory on every attempt, just randomly pick a starting directory index once and then increase it by one on each attempt. Roughly something like this:
   ```
   int initialDirIndex = rnd.nextInt(...);
   for (int attempt = 0; attempt < maxAttempts; attempt++) {
     int dirIndex = (initialDirIndex + attempt) % tempDirs.length;
     ...
   }
   ```
   ?
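A minimal Java sketch of the suggested index scheme, assuming a non-empty `tempDirs` array. `RoundRobinDirPicker` and `dirIndexFor` are hypothetical names, not Flink code, and the directory paths are placeholders. Because only the start is random and each attempt advances by one, any `tempDirs.length` consecutive attempts visit every directory exactly once. Independent random picks, by contrast, can hit the same (possibly broken) directory repeatedly.

```java
import java.util.Random;

/**
 * Hypothetical sketch (not Flink API): choose a random starting directory once,
 * then advance by one directory per attempt, wrapping around at the end.
 */
public class RoundRobinDirPicker {

    /** Index of the directory to try on the given attempt. */
    static int dirIndexFor(int initialDirIndex, int attempt, int numDirs) {
        return (initialDirIndex + attempt) % numDirs;
    }

    public static void main(String[] args) {
        String[] tempDirs = {"/tmp/spill-a", "/tmp/spill-b", "/tmp/spill-c"}; // placeholder paths
        Random rnd = new Random();
        int maxAttempts = 10;
        // Random start, picked only once (rnd.nextInt requires tempDirs to be non-empty).
        int initialDirIndex = rnd.nextInt(tempDirs.length);
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int dirIndex = dirIndexFor(initialDirIndex, attempt, tempDirs.length);
            System.out.println("attempt " + attempt + " -> " + tempDirs[dirIndex]);
        }
    }
}
```

For example, with `initialDirIndex = 2` and three directories, attempts 0 through 5 use indices 2, 0, 1, 2, 0, 1.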

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/serialization/SpanningWrapper.java
##########
@@ -286,11 +292,21 @@ private FileChannel createSpillingChannel() throws IOException {
 		// try to find a unique file name for the spilling channel
 		int maxAttempts = 10;
 		for (int attempt = 0; attempt < maxAttempts; attempt++) {
-			String directory = tempDirs[rnd.nextInt(tempDirs.length)];
+			int dirIndex = rnd.nextInt(tempDirs.length);
+			String directory = tempDirs[dirIndex];
 			File file = new File(directory, randomString(rnd) + ".inputchannel");
-			if (file.createNewFile()) {
-				spillFile = new RefCountedFile(file);
-				return new RandomAccessFile(file, "rw").getChannel();
+			try {
+				if (file.createNewFile()) {
+					spillFile = new RefCountedFile(file);
+					return new RandomAccessFile(file, "rw").getChannel();
+				}
+			} catch (IOException e) {
+				// if there is no tempDir left to try
+				if (tempDirs.length <= 1) {
+					throw e;
+				}
+				LOG.warn("Caught an IOException when creating spill file: " + directory + ". Attempt " + attempt, e);
+				tempDirs = (String[]) ArrayUtils.remove(tempDirs, dirIndex);

Review comment:
       If we settle on such a trivial approach to the problem (without temporary blacklisting), I wouldn't remove the failed dir from `tempDirs`, but just keep it and retry next time?
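A rough sketch of this variant, assuming the round-robin index from the first comment. A failed directory is only logged and skipped, `tempDirs` itself is never modified, and an `IOException` is thrown only once all attempts are used up. `SpillFileCreator` and `createSpillFile` are hypothetical names. The sketch returns a plain `File` instead of the `FileChannel`/`RefCountedFile` pair from the PR, and `Long.toHexString(rnd.nextLong())` is an assumed stand-in for Flink's `randomString(rnd)`.

```java
import java.io.File;
import java.io.IOException;
import java.util.Random;

/**
 * Hypothetical sketch (not Flink API): on an IOException the failed directory
 * is logged and skipped, but it stays in tempDirs so later spills can retry it.
 */
public class SpillFileCreator {

    static File createSpillFile(String[] tempDirs, Random rnd, int maxAttempts) throws IOException {
        IOException lastError = null;
        // Random starting directory, then one step per attempt (tempDirs must be non-empty).
        int initialDirIndex = rnd.nextInt(tempDirs.length);
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String directory = tempDirs[(initialDirIndex + attempt) % tempDirs.length];
            // Assumed stand-in for Flink's randomString(rnd).
            File file = new File(directory, Long.toHexString(rnd.nextLong()) + ".inputchannel");
            try {
                if (file.createNewFile()) {
                    return file;
                }
                // createNewFile returned false: the name already exists, so try again.
            } catch (IOException e) {
                // Do not remove the directory from tempDirs; note the failure and move on.
                System.err.println("Could not create spill file in " + directory
                        + ", attempt " + attempt + ": " + e);
                lastError = e;
            }
        }
        throw new IOException("Could not create a spill file after " + maxAttempts + " attempts", lastError);
    }

    public static void main(String[] args) throws IOException {
        // The first entry does not exist, so creating a file there throws an IOException.
        String[] tempDirs = {"/nonexistent-spill-dir", System.getProperty("java.io.tmpdir")};
        File spillFile = createSpillFile(tempDirs, new Random(), 10);
        System.out.println("Created " + spillFile + "; tempDirs still has " + tempDirs.length + " entries");
        spillFile.delete();
    }
}
```

Keeping the directory gives a transiently failing disk another chance on the next spill, at the cost of possibly hitting a permanently broken one again. That trade-off is what temporary blacklisting would address.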




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org