Posted to issues@flink.apache.org by "Kai Chen (Jira)" <ji...@apache.org> on 2020/08/05 03:55:00 UTC
[jira] [Updated] (FLINK-18811) if a disk is damaged, taskmanager
should choose another disk for temp dir , rather than throw an IOException,
which causes flink job restart over and over again
[ https://issues.apache.org/jira/browse/FLINK-18811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kai Chen updated FLINK-18811:
-----------------------------
Component/s: (was: Table SQL / Runtime)
Runtime / Network
> if a disk is damaged, taskmanager should choose another disk for temp dir , rather than throw an IOException, which causes flink job restart over and over again
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-18811
> URL: https://issues.apache.org/jira/browse/FLINK-18811
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Network
> Environment: flink-1.10
> Reporter: Kai Chen
> Priority: Major
> Attachments: flink_disk_error.png
>
>
> I hit this exception when a disk was damaged:
> !flink_disk_error.png!
> I think that if a disk is damaged, the TaskManager should choose another disk for its temp dir rather than throw an IOException, which causes the Flink job to restart over and over again.
> Could we just change "SpillingAdaptiveSpanningRecordDeserializer" like this:
> {code:java}
> // SpillingAdaptiveSpanningRecordDeserializer.java
> private FileChannel createSpillingChannel() throws IOException {
>     if (spillFile != null) {
>         throw new IllegalStateException("Spilling file already exists.");
>     }
>
>     // try to find a unique file name for the spilling channel
>     int maxAttempts = 10;
>     // local copy, so that failing directories are only dropped for this call
>     String[] tempDirs = this.tempDirs;
>     for (int attempt = 0; attempt < maxAttempts; attempt++) {
>         int dirIndex = rnd.nextInt(tempDirs.length);
>         String directory = tempDirs[dirIndex];
>         spillFile = new File(directory, randomString(rnd) + ".inputchannel");
>         try {
>             if (spillFile.createNewFile()) {
>                 return new RandomAccessFile(spillFile, "rw").getChannel();
>             }
>         } catch (IOException e) {
>             // if there is no tempDir left to try
>             if (tempDirs.length <= 1) {
>                 throw e;
>             }
>             LOG.warn("Caught an IOException when creating a spill file in: " + directory + ". Attempt " + attempt, e);
>             // drop the failing directory so that later attempts skip it
>             tempDirs = (String[]) ArrayUtils.remove(tempDirs, dirIndex);
>         }
>     }
>
>     // no directory yielded a new spill file within maxAttempts
>     throw new IOException("Could not create a spill file in any of: " + Arrays.toString(tempDirs));
> }
> {code}
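The fallback idea in the snippet above can be tried outside of Flink with a minimal, self-contained sketch. The class name `SpillDirFallback` and the method `createSpillFile` are hypothetical and not part of Flink. The sketch also swaps `ArrayUtils.remove` for a plain `java.util.List` and uses a UUID in place of `randomString(rnd)`, so it illustrates the proposal rather than the actual patch.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.UUID;

// Hypothetical helper for illustration only; not a Flink class.
class SpillDirFallback {

    /**
     * Tries to create a new, empty spill file in one of the given directories.
     * A directory that throws an IOException is removed from the candidate
     * list, so later attempts only pick from the directories that remain.
     */
    static File createSpillFile(String[] tempDirs, Random rnd, int maxAttempts) throws IOException {
        List<String> candidates = new ArrayList<>(Arrays.asList(tempDirs));
        IOException lastError = null;

        for (int attempt = 0; attempt < maxAttempts && !candidates.isEmpty(); attempt++) {
            int dirIndex = rnd.nextInt(candidates.size());
            String directory = candidates.get(dirIndex);
            File file = new File(directory, UUID.randomUUID() + ".inputchannel");
            try {
                if (file.createNewFile()) {
                    return file;
                }
                // The name already existed: retry with a fresh random name.
            } catch (IOException e) {
                // The directory is unusable (damaged disk, missing path, ...): skip it from now on.
                lastError = e;
                candidates.remove(dirIndex);
            }
        }

        throw new IOException(
                "Could not create a spill file in any of: " + Arrays.toString(tempDirs), lastError);
    }
}
```

Called with one unusable and one healthy directory, `createSpillFile` should return a file in the healthy directory regardless of which one is picked first. If every directory fails, the last `IOException` is attached as the cause of the exception that is thrown.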