Posted to dev@flink.apache.org by "Angel Barragán (Jira)" <ji...@apache.org> on 2020/03/11 11:29:00 UTC

[jira] [Created] (FLINK-16544) Flink FileSystem for web.uploadDir

Angel Barragán created FLINK-16544:
--------------------------------------

             Summary: Flink FileSystem for web.uploadDir
                 Key: FLINK-16544
                 URL: https://issues.apache.org/jira/browse/FLINK-16544
             Project: Flink
          Issue Type: Improvement
          Components: API / Core
    Affects Versions: 1.10.0
            Reporter: Angel Barragán


Currently the configuration properties "web.upload.dir" and "web.upload.dir" only support paths on the local filesystem. When Flink is deployed on another cluster environment such as YARN, it would be more useful to be able to place those directories on HDFS. That makes managing their size and maintenance easier than finding out on which node YARN launched the JobManager and managing the upload directory there.

In my concrete case, I ran into this management disadvantage when creating an AWS EMR cluster with Flink, where the default configuration creates this directory under /tmp on the local filesystem of the CORE node where YARN deploys the JobManager. We found that the EMR cluster is also configured to fully empty /tmp on a monthly basis, which removes Flink's upload directory and causes Flink to fail when you try to submit a new job. We had to recreate the directory manually.

The first solution I tried was to change the above configuration properties to use HDFS, as we did with the configuration property "state.checkpoints.dir", but we found that it doesn't work in a YARN environment. So I checked the Flink code to see how this configuration is used and found that it is resolved against the local file system.
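
For illustration, the attempted configuration might have looked roughly like the following flink-conf.yaml fragment. This is only a sketch: the two property names come from the description above, while the HDFS paths are hypothetical placeholders and not the values actually used.

```yaml
# flink-conf.yaml (sketch; both paths below are hypothetical)

# Checkpoint storage accepts a filesystem URI such as HDFS.
state.checkpoints.dir: hdfs:///flink/checkpoints

# Attempted: the same style of URI for the upload directory.
# As reported above, on 1.10.0 this value is handled as a local
# filesystem path, so it does not work in a YARN environment.
web.upload.dir: hdfs:///flink/web-upload
```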

I think that supporting Flink's FileSystem for these directories would improve the manageability of Flink when it runs on another cluster environment where shared storage such as HDFS or S3 is available.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)