Posted to user@flink.apache.org by Chris Hebert <ch...@digitalreasoning.com> on 2017/07/28 21:55:46 UTC

Flink on YARN - tmp directory

Hi,

My jobs create tmp files like so:

java.nio.file.Path tmpFilePath =
java.nio.file.Files.createTempFile("tmpFile", "txt");

They currently appear in /tmp/, but I want them somewhere else, say
/my/tmp/.

The Flink on YARN docs say:

Flink on YARN will overwrite the following configuration parameters
jobmanager.rpc.address (because the JobManager is always allocated at
different machines), taskmanager.tmp.dirs (we are using the tmp directories
given by YARN) and parallelism.default if the number of slots has been
specified.

How would I specify a different tmp directory for a job without modifying
my YARN tmp directories?

I tried the taskmanager.tmp.dirs property in conf/flink-conf.yaml anyway,
but that failed.

I appended -Djava.io.tmpdir=/my/tmp/ to JVM_ARGS and all three variations
of DEFAULT_ENV_JAVA_OPTS in bin/config.sh, but that failed.

I passed -Djava.io.tmpdir=/my/tmp/ and variations as arguments to
./bin/yarn-session.sh and ./bin/flink run et cetera, but that failed too.

Odd observation:
The hadoop.tmp.dir property is set in my core-site.xml to /some/other/tmp/,
yet Flink writes to /tmp/. My yarn-site.xml specifies no tmp.

Side note:
My Flink job is a Beam pipeline. I doubt that's relevant, but let me know
if it is.

Thanks,
Chris

Re: Flink on YARN - tmp directory

Posted by Aljoscha Krettek <al...@apache.org>.
Hi Chris,

I think in this case we need to change what is passed as "-Djava.io.tmpdir" to the JVMs that run the TaskManagers. You should be able to achieve this via env.java.opts or, more specifically, env.java.opts.taskmanager [1]. The directory specified via taskmanager.tmp.dirs is only used to set the internal Flink tmp directories; it doesn't change what Java assumes as the tmp directory. You should be able to set env.java.opts.taskmanager in the flink-conf.yaml or pass it as a "dynamic property" when running via bin/flink (per-job YARN cluster) or when creating the YARN session. For example:

bin/flink ... -Denv.java.opts.taskmanager="-Djava.io.tmpdir=/my/tmp" ...
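
To check which directory the TaskManager JVM actually ended up with, something along these lines could be called from inside the job. This is only a rough sketch: the class TmpDirDemo and its method names are made up for illustration and are not part of Flink or Beam. It relies on the two-argument Files.createTempFile resolving against the java.io.tmpdir system property, and it also shows the JDK's three-argument overload, which takes an explicit directory and so does not depend on the JVM option at all.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only.
class TmpDirDemo {

    // Directory this JVM treats as its default temp location (java.io.tmpdir).
    static Path jvmTmpDir() {
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    // Two-argument overload: the file lands in the JVM's default temp directory.
    // The suffix is used verbatim, so it needs the leading dot to act as an extension.
    static Path createInDefaultTmp() throws IOException {
        return Files.createTempFile("tmpFile", ".txt");
    }

    // Three-argument overload: the caller chooses the directory explicitly,
    // regardless of what java.io.tmpdir is set to.
    static Path createIn(Path dir) throws IOException {
        Files.createDirectories(dir);
        return Files.createTempFile(dir, "tmpFile", ".txt");
    }

    public static void main(String[] args) throws IOException {
        System.out.println("java.io.tmpdir = " + jvmTmpDir());

        Path inDefault = createInDefaultTmp();
        System.out.println("default overload wrote  " + inDefault);

        // Target directory: first argument if given, else a subdirectory of the default.
        Path dir = args.length > 0 ? Paths.get(args[0]) : jvmTmpDir().resolve("my-tmp-demo");
        Path inExplicit = createIn(dir);
        System.out.println("explicit overload wrote " + inExplicit);

        // Clean up the demo files.
        Files.deleteIfExists(inDefault);
        Files.deleteIfExists(inExplicit);
    }
}
```

If the printed java.io.tmpdir is still /tmp when this runs on a TaskManager, the JVM option did not reach that process. The explicit-directory overload is a possible fallback when the JVM options cannot be changed.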

Best,
Aljoscha

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#common-options


Re: Flink on YARN - tmp directory

Posted by Chris Hebert <ch...@digitalreasoning.com>.
I should also note that the above steps did get the Flink JobManager and
TaskManagers to save their tmp web dashboard files to /my/tmp/ and to show
in the Dashboard that the taskmanager.tmp.dirs property had been properly
set to /my/tmp/, but the tmp files I wrote in my jobs stubbornly wrote to
/tmp/ anyway.
