Posted to user@flink.apache.org by kedar mhaswade <ke...@gmail.com> on 2018/03/26 23:32:49 UTC

Programmatic creation of YARN sessions and deployment (running) Flink jobs on it.

Typically, when one wants to run a Flink job on a Hadoop YARN installation,
one creates a YARN session (e.g. ./bin/yarn-session.sh -n 4 -qu
test-yarn-queue) and then runs the intended Flink job(s) (e.g. ./bin/flink run -c
MyFlinkApp -m job-manager-host:job-manager-port <overriding app config
params> myapp.jar) on the Flink cluster whose job manager URL is returned
by the previous command.

My questions are:
- Does yarn-session.sh need conf/flink-conf.yaml to be available in the
Flink installation on every container in YARN? If this file is needed, how
can one run different YARN sessions (with potentially very different
configurations) on the same Hadoop YARN installation simultaneously?
- Is it possible to start the YARN session programmatically? If yes, I
believe I should look at classes like YarnClusterClient
<https://ci.apache.org/projects/flink/flink-docs-stable/api/java/org/apache/flink/yarn/YarnClusterClient.html>.
Is that right? Is there any other guidance on how to do this
programmatically (e.g. I have a management UI that wants to start/stop YARN
sessions and deploy Flink jobs to them)?

Regards,
Kedar

Re: Programmatic creation of YARN sessions and deployment (running) Flink jobs on it.

Posted by Chesnay Schepler <ch...@apache.org>.
Hello,

I think the flink-conf.yaml should only be required on the node on which 
you call yarn-session.sh.
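
If the configuration is only read on the submitting node, one way to run
several differently-configured sessions from the same client is to keep one
conf directory per session and select it with the FLINK_CONF_DIR environment
variable, which the Flink launch scripts honor. A minimal sketch (paths,
queue names, and the heap value are made-up examples):

```shell
# Sketch: one configuration directory per session, selected via FLINK_CONF_DIR.
mkdir -p /tmp/conf-session-a /tmp/conf-session-b
printf 'taskmanager.heap.mb: 2048\n' > /tmp/conf-session-a/flink-conf.yaml
printf 'taskmanager.heap.mb: 8192\n' > /tmp/conf-session-b/flink-conf.yaml

# Each invocation then reads only its own configuration, e.g.:
#   FLINK_CONF_DIR=/tmp/conf-session-a ./bin/yarn-session.sh -n 4 -qu queue-a
#   FLINK_CONF_DIR=/tmp/conf-session-b ./bin/yarn-session.sh -n 2 -qu queue-b
```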

For starting the session cluster programmatically you would have to look 
into the YarnClusterDescriptor, and into the YarnClusterClient for 
submitting jobs (you get the client from the cluster descriptor).
Do note however that these are internal APIs; they may or may not be 
documented, they may rely on specific behavior of the CLI, and there are 
no API stability guarantees.

The YARNSessionFIFOITCase may provide some hints on how to use it.
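
To make the shape of that flow concrete, here is a Java-flavored pseudocode
sketch, not a tested program: every class and method name below is an
internal, version-specific assumption (roughly the Flink 1.4/1.5 era) that
should be verified against the flink-yarn sources of your actual version.

```java
// PSEUDOCODE sketch of the descriptor -> client flow; internal, unstable APIs.
// Assumed names to verify: YarnClusterDescriptor, ClusterSpecification,
// ClusterClient, PackagedProgram; yarnClient/clusterSpecification elided.
Configuration flinkConfig = GlobalConfiguration.loadConfiguration("/path/to/conf");

YarnClusterDescriptor descriptor =
    new YarnClusterDescriptor(flinkConfig, "/path/to/conf", yarnClient, ...);
descriptor.setQueue("test-yarn-queue");

// Deploying the session cluster yields a client for that cluster...
ClusterClient client = descriptor.deploySessionCluster(clusterSpecification);

// ...which can then submit jobs and eventually shut the session down.
PackagedProgram program =
    new PackagedProgram(new File("myapp.jar"), "MyFlinkApp", appArgs);
client.run(program, parallelism);
client.shutDownCluster();
```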
