Posted to user@flink.apache.org by Guy Harmach <Gu...@Amdocs.com> on 2017/07/13 15:10:53 UTC

How to send local files to a flink job on YARN

Hi,

I'm running a flink job on YARN. I'd like to pass yaml configuration files to the job.
I tried to use the flink cli -yarnship flag to point to a directory containing the file, but wasn't able to get it in the job.
Can someone give an example of how to send local files and how to read them in the job?

Thanks, Guy

This message and the information contained herein is proprietary and confidential and subject to the Amdocs policy statement,

you may review at https://www.amdocs.com/about/email-disclaimer

RE: How to send local files to a flink job on YARN

Posted by Guy Harmach <Gu...@Amdocs.com>.
Hi,

Just to clarify my need: I want to send the file from the local file system to the job entry point, read it in the main method, and build my sources, operations and sinks according to its content.
I assumed from the CLI usage description of the yarnship flag that it is the equivalent of Spark’s --files flag, which is used to pass local files to the driver.
Is there any solution other than manually copying the file to HDFS and deleting it afterwards?

From: Jörn Franke [mailto:jornfranke@gmail.com]
Sent: Thursday, July 13, 2017 6:36 PM
To: Guy Harmach <Gu...@Amdocs.com>
Cc: user@flink.apache.org
Subject: Re: How to send local files to a flink job on YARN

Putting a configuration file on every node does not sound like a good idea.

What about ZooKeeper?


Re: How to send local files to a flink job on YARN

Posted by Jörn Franke <jo...@gmail.com>.
Putting a configuration file on every node does not sound like a good idea.

What about ZooKeeper?


Re: How to send local files to a flink job on YARN

Posted by Aljoscha Krettek <al...@apache.org>.
There’s a bit of a misconception here: in Flink there is no “driver” as there is in Spark, and the entry point of your program (“main()”) is not executed on the cluster but in the “client”. The main method is only responsible for constructing a program graph. This graph is then shipped to the cluster, and the client (that is, the “main()” method) can shut down at this point. In your concrete case, this means that the main() method is not executed in the YARN context, i.e. it does not have access to the files that you specified with the “--yarnship” option.
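
Since main() runs in the client, one consequence is that it can read a file from the local file system of the submitting machine directly. The following is only a minimal sketch of that idea, not code from Flink: the class name ClientSideConfig, the default file name job-config.yaml and the flat "key: value" parser are all assumptions made for illustration, and a real job would more likely use a proper YAML library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class; not part of Flink.
class ClientSideConfig {

    // Parses flat "key: value" lines. This is a stand-in for a real YAML parser
    // and does not handle nesting, lists or quoting.
    static Map<String, String> parseFlatYaml(List<String> lines) {
        Map<String, String> config = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            int colon = trimmed.indexOf(':');
            if (colon < 0) {
                continue; // ignore lines that are not key/value pairs
            }
            String key = trimmed.substring(0, colon).trim();
            String value = trimmed.substring(colon + 1).trim();
            config.put(key, value);
        }
        return config;
    }

    public static void main(String[] args) throws IOException {
        // This path is resolved on the machine that submits the job,
        // because main() is executed in the client.
        Path configPath = Paths.get(args.length > 0 ? args[0] : "job-config.yaml");
        if (!Files.exists(configPath)) {
            System.err.println("Config file not found: " + configPath);
            return;
        }
        Map<String, String> config = parseFlatYaml(Files.readAllLines(configPath));
        // The parsed values could now decide which sources, operations and
        // sinks are added to the program graph before it is submitted.
        System.out.println("Loaded config: " + config);
    }
}
```

Values read this way are only available while the program graph is being built. Anything the running operators need would have to be passed along explicitly, for example as constructor arguments of the user functions.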

Regarding “--yarnship” in general, I have descended into the depths of the Flink YARN support and this is how it works:
FlinkYarnSessionCli is the piece of code that acts as the entry point when “-m yarn-cluster” is specified on the command line. This is the place where the options are defined: https://github.com/apache/flink/blob/f839018131024860a1b25b13cea7e1313add28d5/flink-yarn/src/main/java/org/apache/flink/yarn/cli/FlinkYarnSessionCli.java#L138-L138. The options are not hardcoded but have a dynamic prefix; normally the short prefix is “y” and the long prefix is “yarn”. In there you see

shipPath = new Option(shortPrefix + "t", longPrefix + "ship", true, "Ship files in the specified directory (t for transfer)");

This translates to having the -yt and --yarnship parameters.
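
To make the prefix mechanism concrete, here is a small standalone sketch of how the two flag names come out of the prefixes. The class PrefixedOption is a hypothetical stand-in written for this illustration; the Flink line above uses an Option class instead (which appears to be the one from Apache Commons CLI), and that class is deliberately not used here so the sketch needs no dependencies.

```java
// Hypothetical stand-in for an option definition; not the class Flink uses.
class PrefixedOption {
    final String shortName;
    final String longName;

    PrefixedOption(String shortPrefix, String longPrefix, String shortSuffix, String longSuffix) {
        // The final names are simply the prefix followed by the option-specific suffix.
        this.shortName = shortPrefix + shortSuffix;
        this.longName = longPrefix + longSuffix;
    }

    // Short options are written with one dash on the command line.
    String shortFlag() {
        return "-" + shortName;
    }

    // Long options are written with two dashes on the command line.
    String longFlag() {
        return "--" + longName;
    }

    public static void main(String[] args) {
        // Prefixes "y" and "yarn" as described above, suffixes "t" and "ship".
        PrefixedOption ship = new PrefixedOption("y", "yarn", "t", "ship");
        System.out.println(ship.shortFlag() + " " + ship.longFlag()); // prints "-yt --yarnship"
    }
}
```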

As to how FlinkYarnSessionCli is used when specifying “-m yarn-cluster”, this happens here: https://github.com/apache/flink/blob/4aa2ffcef8edae574ec270631841ef4a0c793dec/flink-clients/src/main/java/org/apache/flink/client/CliFrontend.java#L136-L136. Essentially, a “CustomCommandLine” subclass is responsible for handling the user invocation, and the subclasses can announce that they would like to handle the user command line based on certain settings. For example, FlinkYarnSessionCli will announce that it can handle a command line when the “-m yarn-cluster” option is present: https://github.com/apache/flink/blob/f839018131024860a1b25b13cea7e1313add28d5/flink-yarn/src/main/java/org/apache/flink/yarn/cli/FlinkYarnSessionCli.java#L493-L493. The CliFrontend will loop through the list of registered CustomCommandLine instances and pick the first one that announces that it would like to handle a given invocation: https://github.com/apache/flink/blob/4aa2ffcef8edae574ec270631841ef4a0c793dec/flink-clients/src/main/java/org/apache/flink/client/CliFrontend.java#L1174-L1174
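
The selection logic described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Flink's code: the interface CommandLineHandler, its isActive(String[]) signature and the two handler classes are hypothetical names invented here, and the real CustomCommandLine interface works on parsed command-line objects rather than on a raw argument array.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example; names and signatures are not taken from Flink.
class HandlerSelection {

    interface CommandLineHandler {
        String name();

        // Announces whether this handler wants to process the given invocation.
        boolean isActive(String[] args);
    }

    // Becomes active only when "-m yarn-cluster" is present.
    static class YarnHandler implements CommandLineHandler {
        public String name() {
            return "yarn";
        }

        public boolean isActive(String[] args) {
            for (int i = 0; i + 1 < args.length; i++) {
                if ("-m".equals(args[i]) && "yarn-cluster".equals(args[i + 1])) {
                    return true;
                }
            }
            return false;
        }
    }

    // Fallback that accepts every invocation; it must be registered last.
    static class DefaultHandler implements CommandLineHandler {
        public String name() {
            return "default";
        }

        public boolean isActive(String[] args) {
            return true;
        }
    }

    static List<CommandLineHandler> defaultHandlers() {
        return Arrays.asList(new YarnHandler(), new DefaultHandler());
    }

    // Loops through the registered handlers and picks the first active one.
    static CommandLineHandler select(List<CommandLineHandler> handlers, String[] args) {
        for (CommandLineHandler handler : handlers) {
            if (handler.isActive(args)) {
                return handler;
            }
        }
        throw new IllegalStateException("No handler accepted the command line");
    }

    public static void main(String[] args) {
        String[] yarnArgs = {"run", "-m", "yarn-cluster", "job.jar"};
        String[] plainArgs = {"run", "job.jar"};
        System.out.println(select(defaultHandlers(), yarnArgs).name());  // prints "yarn"
        System.out.println(select(defaultHandlers(), plainArgs).name()); // prints "default"
    }
}
```

Because the first active handler wins, registration order matters in a design like this: a catch-all handler placed before the YARN one would shadow it.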

This is very convoluted, and I hope my explanations help somewhat.

Best,
Aljoscha

> On 13. Jul 2017, at 18:02, Ted Yu <yu...@gmail.com> wrote:
> 
> I went back to commit 6e38eb8:
> [FLINK-1436] [docs] update command line documentation
> 
> A search in the repo for "yarnship" returned no hits in the code (same with commit bf6b9aaab89e2e04678784525a42a19f099aa7f5, which is at the top of the git repo).
> 
> Wondering whether it is supported.
> 


Re: How to send local files to a flink job on YARN

Posted by Ted Yu <yu...@gmail.com>.
I went back to commit 6e38eb8:
[FLINK-1436] [docs] update command line documentation

A search in the repo for "yarnship" returned no hits in the code (same
with commit bf6b9aaab89e2e04678784525a42a19f099aa7f5, which is at the top of
the git repo).

Wondering whether it is supported.
