Posted to user@flink.apache.org by "Foster, Craig" <fo...@amazon.com> on 2016/08/25 15:02:19 UTC

Flink long-running YARN configuration

I'm trying to understand Flink YARN configuration. The flink-conf.yaml file is supposedly the way to configure Flink, except when you launch Flink on YARN, where some settings are determined for the Application Master (AM). The following is contradictory or not completely clear:


"The system will use the configuration in conf/flink-config.yaml. Please follow our configuration guide<https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html> if you want to change something.

Flink on YARN will overwrite the following configuration parameters: jobmanager.rpc.address (because the JobManager is always allocated on different machines), taskmanager.tmp.dirs (we are using the tmp directories given by YARN), and parallelism.default if the number of slots has been specified."

OK, so it will use conf/flink-conf.yaml, except for jobmanager.rpc.address/port, which are allocated dynamically by YARN and not necessarily reported to the user. That's fine with me, but if I want to make a "long-running" Flink cluster available to more than one user, where do I check in Flink for the Application Master hostname? Or do I just have to scrape the log output (which would definitely be undesirable)?

First, I thought Flink would write this to conf/flink-conf.yaml. It does not. Then I thought it must surely be written to the HDFS configuration directory for that application (under something like hdfs://$USER/.flink/), but that is merely a copy of the original conf/flink-conf.yaml and doesn't contain an accurate configuration for the running application. So is there an accurate config somewhere in HDFS or on the ResourceManager? That is, where could I find it programmatically (without manipulating YARN app names or scraping)?

Thanks,
Craig




Re: Flink long-running YARN configuration

Posted by Maximilian Michels <mx...@apache.org>.
Yes, the JobManager web UI also exists in a YARN session and keeps
running across jobs. Its address is also printed on the console when
the cluster is brought up.

On Mon, Aug 29, 2016 at 2:44 PM, Robert Metzger <rm...@apache.org> wrote:
> The JobManager UI starts when running Flink on YARN.
> The address of the UI is registered at YARN, so you can also access it
> through YARN's command line tools or its web interface.

Re: Flink long-running YARN configuration

Posted by Robert Metzger <rm...@apache.org>.
The JobManager UI starts when running Flink on YARN.
The address of the UI is registered with YARN, so you can also access it
through YARN's command line tools or its web interface.
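
For readers who want to look this address up programmatically, here is a
minimal sketch that queries the YARN ResourceManager REST API (the Cluster
Applications endpoint, /ws/v1/cluster/apps) and reads the trackingUrl field
of each running application. It is not taken from the thread. The
application type "Apache Flink", the helper names, and the example
ResourceManager address are assumptions; check what your cluster reports
before relying on them.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: Flink sessions register with YARN under this application type.
FLINK_APP_TYPE = "Apache Flink"


def build_apps_url(rm_base_url, app_type=FLINK_APP_TYPE):
    """Build the ResourceManager query for running applications of one type."""
    query = urlencode(
        {"states": "RUNNING", "applicationTypes": app_type},
        quote_via=quote,  # encode spaces as %20 rather than '+'
    )
    return f"{rm_base_url.rstrip('/')}/ws/v1/cluster/apps?{query}"


def extract_sessions(payload):
    """Return (id, name, trackingUrl) for each app in a parsed API response."""
    # The API returns {"apps": null} when nothing matches, hence the fallbacks.
    apps = (payload.get("apps") or {}).get("app") or []
    return [(a.get("id"), a.get("name"), a.get("trackingUrl")) for a in apps]


def list_flink_sessions(rm_base_url):
    """Query a live ResourceManager; needs network access to its web port."""
    with urlopen(build_apps_url(rm_base_url)) as response:
        return extract_sessions(json.load(response))
```

A call such as list_flink_sessions("http://resourcemanager.example:8088")
(hypothetical host) would return one tuple per running session. On the
command line, "yarn application -list" shows the same tracking URL.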

On Fri, Aug 26, 2016 at 7:28 PM, Trevor Grant <tr...@gmail.com>
wrote:

> Stephan,
>
> Will the jobmanager-UI exist?  E.g. if I am running Flink on YARN will I
> be able to submit apps/see logs and DAGs through the web interface?

Re: Flink long-running YARN configuration

Posted by Trevor Grant <tr...@gmail.com>.
Stephan,

Will the jobmanager-UI exist?  E.g. if I am running Flink on YARN will I be
able to submit apps/see logs and DAGs through the web interface?

thanks,
tg



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Thu, Aug 25, 2016 at 12:59 PM, Stephan Ewen <se...@apache.org> wrote:

> Hi Craig!
>
> For YARN sessions, Flink will
>   - (a) register the app master hostname/port/etc. with YARN, so you can get
> them, for example, from the YARN UI and tools
>   - (b) create a .yarn-properties file that contains the
> hostname/port info. Future calls to the command line pick up the info from
> there.

Re: Flink long-running YARN configuration

Posted by Stephan Ewen <se...@apache.org>.
Hi Craig!

For YARN sessions, Flink will
  - (a) register the app master hostname/port/etc. with YARN, so you can get
them, for example, from the YARN UI and tools
  - (b) create a .yarn-properties file that contains the
hostname/port info. Future calls to the command line pick up the info from
there.
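
As an illustration of option (b), the sketch below reads a
Java-properties-style file and extracts a host/port pair from it. It is
not from the thread and makes several assumptions: the key name
"jobManager", the "host:port" value format, and the sample contents are
hypothetical, and the file's exact name and location depend on the Flink
version and configuration. The parser is deliberately simplified and does
not cover the full java.util.Properties format.

```python
def parse_properties(text):
    """Parse simple `key=value` lines (a simplified reader, not the full format)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines without a separator
        value = value.strip()
        # Java writes ':', '=', '#' and '!' with a leading backslash; undo that.
        for ch in ":=#!":
            value = value.replace("\\" + ch, ch)
        props[key.strip()] = value
    return props


def split_host_port(address):
    """Split 'host:port' into (host, port as int)."""
    host, _, port = address.rpartition(":")
    return host, int(port)


def load_jobmanager_address(path, key="jobManager"):
    """Read the properties file at `path`; `key` is an assumed key name."""
    with open(path, encoding="utf-8") as handle:
        props = parse_properties(handle.read())
    return split_host_port(props[key])
```

Inspect the file your own session writes first and adjust the key name to
match; the function raises KeyError if the assumed key is absent.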

/cc Robert

Greetings,
Stephan

