Posted to user@flink.apache.org by Richard Deurwaarder <ri...@xeli.eu> on 2019/09/24 15:01:50 UTC

Setting environment variables of the taskmanagers (yarn)

Hello,

We have our flink job (1.8.0) running on our hadoop 2.7 cluster with yarn.
We would like to add the GCS connector to use GCS rather than HDFS.
Following the documentation of the GCS connector[1] we have to specify
which credentials we want to use and there are two ways of doing this:
  * Edit core-site.xml
  * Set an environment variable: GOOGLE_APPLICATION_CREDENTIALS
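
For context, the core-site.xml route would look roughly like this (property
names as given in the GCS connector docs[1]; the keyfile path is just a
placeholder):

<property>
  <!-- enable service account authentication for the GCS connector -->
  <name>google.cloud.auth.service.account.enable</name>
  <value>true</value>
</property>
<property>
  <!-- JSON keyfile path, as seen on the node where the code runs -->
  <name>google.cloud.auth.service.account.json.keyfile</name>
  <value>/path/to/gcs-keyfile.json</value>
</property>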

Because we're on a company shared hadoop cluster we do not want to change
the cluster wide core-site.xml.

This leaves me with two options:

1. Create a custom core-site.xml and use --yarnship to send it to all the
taskmanager containers. If I do this, to what value should I set
fs.hdfs.hadoopconf[2] in flink-conf?
2. Set an environment variable. However, because the taskmanagers are
started via yarn, I'm having trouble figuring out how to make sure this
environment variable is set for each yarn container / taskmanager.

I would appreciate any help you can provide.

Thank you,

Richard

[1]
https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/INSTALL.md#configure-hadoop
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/config.html#hdfs

Re: Setting environment variables of the taskmanagers (yarn)

Posted by Peter Huang <hu...@gmail.com>.
Hi Richard,

Good suggestion. I just created a Jira ticket and will find time this week
to update the docs.



Best Regards
Peter Huang

On Wed, Sep 25, 2019 at 8:05 AM Richard Deurwaarder <ri...@xeli.eu> wrote:

> Hi Peter and Jiayi,
>
> Thanks for the answers, this worked perfectly. I just added
>
> containerized.master.env.GOOGLE_APPLICATION_CREDENTIALS=xyz
> and
> containerized.taskmanager.env.GOOGLE_APPLICATION_CREDENTIALS=xyz
>
> to my flink config and they got picked up.
>
> Do you know why this is missing from the docs? If it's not intentional, it
> might be nice to add it.
>
> Richard
>
> On Tue, Sep 24, 2019 at 5:53 PM Peter Huang <hu...@gmail.com>
> wrote:
>
>> Hi Richard,
>>
>> For the first question, I don't think you need to explicitly specify
>> fs.hdfs.hadoopconf as each file in the ship folder is copied as a yarn
>> local resource for containers. The configuration path is
>> overridden internally in Flink.
>>
>> For the second question of setting TM environment variables, please use
>> these two configurations in your flink conf.
>>
>> /**
>>  * Prefix for passing custom environment variables to Flink's master process.
>>  * For example for passing LD_LIBRARY_PATH as an env variable to the AppMaster, set:
>>  * containerized.master.env.LD_LIBRARY_PATH: "/usr/lib/native"
>>  * in the flink-conf.yaml.
>>  */
>> public static final String CONTAINERIZED_MASTER_ENV_PREFIX = "containerized.master.env.";
>>
>> /**
>>  * Similar to the {@link CONTAINERIZED_MASTER_ENV_PREFIX}, this configuration prefix allows
>>  * setting custom environment variables for the workers (TaskManagers).
>>  */
>> public static final String CONTAINERIZED_TASK_MANAGER_ENV_PREFIX = "containerized.taskmanager.env.";
>>
>>
>>
>> Best Regards
>>
>> Peter Huang
>>
>>
>>
>>
>> On Tue, Sep 24, 2019 at 8:02 AM Richard Deurwaarder <ri...@xeli.eu>
>> wrote:
>>
>>> Hello,
>>>
>>> We have our flink job (1.8.0) running on our hadoop 2.7 cluster with
>>> yarn. We would like to add the GCS connector to use GCS rather than HDFS.
>>> Following the documentation of the GCS connector[1] we have to specify
>>> which credentials we want to use and there are two ways of doing this:
>>>   * Edit core-site.xml
>>>   * Set an environment variable: GOOGLE_APPLICATION_CREDENTIALS
>>>
>>> Because we're on a company shared hadoop cluster we do not want to
>>> change the cluster wide core-site.xml.
>>>
>>> This leaves me with two options:
>>>
>>> 1. Create a custom core-site.xml and use --yarnship to send it to all
>>> the taskmanager containers. If I do this, to what value should I set
>>> fs.hdfs.hadoopconf[2] in flink-conf?
>>> 2. Set an environment variable. However, because the taskmanagers are
>>> started via yarn, I'm having trouble figuring out how to make sure this
>>> environment variable is set for each yarn container / taskmanager.
>>>
>>> I would appreciate any help you can provide.
>>>
>>> Thank you,
>>>
>>> Richard
>>>
>>> [1]
>>> https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/INSTALL.md#configure-hadoop
>>> [2]
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/config.html#hdfs
>>>
>>

Re: Setting environment variables of the taskmanagers (yarn)

Posted by Richard Deurwaarder <ri...@xeli.eu>.
Hi Peter and Jiayi,

Thanks for the answers, this worked perfectly. I just added

containerized.master.env.GOOGLE_APPLICATION_CREDENTIALS=xyz
and
containerized.taskmanager.env.GOOGLE_APPLICATION_CREDENTIALS=xyz

to my flink config and they got picked up.
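
(For anyone copying this: flink-conf.yaml uses "key: value" syntax, so e.g.

containerized.taskmanager.env.GOOGLE_APPLICATION_CREDENTIALS: /path/to/gcs-keyfile.json

where xyz above stands for the keyfile path as visible inside the yarn
containers.)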

Do you know why this is missing from the docs? If it's not intentional, it
might be nice to add it.

Richard

On Tue, Sep 24, 2019 at 5:53 PM Peter Huang <hu...@gmail.com>
wrote:

> Hi Richard,
>
> For the first question, I don't think you need to explicitly specify
> fs.hdfs.hadoopconf as each file in the ship folder is copied as a yarn
> local resource for containers. The configuration path is
> overridden internally in Flink.
>
> For the second question of setting TM environment variables, please use
> these two configurations in your flink conf.
>
> /**
>  * Prefix for passing custom environment variables to Flink's master process.
>  * For example for passing LD_LIBRARY_PATH as an env variable to the AppMaster, set:
>  * containerized.master.env.LD_LIBRARY_PATH: "/usr/lib/native"
>  * in the flink-conf.yaml.
>  */
> public static final String CONTAINERIZED_MASTER_ENV_PREFIX = "containerized.master.env.";
>
> /**
>  * Similar to the {@link CONTAINERIZED_MASTER_ENV_PREFIX}, this configuration prefix allows
>  * setting custom environment variables for the workers (TaskManagers).
>  */
> public static final String CONTAINERIZED_TASK_MANAGER_ENV_PREFIX = "containerized.taskmanager.env.";
>
>
>
> Best Regards
>
> Peter Huang
>
>
>
>
> On Tue, Sep 24, 2019 at 8:02 AM Richard Deurwaarder <ri...@xeli.eu>
> wrote:
>
>> Hello,
>>
>> We have our flink job (1.8.0) running on our hadoop 2.7 cluster with
>> yarn. We would like to add the GCS connector to use GCS rather than HDFS.
>> Following the documentation of the GCS connector[1] we have to specify
>> which credentials we want to use and there are two ways of doing this:
>>   * Edit core-site.xml
>>   * Set an environment variable: GOOGLE_APPLICATION_CREDENTIALS
>>
>> Because we're on a company shared hadoop cluster we do not want to change
>> the cluster wide core-site.xml.
>>
>> This leaves me with two options:
>>
>> 1. Create a custom core-site.xml and use --yarnship to send it to all the
>> taskmanager containers. If I do this, to what value should I set
>> fs.hdfs.hadoopconf[2] in flink-conf?
>> 2. Set an environment variable. However, because the taskmanagers are
>> started via yarn, I'm having trouble figuring out how to make sure this
>> environment variable is set for each yarn container / taskmanager.
>>
>> I would appreciate any help you can provide.
>>
>> Thank you,
>>
>> Richard
>>
>> [1]
>> https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/INSTALL.md#configure-hadoop
>> [2]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/config.html#hdfs
>>
>

Re: Setting environment variables of the taskmanagers (yarn)

Posted by Peter Huang <hu...@gmail.com>.
Hi Richard,

For the first question, I don't think you need to explicitly specify
fs.hdfs.hadoopconf as each file in the ship folder is copied as a yarn
local resource for containers. The configuration path is
overridden internally in Flink.
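
For example, something like this should ship a directory containing your
custom core-site.xml to every container (a sketch; the directory and jar
names are placeholders, and -yt is the short form of --yarnship in the
per-job yarn CLI):

flink run -m yarn-cluster -yt ./custom-hadoop-conf ./my-flink-job.jar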

For the second question of setting TM environment variables, please use
these two configurations in your flink conf.

/**
 * Prefix for passing custom environment variables to Flink's master process.
 * For example for passing LD_LIBRARY_PATH as an env variable to the AppMaster, set:
 * containerized.master.env.LD_LIBRARY_PATH: "/usr/lib/native"
 * in the flink-conf.yaml.
 */
public static final String CONTAINERIZED_MASTER_ENV_PREFIX = "containerized.master.env.";

/**
 * Similar to the {@link CONTAINERIZED_MASTER_ENV_PREFIX}, this configuration prefix allows
 * setting custom environment variables for the workers (TaskManagers).
 */
public static final String CONTAINERIZED_TASK_MANAGER_ENV_PREFIX = "containerized.taskmanager.env.";



Best Regards

Peter Huang




On Tue, Sep 24, 2019 at 8:02 AM Richard Deurwaarder <ri...@xeli.eu> wrote:

> Hello,
>
> We have our flink job (1.8.0) running on our hadoop 2.7 cluster with yarn.
> We would like to add the GCS connector to use GCS rather than HDFS.
> Following the documentation of the GCS connector[1] we have to specify
> which credentials we want to use and there are two ways of doing this:
>   * Edit core-site.xml
>   * Set an environment variable: GOOGLE_APPLICATION_CREDENTIALS
>
> Because we're on a company shared hadoop cluster we do not want to change
> the cluster wide core-site.xml.
>
> This leaves me with two options:
>
> 1. Create a custom core-site.xml and use --yarnship to send it to all the
> taskmanager containers. If I do this, to what value should I set
> fs.hdfs.hadoopconf[2] in flink-conf?
> 2. Set an environment variable. However, because the taskmanagers are
> started via yarn, I'm having trouble figuring out how to make sure this
> environment variable is set for each yarn container / taskmanager.
>
> I would appreciate any help you can provide.
>
> Thank you,
>
> Richard
>
> [1]
> https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/INSTALL.md#configure-hadoop
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/config.html#hdfs
>