Posted to user@flink.apache.org by Flavio Pompermaier <po...@okkam.it> on 2021/04/14 15:36:49 UTC

Flink Hadoop config on docker-compose

Hi everybody,
I'm trying to set up reading from HDFS using docker-compose and Flink
1.11.3.
If I pass 'env.hadoop.conf.dir' and 'env.yarn.conf.dir'
using FLINK_PROPERTIES (under environment section of the docker-compose
service) I see in the logs the following line:

"Could not find Hadoop configuration via any of the supported method"

If I'm not wrong, this means that the HADOOP_CONF_DIR environment variable is
not actually generated by the run scripts.
Indeed, if I add HADOOP_CONF_DIR and YARN_CONF_DIR (also under the
environment section of the docker-compose service) I don't see that line.

Is this the expected behavior?

Below is the relevant docker-compose service I use (I've removed the content
of HADOOP_CLASSPATH because it is too long, and I didn't include the
taskmanager service, which is similar):

flink-jobmanager:
    container_name: flink-jobmanager
    build:
      context: .
      dockerfile: Dockerfile.flink
      args:
        FLINK_VERSION: 1.11.3-scala_2.12-java11
    image: 'flink-test:1.11.3-scala_2.12-java11'
    ports:
      - "8091:8081"
      - "8092:8082"
    command: jobmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: flink-jobmanager
        rest.port: 8081
        historyserver.web.port: 8082
        web.upload.dir: /opt/flink
        env.hadoop.conf.dir: /opt/hadoop/conf
        env.yarn.conf.dir: /opt/hadoop/conf
      - |
        HADOOP_CLASSPATH=...
      - HADOOP_CONF_DIR=/opt/hadoop/conf
      - YARN_CONF_DIR=/opt/hadoop/conf
    volumes:
      - 'flink_shared_folder:/tmp/test'
      - 'flink_uploads:/opt/flink/flink-web-upload'
      - 'flink_hadoop_conf:/opt/hadoop/conf'
      - 'flink_hadoop_libs:/opt/hadoop-3.2.1/share'
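For reference, here is a minimal, self-contained sketch of what I understand the image's entrypoint to do with FLINK_PROPERTIES (an assumption on my part, not the actual entrypoint code): it appends the lines to flink-conf.yaml, so the env.* keys become config entries, but no environment variable is exported from them.

```shell
# Sketch (assumed entrypoint behavior, not a verified copy of it): the
# container entrypoint appends the FLINK_PROPERTIES lines to flink-conf.yaml.
# The env.* keys become config-file entries only; nothing exports
# HADOOP_CONF_DIR into the process environment.
FLINK_PROPERTIES='env.hadoop.conf.dir: /opt/hadoop/conf
env.yarn.conf.dir: /opt/hadoop/conf'

CONF_FILE=$(mktemp)                      # stand-in for conf/flink-conf.yaml
printf '%s\n' "$FLINK_PROPERTIES" >> "$CONF_FILE"

grep '^env\.' "$CONF_FILE"               # keys are present in the config file
echo "HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-<unset>}"   # env var not set by this
```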


Thanks in advance for any support,
Flavio

Re: Flink Hadoop config on docker-compose

Posted by Flavio Pompermaier <po...@okkam.it>.
Great! Thanks for the support

On Thu, Apr 22, 2021 at 2:57 PM Matthias Pohl <ma...@ververica.com>
wrote:

> I think you're right, Flavio. I created FLINK-22414 to cover this. Thanks
> for bringing it up.
>
> Matthias
>
> [1] https://issues.apache.org/jira/browse/FLINK-22414
>
> On Fri, Apr 16, 2021 at 9:32 AM Flavio Pompermaier <po...@okkam.it>
> wrote:
>
>> Hi Yang,
>> isn't this something to fix? If I look at the documentation at  [1], in
>> the "Passing configuration via environment variables" section, there is:
>>
>> "The environment variable FLINK_PROPERTIES should contain a list of Flink
>> cluster configuration options separated by new line,
>> the same way as in the flink-conf.yaml. FLINK_PROPERTIES takes precedence
>> over configurations in flink-conf.yaml."
>>
>> To me this means that if I specify "env.hadoop.conf.dir" it should be
>> handled as well. Am I wrong?
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-stable/deployment/resource-providers/standalone/docker.html
>>
>> Best,
>> Flavio
>>
>> On Fri, Apr 16, 2021 at 4:52 AM Yang Wang <da...@gmail.com> wrote:
>>
>>> It seems that we do not export HADOOP_CONF_DIR as environment variables
>>> in current implementation, even though we have set the env.xxx flink config
>>> options. It is only used to construct the classpath for the JM/TM process.
>>> However, in "HadoopUtils"[2] we do not support getting the hadoop
>>> configuration from classpath.
>>>
>>>
>>> [1].
>>> https://github.com/apache/flink/blob/release-1.11/flink-dist/src/main/flink-bin/bin/config.sh#L256
>>> [2].
>>> https://github.com/apache/flink/blob/release-1.11/flink-connectors/flink-hadoop-compatibility/src/main/java/org/apache/flink/api/java/hadoop/mapred/utils/HadoopUtils.java#L64
>>>
>>>
>>> Best,
>>> Yang
>>>
>>> Best,
>>> Yang
>>>
>>> Flavio Pompermaier <po...@okkam.it> 于2021年4月16日周五 上午3:55写道:
>>>
>>>> Hi Robert,
>>>> indeed my docker-compose does work only if I add also Hadoop and yarn
>>>> home while I was expecting that those two variables were generated
>>>> automatically just setting env.xxx variables in FLINK_PROPERTIES variable..
>>>>
>>>> I just want to understand what to expect, if I really need to specify
>>>> Hadoop and yarn home as env variables or not
>>>>
>>>> Il gio 15 apr 2021, 20:39 Robert Metzger <rm...@apache.org> ha
>>>> scritto:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm not aware of any known issues with Hadoop and Flink on Docker.
>>>>>
>>>>> I also tried what you are doing locally, and it seems to work:
>>>>>
>>>>> flink-jobmanager    | 2021-04-15 18:37:48,300 INFO
>>>>>  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Starting
>>>>> StandaloneSessionClusterEntrypoint.
>>>>> flink-jobmanager    | 2021-04-15 18:37:48,338 INFO
>>>>>  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Install
>>>>> default filesystem.
>>>>> flink-jobmanager    | 2021-04-15 18:37:48,375 INFO
>>>>>  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Install
>>>>> security context.
>>>>> flink-jobmanager    | 2021-04-15 18:37:48,404 INFO
>>>>>  org.apache.flink.runtime.security.modules.HadoopModule       [] - Hadoop
>>>>> user set to flink (auth:SIMPLE)
>>>>> flink-jobmanager    | 2021-04-15 18:37:48,408 INFO
>>>>>  org.apache.flink.runtime.security.modules.JaasModule         [] - Jaas
>>>>> file will be created as /tmp/jaas-811306162058602256.conf.
>>>>> flink-jobmanager    | 2021-04-15 18:37:48,415 INFO
>>>>>  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -
>>>>> Initializing cluster services.
>>>>>
>>>>> Here's my code:
>>>>>
>>>>> https://gist.github.com/rmetzger/0cf4ba081d685d26478525bf69c7bd39
>>>>>
>>>>> Hope this helps!
>>>>>
>>>>> On Wed, Apr 14, 2021 at 5:37 PM Flavio Pompermaier <
>>>>> pompermaier@okkam.it> wrote:
>>>>>
>>>>>> Hi everybody,
>>>>>> I'm trying to set up reading from HDFS using docker-compose and Flink
>>>>>> 1.11.3.
>>>>>> If I pass 'env.hadoop.conf.dir' and 'env.yarn.conf.dir'
>>>>>> using FLINK_PROPERTIES (under environment section of the docker-compose
>>>>>> service) I see in the logs the following line:
>>>>>>
>>>>>> "Could not find Hadoop configuration via any of the supported method"
>>>>>>
>>>>>> If I'm not wrong, this means that the HADOOP_CONF_DIR is actually not
>>>>>> generated by the run scripts.
>>>>>> Indeed, If I add HADOOP_CONF_DIR and YARN_CONF_DIR (always under
>>>>>> environment section of the docker-compose service) I don't see that line.
>>>>>>
>>>>>> Is this the expected behavior?
>>>>>>
>>>>>> Below the relevant docker-compose service I use (I've removed the
>>>>>> content of HADOOP_CLASSPATH content because is too long and I didn't report
>>>>>> the taskmanager that is similar):
>>>>>>
>>>>>> flink-jobmanager:
>>>>>>     container_name: flink-jobmanager
>>>>>>     build:
>>>>>>       context: .
>>>>>>       dockerfile: Dockerfile.flink
>>>>>>       args:
>>>>>>         FLINK_VERSION: 1.11.3-scala_2.12-java11
>>>>>>     image: 'flink-test:1.11.3-scala_2.12-java11'
>>>>>>     ports:
>>>>>>       - "8091:8081"
>>>>>>       - "8092:8082"
>>>>>>     command: jobmanager
>>>>>>     environment:
>>>>>>       - |
>>>>>>         FLINK_PROPERTIES=
>>>>>>         jobmanager.rpc.address: flink-jobmanager
>>>>>>         rest.port: 8081
>>>>>>         historyserver.web.port: 8082
>>>>>>         web.upload.dir: /opt/flink
>>>>>>         env.hadoop.conf.dir: /opt/hadoop/conf
>>>>>>         env.yarn.conf.dir: /opt/hadoop/conf
>>>>>>       - |
>>>>>>         HADOOP_CLASSPATH=...
>>>>>>       - HADOOP_CONF_DIR=/opt/hadoop/conf
>>>>>>       - YARN_CONF_DIR=/opt/hadoop/conf
>>>>>>     volumes:
>>>>>>       - 'flink_shared_folder:/tmp/test'
>>>>>>       - 'flink_uploads:/opt/flink/flink-web-upload'
>>>>>>       - 'flink_hadoop_conf:/opt/hadoop/conf'
>>>>>>       - 'flink_hadoop_libs:/opt/hadoop-3.2.1/share'
>>>>>>
>>>>>>
>>>>>> Thanks in advance for any support,
>>>>>> Flavio
>>>>>>
>>>>>
>

Re: Flink Hadoop config on docker-compose

Posted by Matthias Pohl <ma...@ververica.com>.
I think you're right, Flavio. I created FLINK-22414 [1] to cover this. Thanks
for bringing it up.

Matthias

[1] https://issues.apache.org/jira/browse/FLINK-22414


Re: Flink Hadoop config on docker-compose

Posted by Flavio Pompermaier <po...@okkam.it>.
Hi Yang,
isn't this something to fix? If I look at the documentation [1], in the
"Passing configuration via environment variables" section, it says:

"The environment variable FLINK_PROPERTIES should contain a list of Flink
cluster configuration options separated by new line,
the same way as in the flink-conf.yaml. FLINK_PROPERTIES takes precedence
over configurations in flink-conf.yaml."

To me this means that if I specify "env.hadoop.conf.dir" it should be
handled as well. Am I wrong?

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/deployment/resource-providers/standalone/docker.html
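The precedence described in the quoted passage can be illustrated with a small, self-contained sketch. Modeling it (an assumption on my part) as "appended keys win over earlier ones" after the entrypoint appends the properties to the config file:

```shell
# Sketch of the documented precedence: FLINK_PROPERTIES overrides
# flink-conf.yaml. Modeled here, as an assumption, as last-occurrence-wins
# once the properties have been appended to the config file.
conf=$(mktemp)
printf 'rest.port: 8081\n' > "$conf"          # baked-in flink-conf.yaml value
FLINK_PROPERTIES='rest.port: 9091'
printf '%s\n' "$FLINK_PROPERTIES" >> "$conf"  # appended at container start

# Effective value: the last occurrence of the key.
awk -F': ' '$1 == "rest.port" { v = $2 } END { print v }' "$conf"
```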

Best,
Flavio


Re: Flink Hadoop config on docker-compose

Posted by Yang Wang <da...@gmail.com>.
It seems that we do not export HADOOP_CONF_DIR as an environment variable in
the current implementation [1], even though the env.xxx Flink config options
are set; the configured directory is only used to construct the classpath for
the JM/TM processes. However, "HadoopUtils" [2] does not support reading the
Hadoop configuration from the classpath.


[1].
https://github.com/apache/flink/blob/release-1.11/flink-dist/src/main/flink-bin/bin/config.sh#L256
[2].
https://github.com/apache/flink/blob/release-1.11/flink-connectors/flink-hadoop-compatibility/src/main/java/org/apache/flink/api/java/hadoop/mapred/utils/HadoopUtils.java#L64
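The lookup behavior described above can be sketched in shell (illustrative only; the function name, the exact sources checked, and their order are assumptions, not Flink's actual code):

```shell
# Illustrative sketch of an environment-variable-driven Hadoop config lookup
# in the spirit of HadoopUtils: it consults env vars and known directories,
# never the JVM classpath. The fallback order here is an assumption.
resolve_hadoop_conf() {
  if [ -n "${HADOOP_CONF_DIR:-}" ] && [ -d "$HADOOP_CONF_DIR" ]; then
    echo "$HADOOP_CONF_DIR"
  elif [ -n "${HADOOP_HOME:-}" ] && [ -d "$HADOOP_HOME/etc/hadoop" ]; then
    echo "$HADOOP_HOME/etc/hadoop"
  else
    echo "Could not find Hadoop configuration" >&2
    return 1
  fi
}

d=$(mktemp -d)
HADOOP_CONF_DIR="$d"
resolve_hadoop_conf           # finds the directory via the env var

unset HADOOP_CONF_DIR HADOOP_HOME
resolve_hadoop_conf || true   # warns: without the env vars, nothing is found
```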


Best,
Yang


Re: Flink Hadoop config on docker-compose

Posted by Flavio Pompermaier <po...@okkam.it>.
Hi Robert,
indeed, my docker-compose setup works only if I also add HADOOP_CONF_DIR and
YARN_CONF_DIR, while I was expecting those two variables to be generated
automatically just by setting the env.xxx options in the FLINK_PROPERTIES
variable.

I just want to understand what to expect: do I really need to specify
HADOOP_CONF_DIR and YARN_CONF_DIR as env variables or not?


Re: Flink Hadoop config on docker-compose

Posted by Robert Metzger <rm...@apache.org>.
Hi,

I'm not aware of any known issues with Hadoop and Flink on Docker.

I also tried what you are doing locally, and it seems to work:

flink-jobmanager    | 2021-04-15 18:37:48,300 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Starting StandaloneSessionClusterEntrypoint.
flink-jobmanager    | 2021-04-15 18:37:48,338 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Install default filesystem.
flink-jobmanager    | 2021-04-15 18:37:48,375 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Install security context.
flink-jobmanager    | 2021-04-15 18:37:48,404 INFO  org.apache.flink.runtime.security.modules.HadoopModule       [] - Hadoop user set to flink (auth:SIMPLE)
flink-jobmanager    | 2021-04-15 18:37:48,408 INFO  org.apache.flink.runtime.security.modules.JaasModule         [] - Jaas file will be created as /tmp/jaas-811306162058602256.conf.
flink-jobmanager    | 2021-04-15 18:37:48,415 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Initializing cluster services.

Here's my code:

https://gist.github.com/rmetzger/0cf4ba081d685d26478525bf69c7bd39

Hope this helps!
