You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Robert Lugg <Ro...@synopsys.com> on 2019/11/07 18:14:35 UTC

Beam with Flink without containers

Hi All,

I have a conceptual misunderstanding which is keeping me from running BEAM using Flink.  The main killer right now is that I cannot use Docker containers.  I would like to run using something like:


python -m wordcount_flink \

--input war_and_peace.txt \

--runner=PortableRunner \

--job_endpoint=my-host:8099 \

--output result_flink

The first thing I did was start up a Flink cluster using ${FLINK_ROOT}/bin/start-cluster.sh.  The website confirms I have workers running.

Then, I started up a JobServer:

./gradlew :runners:flink:1.8:job-server:runShadow \

    -PflinkMasterUrl=my-host:8082

Finally, I attempted to run the wordcount sample:

python -m wordcount_flink \

--input war_and_peace.txt \

--runner=PortableRunner \

--job_endpoint=my-host:8099 \

--output result_flink

This failed with:
... java.io.IOException: Received exit code 126 for command 'docker run -d --network=host --env=DOCKER_MAC_CONTAINER=null apachebeam/python3.6_sdk:2.16.0 --id=1-1 --logging_endpoint=localhost:39858 --artifact_endpoint=localhost:39422 --provision_endpoint=localhost:40678 --control_endpoint=localhost:35088'
... docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.soc

I want to instead use the FlinkRunner and environment_type=PROCESS and environment_conf=????  What do I need to do to do this?

Generally, I'd like to have some sort of state diagram to describe who calls what when, if anything like that is available.

Thank you for any guidance.





Please note: I have cross posted this on stackoverflow: https://stackoverflow.com/questions/58718098/what-environment-config-for-beam-launching-flink
If I get an answer here, I'll enter it there (or you can).

Regards.

Re: Beam with Flink without containers

Posted by Ankur Goenka <go...@google.com>.
Hi Robert,

Different environment types are used to execute user code in a portable
fashion. These environments are started and managed by beam layer on flink
and are started on workers.
You need to create a script which honors beam portability api. The config
contains the script name.
You can use [1] to create a script which can be used for with environment
type process.
Please refer to [2] for the corresponding environment_config.

If you are using this configuration for testing, then I would recommend
using loopback worker by setting environment_type=LOOPBACK

[1]
https://github.com/apache/beam/blob/99484db34914812b3337fb1911b428c4331756c0/sdks/python/test-suites/portable/py2/build.gradle#L149
[2]
https://github.com/apache/beam/blob/99484db34914812b3337fb1911b428c4331756c0/sdks/python/test-suites/portable/common.gradle#L41

On Thu, Nov 7, 2019 at 10:15 AM Robert Lugg <Ro...@synopsys.com>
wrote:

> Hi All,
>
>
>
> I have a conceptual misunderstanding which is keeping me from running BEAM
> using Flink.  The main killer right now is that I cannot use Docker
> containers.  I would like to run using something like:
>
>
>
> python -m wordcount_flink \
>
> --input war_and_peace.txt \
>
> --runner=PortableRunner \
>
> --job_endpoint=my-host:8099 \
>
> --output result_flink
>
>
>
> The first thing I did was start up a Flink cluster using
> ${FLINK_ROOT}/bin/start-cluster.sh.  The website confirms I have workers
> running.
>
>
>
> Then, I started up a JobServer:
>
> ./gradlew :runners:flink:1.8:job-server:*runShadow* \
>
>     -PflinkMasterUrl=my-host:8082
>
>
>
> Finally, I attempted to run the wordcount sample:
>
> python -m wordcount_flink \
>
> --input war_and_peace.txt \
>
> --runner=PortableRunner \
>
> --job_endpoint=my-host:8099 \
>
> --output result_flink
>
>
>
> This failed with:
>
> … java.io.IOException: Received exit code 126 for command 'docker run -d
> --network=host --env=DOCKER_MAC_CONTAINER=null
> apachebeam/python3.6_sdk:2.16.0 --id=1-1 --logging_endpoint=localhost:39858
> --artifact_endpoint=localhost:39422 --provision_endpoint=localhost:40678
> --control_endpoint=localhost:35088'
>
> … docker: Got permission denied while trying to connect to the Docker
> daemon socket at unix:///var/run/docker.soc
>
>
>
> I want to instead use the FlinkRunner and environment_type=PROCESS and
> environment_conf=????  What do I need to do to do this?
>
>
>
> Generally, I’d like to have some sort of state diagram to describe who
> calls what when, if anything like that is available.
>
>
>
> Thank you for any guidance.
>
>
>
>
>
>
>
>
>
>
>
> Please note: I have cross posted this on stackoverflow:
> https://stackoverflow.com/questions/58718098/what-environment-config-for-beam-launching-flink
>
> If I get an answer here, I’ll enter it there (or you can).
>
>
>
> Regards.
>

Re: Beam with Flink without containers

Posted by Kyle Weaver <kc...@google.com>.
Hi Robert,


I replied to your main question on SO. (This is becoming a frequently asked
question, so we're going to look for ways to improve or at least document
this process. Stay tuned.)


> Generally, I’d like to have some sort of state diagram to describe who
calls what when, if anything like that is available.


The command specified in environment_config should be able to handle any
work Beam sends its way, so I don't think such a feature would be
necessary. But then again, I'm not totally sure what you mean here?

On Thu, Nov 7, 2019 at 10:15 AM Robert Lugg <Ro...@synopsys.com>
wrote:

> Hi All,
>
>
>
> I have a conceptual misunderstanding which is keeping me from running BEAM
> using Flink.  The main killer right now is that I cannot use Docker
> containers.  I would like to run using something like:
>
>
>
> python -m wordcount_flink \
>
> --input war_and_peace.txt \
>
> --runner=PortableRunner \
>
> --job_endpoint=my-host:8099 \
>
> --output result_flink
>
>
>
> The first thing I did was start up a Flink cluster using
> ${FLINK_ROOT}/bin/start-cluster.sh.  The website confirms I have workers
> running.
>
>
>
> Then, I started up a JobServer:
>
> ./gradlew :runners:flink:1.8:job-server:*runShadow* \
>
>     -PflinkMasterUrl=my-host:8082
>
>
>
> Finally, I attempted to run the wordcount sample:
>
> python -m wordcount_flink \
>
> --input war_and_peace.txt \
>
> --runner=PortableRunner \
>
> --job_endpoint=my-host:8099 \
>
> --output result_flink
>
>
>
> This failed with:
>
> … java.io.IOException: Received exit code 126 for command 'docker run -d
> --network=host --env=DOCKER_MAC_CONTAINER=null
> apachebeam/python3.6_sdk:2.16.0 --id=1-1 --logging_endpoint=localhost:39858
> --artifact_endpoint=localhost:39422 --provision_endpoint=localhost:40678
> --control_endpoint=localhost:35088'
>
> … docker: Got permission denied while trying to connect to the Docker
> daemon socket at unix:///var/run/docker.soc
>
>
>
> I want to instead use the FlinkRunner and environment_type=PROCESS and
> environment_conf=????  What do I need to do to do this?
>
>
>
> Generally, I’d like to have some sort of state diagram to describe who
> calls what when, if anything like that is available.
>
>
>
> Thank you for any guidance.
>
>
>
>
>
>
>
>
>
>
>
> Please note: I have cross posted this on stackoverflow:
> https://stackoverflow.com/questions/58718098/what-environment-config-for-beam-launching-flink
>
> If I get an answer here, I’ll enter it there (or you can).
>
>
>
> Regards.
>