Posted to github@beam.apache.org by "harrymyburgh (via GitHub)" <gi...@apache.org> on 2024/03/18 14:52:38 UTC

[I] Beam errors out when using PortableRunner (Flink Runner) – `Cannot run program "docker"` [beam]

harrymyburgh opened a new issue, #30663:
URL: https://github.com/apache/beam/issues/30663

   ### What happened?
   
   I am trying to deploy a Python Beam job that runs on a PortableRunner (Flink Runner) in my Kubernetes cluster.
   I have not previously had issues running Beam on the Flink Runner. However, today I tried to set up Beam as a consumer of Apache Kafka using `ReadFromKafka` from `apache_beam.io.kafka`.
   
   My Flink Cluster is managed by the Apache Flink Kubernetes Operator.
   
   My Beam jobs are managed by a Beam Flink Job Manager, which posts Beam jobs to the Flink master. The Job Manager uses the image `apache/beam_flink1.16_job_server:2.54.0`.
   
   My Flink Task Managers each contain a sidecar for a Beam worker pool, which is spun up using the image `apache/beam_python3.11_sdk:2.54.0` and the arg `--worker_pool`.
   
   When I start my Beam job, I get the following error in the job manager logs:
   ```
   Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory
   ```
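
The message can be read as follows: the JVM tried to launch a `docker` executable and the operating system returned error 2 (`ENOENT`), meaning no such binary was found on the `PATH`. A minimal Python sketch of the same check, meant to be run inside the container in question; the helper name `docker_available` is made up for illustration:

```python
import errno
import shutil


def docker_available() -> bool:
    """Return True if a `docker` executable can be found on the PATH."""
    return shutil.which("docker") is not None


# error=2 in the Java message corresponds to ENOENT ("No such file or directory").
print("errno 2 is", errno.errorcode[2])
print("docker on PATH:", docker_available())
```

If this reports that `docker` is not on the `PATH`, any component that tries to start a DOCKER environment in that container would be expected to fail in the same way.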
   
   These are my Beam pipeline options:
   ```
   --job_name=beam_example_pipeline
   --runner=PortableRunner
   --job_endpoint=beam-flink-job-server:8099
   --artifact_endpoint=beam-flink-job-server:8098
   --environment_type=EXTERNAL
   --environment_config=localhost:50000
   --parallelism=1
   --streaming
   ```
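
For context, a minimal sketch of how these options might be passed to a Python pipeline that reads from Kafka. The function names, the broker address and the topic are placeholders, not taken from the actual job:

```python
def build_pipeline_args():
    """Return the pipeline options listed above as an argument list."""
    return [
        "--job_name=beam_example_pipeline",
        "--runner=PortableRunner",
        "--job_endpoint=beam-flink-job-server:8099",
        "--artifact_endpoint=beam-flink-job-server:8098",
        "--environment_type=EXTERNAL",
        "--environment_config=localhost:50000",
        "--parallelism=1",
        "--streaming",
    ]


def run(bootstrap_servers="kafka:9092", topic="example-topic"):
    """Build and run the pipeline (placeholder broker and topic)."""
    # Imported lazily so the argument list can be inspected without Beam installed.
    import apache_beam as beam
    from apache_beam.io.kafka import ReadFromKafka
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(build_pipeline_args())
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadKafka" >> ReadFromKafka(
                consumer_config={"bootstrap.servers": bootstrap_servers},
                topics=[topic],
            )
            | "Print" >> beam.Map(print)
        )


if __name__ == "__main__":
    run()
```

Note that `--environment_type` and `--environment_config` here describe the environment for the Python parts of the pipeline.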
   
   [Some resources I've found](https://lists.apache.org/thread/4qr4dlg5h8kplq728cfwl1vcqfqv3zf6) suggest that the Kafka transform has its own environment type, which is set to `--environment_type=DOCKER` (possibly overriding any environment I set), and that this is what causes the issue. However, I could be wrong, so please say so if I am.
   
   All of this is taking place on a Kubernetes cluster, where, to my knowledge, Docker in Docker is not recommended. I do not want to use the PROCESS `environment_type`; I require EXTERNAL. How can I resolve this issue? Is this a bug with Beam?
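
If the Kafka transform's environment really is chosen by its Java expansion service rather than by the Python pipeline options, one thing that might be worth trying is passing environment overrides to that expansion service. The sketch below is untested against this setup. It assumes that `default_io_expansion_service` in `apache_beam.io.kafka` forwards `append_args` to the expansion service, and that the service honours the Java options `--defaultEnvironmentType` and `--defaultEnvironmentConfig`. The address `localhost:50001` is hypothetical: it stands for a worker pool able to run the Java Kafka transform, since the Python worker pool on `localhost:50000` would not be able to execute Java code.

```python
def kafka_expansion_args(env_type="EXTERNAL", env_config="localhost:50001"):
    """Arguments to forward to the Java expansion service.

    The default address is a hypothetical Java worker pool, not part of
    the setup described in this issue.
    """
    return [
        f"--defaultEnvironmentType={env_type}",
        f"--defaultEnvironmentConfig={env_config}",
    ]


def read_from_kafka(bootstrap_servers, topic):
    """Build a ReadFromKafka transform with an explicit expansion service."""
    # Imported lazily so the argument helper works without Beam installed.
    from apache_beam.io.kafka import ReadFromKafka, default_io_expansion_service

    return ReadFromKafka(
        consumer_config={"bootstrap.servers": bootstrap_servers},
        topics=[topic],
        # Assumption: the overrides below replace the default DOCKER environment.
        expansion_service=default_io_expansion_service(
            append_args=kafka_expansion_args()
        ),
    )
```

Whether this avoids the `Cannot run program "docker"` error here is not confirmed; it only makes explicit where a second, separate environment setting might live.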
   
   
   ### Issue Priority
   
   Priority: 2 (default / most bugs should be filed as P2)
   
   ### Issue Components
   
   - [X] Component: Python SDK
   - [X] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [X] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [X] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org