Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 17:17:37 UTC

[GitHub] [beam] damccorm opened a new issue, #20424: Incorrect Flink runner documentation

damccorm opened a new issue, #20424:
URL: https://github.com/apache/beam/issues/20424

According to the documentation at [https://beam.apache.org/documentation/runners/flink/](https://beam.apache.org/documentation/runners/flink/) under _"Portable (Java/Python/Go)"_, a containerized Flink job server should be started using
```
docker run --net=host apache/beam_flink1.10_job_server:latest
```

or
```
docker run --net=host apache/beam_flink1.10_job_server:latest --flink-master=localhost:8081
```

If any of the SDKs is run using the DOCKER flag against this containerized job server, the pipeline crashes. As explained by [~danoliveira]: _"This command is building and running it locally on your machine. I'm not 100% sure why running it in a container is causing the error, but my suspicion is that it has to do with writing the manifest/artifact files to disk. One thing the job server does is writing artifacts to disk and then sending the locations to the SDK harness so it can read them. If the job server is in a container, then it's probably writing the files to the container instead of your local machine, so they're inaccessible to the SDK harness."_

In fact, [~lostluck] traced this to an existing issue, https://issues.apache.org/jira/browse/BEAM-5273, which is still unresolved and addresses this exact problem. Following Daniel's advice, the Go SDK (and, I'm fairly certain, the others) can be run in DOCKER mode if the Flink job server is started locally using gradle as follows:
```
./gradlew :runners:flink:1.10:job-server:runShadow -Djob-host=localhost -Dflink-master=local
```

The SDK only manages to run against a containerized Flink job server when it is started with the LOOPBACK flag. Moreover, since the LOOPBACK flag is explicitly meant for *local development* only, this makes me wonder how people are deploying production Beam data pipelines on Flink (especially on platforms like Kubernetes). Overall, the main issue (at least while BEAM-5273 remains unresolved) is that Beam's documentation fails to mention these caveats explicitly.
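
To make the two working combinations concrete, here is a rough sketch of the flags involved. `beam_portable_flags` is a hypothetical helper written for this issue, not something Beam ships. The flag names are the Go SDK's (`--runner`, `--endpoint`, `--environment_type`), and `localhost:8099` assumes the job server is listening on its default job endpoint, so adjust as needed. It only encodes the behaviour reported above and is not a statement of what Beam officially supports.

```shell
#!/bin/sh
# Hypothetical helper (not part of Beam): print Go SDK pipeline flags that
# match how the Flink job server was started, per the behaviour in this report.
# Assumption: the job server's job endpoint is localhost:8099.
beam_portable_flags() {
  case "$1" in
    container)
      # Job server started with `docker run ...`: only LOOPBACK was seen to work.
      echo "--runner=universal --endpoint=localhost:8099 --environment_type=LOOPBACK"
      ;;
    local)
      # Job server started on the host via gradle: DOCKER was seen to work.
      echo "--runner=universal --endpoint=localhost:8099 --environment_type=DOCKER"
      ;;
    *)
      echo "usage: beam_portable_flags container|local" >&2
      return 1
      ;;
  esac
}

# Example usage (./wordcount.go is a placeholder for your own pipeline):
#   go run ./wordcount.go $(beam_portable_flags container)
```

The `container` case is the one the documentation's `docker run` commands lead to, which is why the missing LOOPBACK caveat matters.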

Imported from Jira [BEAM-10793](https://issues.apache.org/jira/browse/BEAM-10793). Original Jira may contain additional context.
Reported by: kevinsijo.
