Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/11/16 23:11:10 UTC

[GitHub] [flink] mattfysh commented on pull request #21254: Update docker.md

mattfysh commented on PR #21254:
URL: https://github.com/apache/flink/pull/21254#issuecomment-1317799085

   Hi @MartijnVisser - my apologies, I am short on time and hoped someone with PyFlink familiarity would pick this up and immediately identify why this change is required.
   
   The current instructions do not work in Docker, which means they won't work on anyone's machine, regardless of host setup.
   
   To reproduce, stand up a local cluster in session mode using the following steps:
   
   1. Create a new folder
   2. Create a Dockerfile inside the folder with the contents of "Using Flink Python on Docker": https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/resource-providers/standalone/docker/#using-flink-python-on-docker
   3. Create a docker-compose.yaml file with the contents of "Session Mode": https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/resource-providers/standalone/docker/#session-cluster-yml
   4. Change both instances of `image: flink:1.16.0-scala_2.12` to be `build: .`
   5. Add a `volumes` entry to the `jobmanager` service:
   
           volumes:
               - .:/input
   
   6. Create a word_count.py file with the contents of the "The complete code so far" section: https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/python/table_api_tutorial/
   7. Run `docker-compose up`
   8. Enter the jobmanager via `docker exec -it [job_manager_container_id] bash`
   9. Run `./bin/flink run --python /input/word_count.py`
   
   This does not work, and throws the following error:
   
   ```
   Caused by: java.lang.RuntimeException: Failed to create stage bundle factory! Traceback (most recent call last):
     File "/usr/local/lib/python3.7/site-packages/fastavro/read.py", line 2, in <module>
       from . import _read
     File "fastavro/_read.pyx", line 11, in init fastavro._read
     File "/usr/local/lib/python3.7/lzma.py", line 27, in <module>
       from _lzma import *
   ModuleNotFoundError: No module named '_lzma'
   ```
   
   This occurs because, when Python is built from source as instructed in the Flink documentation I have updated, certain "optional modules" are not built if their host dependencies cannot be found. These include readline and sqlite, but also lzma. More information can be found at https://devguide.python.org/getting-started/setup-building/#unix, particularly the "optional modules were not found" message.
   
   From what I gather, the latest version of PyFlink uses a version of fastavro that requires lzma to be present in the Python build. I assume the instructions I have updated used to work with older versions of PyFlink.
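
To check whether a given Python build is affected, a minimal sketch along the following lines could be run with the interpreter inside the container. The helper name `missing_optional_modules` and the default list of module names are illustrative assumptions, not part of Flink or PyFlink; only `_lzma` is confirmed by the traceback above.

```python
import importlib


def missing_optional_modules(names=("_lzma", "_sqlite3", "readline", "_bz2")):
    """Return the names from `names` that this interpreter cannot import.

    The default tuple is an assumed sample of CPython modules that are
    skipped at build time when their system libraries are absent.
    """
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError, so the
            # "No module named '_lzma'" case is caught here as well.
            missing.append(name)
    return missing


if __name__ == "__main__":
    result = missing_optional_modules()
    if result:
        print("Optional modules missing from this Python build:", ", ".join(result))
    else:
        print("All checked optional modules are importable.")
```

If `_lzma` shows up in the output, the Python build was compiled without lzma support and the error above would be expected.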

