Posted to issues@flink.apache.org by "Dian Fu (Jira)" <ji...@apache.org> on 2022/06/20 01:09:00 UTC

[jira] [Closed] (FLINK-28114) The path of the Python client interpreter could not point to an archive file in distributed file system

     [ https://issues.apache.org/jira/browse/FLINK-28114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dian Fu closed FLINK-28114.
---------------------------
      Assignee: Dian Fu
    Resolution: Fixed

Fixed in:
- master via 6b04a50ae2182d4cdd8e44ea9a16171d1d2394ce
- release-1.15 via da05d0f3f6950dcf5e839bae0c396dbdf8a69e9e

> The path of the Python client interpreter could not point to an archive file in distributed file system
> -------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-28114
>                 URL: https://issues.apache.org/jira/browse/FLINK-28114
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Python
>            Reporter: Dian Fu
>            Assignee: Dian Fu
>            Priority: Major
>             Fix For: 1.16.0, 1.15.1
>
>
> See https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonEnvUtils.java#L178 for more details about this limitation.
> Users could execute PyFlink jobs in YARN application mode as follows:
> {code}
> ./bin/flink run-application -t yarn-application \
>       -Djobmanager.memory.process.size=1024m \
>       -Dtaskmanager.memory.process.size=1024m \
>       -Dyarn.application.name=<ApplicationName> \
>       -Dyarn.ship-files=/path/to/shipfiles \
>       -pyarch shipfiles/venv.zip \
>       -pyclientexec venv.zip/venv/bin/python3 \
>       -pyexec venv.zip/venv/bin/python3 \
>       -py shipfiles/word_count.py
> {code}
> In the above case, venv.zip will be distributed to the TMs via the Flink blob server. However, the blob server does not support files larger than 2GB. See https://github.com/apache/flink/blob/ea52732dc48a4f1c5be0925890cd8aa1ea2a11ed/flink-runtime/src/main/java/org/apache/flink/runtime/blob/BlobServerConnection.java#L223 for more details. This is a very serious problem, as Python users tend to install a lot of Python libraries inside venv.zip and some Python libraries are very large.
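
As an illustration of the size limit described above, a pre-submission check could look roughly like the sketch below. It is not part of Flink or PyFlink. The helper names `exceeds_blob_limit` and `check_archive` are hypothetical, and the exact byte threshold (2 GiB here) is an assumption based only on the "2GB" figure in the description.

```python
import os

# Assumed threshold: the issue description only says "2GB";
# the exact byte value enforced by the blob server is an assumption here.
BLOB_LIMIT_BYTES = 2 * 1024 ** 3


def exceeds_blob_limit(size_bytes: int, limit: int = BLOB_LIMIT_BYTES) -> bool:
    """Return True if a file of size_bytes is larger than the assumed limit."""
    return size_bytes > limit


def check_archive(path: str, limit: int = BLOB_LIMIT_BYTES) -> bool:
    """Return True if the archive at path is small enough to ship (hypothetical helper)."""
    return not exceeds_blob_limit(os.path.getsize(path), limit)
```

Calling `check_archive("/path/to/shipfiles/venv.zip")` before submitting the job would flag an archive that is likely too large to be distributed this way.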


