Posted to issues@flink.apache.org by "Xintong Song (Jira)" <ji...@apache.org> on 2020/08/11 03:09:00 UTC

[jira] [Commented] (FLINK-18738) Revisit resource management model for python processes.

    [ https://issues.apache.org/jira/browse/FLINK-18738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175189#comment-17175189 ] 

Xintong Song commented on FLINK-18738:
--------------------------------------

Hi all,

[~dianfu] and I had an offline discussion regarding the process model and resource management for python UDFs. Here are the outcomes and some open questions. We would like to collect feedback on the general direction before diving into the design details.

h3. Process Model

Before discussing the memory management, it would be better to first reach consensus on the long-term process model for python UDFs. There are several options from our offline discussion and the previous discussions in FLINK-17923.
# *One python process per python operator.* This is the current approach. The operator is responsible for launching and terminating the python processes.
# *One python process per slot.* The TaskManager is responsible for launching the python processes. A python process would be launched when the slot is created (allocated), and terminated when the slot is destroyed (freed).
# *One python process per TaskManager.* The deployment framework is responsible for launching the python processes. Then the python operators (in the java process) deploy the workload to the python processes.

Among the 3 options above, *Dian and I are in favor of option 2).*

*Problems for option 1)*
Low efficiency. In the case of multiple python operators in one slot, launching one python process per operator introduces significant overhead (framework, python VM, inter-process communication). In scenarios where the operators themselves do not consume many resources, the problem becomes more severe because the overhead accounts for a larger share of the overall resource consumption.

*Problems for option 3)*
Dependency conflict. Python operators from different jobs might be deployed into the same TaskManager. These operators may need to load different dependencies. If they are executed in the same python process, there could be dependency conflicts.

_Open questions_
* According to Dian's input, python does not provide a mechanism for dependency isolation (like class loaders in java). We need to double-check this.
* How do we handle potential conflicts between the framework and user code dependencies?

*Benefits of option 2)*
* Operators in the same slot would be able to share the python process, which should help reduce the overhead (see the sketch below).
* A slot cannot be shared by multiple jobs, so there is no need to worry about cross-job dependency conflicts.
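
To make option 2) more concrete, here is a minimal sketch of the intended lifecycle coupling. The class name and the "udf_worker" module are made up for illustration and do not correspond to actual Flink (or Beam) classes.

{code:java}
import java.io.IOException;

/**
 * Sketch of option 2): the python process lifecycle follows the slot lifecycle.
 * All names here are hypothetical; they only illustrate the intended coupling.
 */
class SlotScopedPythonEnvironment implements AutoCloseable {

    private final Process pythonProcess;

    SlotScopedPythonEnvironment(String pythonExecutable) throws IOException {
        // Launched by the TaskManager when the slot is created (allocated).
        this.pythonProcess = new ProcessBuilder(pythonExecutable, "-m", "udf_worker")
                .inheritIO()
                .start();
    }

    /** Python operators deployed into this slot would share this single process. */
    Process sharedProcess() {
        return pythonProcess;
    }

    @Override
    public void close() {
        // Terminated when the slot is destroyed (freed).
        pythonProcess.destroy();
    }
}
{code}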

h3. Memory Management

The discussion here is based on the assumption that we choose option 2) for the process model, which is still open for discussion.

Since python processes are dynamically launched and terminated as slots are created and destroyed, we would need the TaskManager, rather than the deployment framework, to manage the resources of the python processes. Two potential approaches were discussed.

# *Make python processes use managed memory.* We would need a proper way to share managed memory between python processes and the rocksdb state backend in streaming scenarios.
# *Introduce a new `python memory` component to the TaskManager memory model for python processes.* The new python memory would add to the overall pod/container memory, either on top of or as a part of the TaskManager's total process memory.
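
As a rough illustration of option 2) (all numbers are made up, and `taskmanager.memory.python.size` is a hypothetical key; the other keys are existing TaskManager memory options), the new component would simply be one more term in the pod/container budget:

{code:java}
// Illustration only: a dedicated python memory component entering the TaskManager budget.
long taskHeapMiB = 512;  // taskmanager.memory.task.heap.size
long managedMiB  = 512;  // taskmanager.memory.managed.size
long pythonMiB   = 256;  // hypothetical taskmanager.memory.python.size

// The python component is either added on top of the configured total process memory or
// carved out of it; either way it has to be accounted for in the pod/container request.
long containerRequestMiB = taskHeapMiB + managedMiB + pythonMiB; // + framework, network, JVM overhead, ...
{code}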

*Dian and I prefer option 2),* for the following reasons.
* For option 1), it would be complicated to decide how to share managed memory between python and rocksdb. E.g., if the user wants to give more memory to rocksdb without changing the memory for python, they would need to not only increase the managed memory size but also carefully re-tune how the managed memory is shared (e.g., a fraction); see the worked example after this list.
* According to Dian's input, it is preferred to configure an absolute memory size for python UDFs, rather than a fraction of the total memory. Managed memory consumers (batch operators and rocksdb) share a common characteristic: they can, to some extent, adapt to the given memory budget. The more memory, the better the performance. On the other hand, the resource requirements of python UDFs are more rigid. The process fails if it needs more memory than the specified limit, and it does not benefit from a larger-than-needed limit.
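
As a worked example of the tuning problem described in the first bullet above (all numbers are made up):

{code:java}
// Made-up numbers showing why a shared fraction is awkward to tune (option 1).
long managedMiB = 1024;              // managed memory available to the slot
double pythonFraction = 0.25;        // hypothetical fraction of managed memory given to python

long pythonMiB  = (long) (managedMiB * pythonFraction); // 256 MiB for python UDFs
long rocksdbMiB = managedMiB - pythonMiB;               // 768 MiB for rocksdb

// Goal: give rocksdb 768 MiB more while keeping python at exactly 256 MiB.
long newManagedMiB = pythonMiB + (rocksdbMiB + 768);     // 1792 MiB -> managed size must grow
double newFraction = (double) pythonMiB / newManagedMiB; // ~0.143   -> fraction must also be re-tuned
{code}

With a dedicated absolute-size option for python, only the rocksdb side of the budget would need to change.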

h3. Developing Plan

Assuming we decide to go with the proposed approaches:
* process model option 2), and
* memory management option 2)

It would be good to separate these changes into two separate efforts. Trying to accomplish both in 1.12 seems aggressive, and we would like to avoid such a rush. Of the two efforts, the memory management change is the more user-facing one. If we decide to change the memory configuration for python UDFs, we had better do it early. Therefore, a feasible plan could be to finish the memory management effort in 1.12 and postpone the process model changes to the next release.

_Open question_
* We are still looking for a plan to make the proposed new memory management option 2) work with the current process model option 1).


> Revisit resource management model for python processes.
> -------------------------------------------------------
>
>                 Key: FLINK-18738
>                 URL: https://issues.apache.org/jira/browse/FLINK-18738
>             Project: Flink
>          Issue Type: Task
>          Components: API / Python, Runtime / Coordination
>            Reporter: Xintong Song
>            Assignee: Xintong Song
>            Priority: Major
>             Fix For: 1.12.0
>
>
> This ticket is for tracking the effort towards a proper long-term resource management model for python processes.
> In FLINK-17923, we ran into problems because python processes are not well integrated with the task manager resource management mechanism. A temporary workaround has been merged for release-1.11.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)