You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@flink.apache.org by Jingsong Lee <lz...@apache.org> on 2019/12/02 07:47:01 UTC

Re: [DISCUSS] PyFlink User-Defined Function Resource Management

Hi Dian:


Thanks for your driving. I have some questions:


- Where should these configurations belong? You have mentioned tableApi/SQL,
so should in TableConfig?

- If just in table/sql, whether it should be called: table.python.****,
because in table, all config options are called table.***.

- What should table module do? So in CommonPythonCalc, we should read
options from table config, and set resources to OneInputTransformation?

- Are all buffer.memory off-heap memory? I took a look
to AbstractPythonScalarFunctionOperator, there is a forwardedInputQueue, is
this one a heap queue? So we need heap memory too?


Hope to get your reply.


Best,

Jingsong Lee

On Tue, Nov 26, 2019 at 12:17 PM Dian Fu <di...@gmail.com> wrote:

> Thanks for your votes and feedbacks. I have discussed with @Zhu Zhu
> offline and also on the design doc.
>
> It seems that we have reached consensus on the design. I would bring up
> the VOTE if there is no other feedbacks.
>
> Thanks,
> Dian
>
> > 在 2019年11月22日，下午2:51，Hequn Cheng <ch...@gmail.com> 写道：
> >
> > Thanks a lot for putting this together, Dian! Definitely +1 for this!
> > It is great to make sure that the resources used by the Python process
> are
> > managed properly by Flink’s resource management framework.
> >
> > Also, thanks to the guys that working on the unified memory management
> > framework.
> >
> > Best, Hequn
> >
> >
> > On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo <ka...@gmail.com> wrote:
> >
> >> Thanks for driving this discussion, Dian!
> >>
> >> +1 for this proposal. It will help to reduce container failure due to
> >> the memory overuse.
> >> Some comments left in the design doc.
> >>
> >> Best,
> >> Yangze Guo
> >>
> >> On Mon, Nov 18, 2019 at 4:06 PM Xintong Song <to...@gmail.com>
> >> wrote:
> >>>
> >>> Sorry for the late reply.
> >>>
> >>> +1 for the general proposal.
> >>>
> >>> And one remainder, to use UNKNOWN resource requirement, we need to make
> >>> sure optimizer knowns which operators use off-heap managed memory, and
> >>> compute and set a fraction to the operators. See FLIP-53[1] for more
> >>> details, and I would suggest you to double check with @Zhu Zhu who
> works
> >> on
> >>> this part.
> >>>
> >>> Thank you~
> >>>
> >>> Xintong Song
> >>>
> >>>
> >>> [1]
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
> >>>
> >>> On Tue, Nov 12, 2019 at 11:53 AM Dian Fu <di...@gmail.com>
> wrote:
> >>>
> >>>> Hi Jincheng,
> >>>>
> >>>> Thanks for the reply and also looking forward to the feedback from the
> >>>> community.
> >>>>
> >>>> Thanks,
> >>>> Dian
> >>>>
> >>>>> 在 2019年11月11日，下午2:34，jincheng sun <su...@gmail.com> 写道：
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> +1， Thanks for bring up this discussion Dian!
> >>>>>
> >>>>> The Resource Management is very important for PyFlink UDF. So, It's
> >> great
> >>>>> if anyone can add more comments or inputs in the design doc or
> >> feedback
> >>>> in
> >>>>> ML. :)
> >>>>>
> >>>>> Best,
> >>>>> Jincheng
> >>>>>
> >>>>> Dian Fu <di...@gmail.com> 于2019年11月5日周二 上午11:32写道：
> >>>>>
> >>>>>> Hi everyone,
> >>>>>>
> >>>>>> In FLIP-58[1] it will add the support of Python user-defined
> >> stateless
> >>>>>> function for Python Table API. It will launch a separate Python
> >> process
> >>>> for
> >>>>>> Python user-defined function execution. The resources used by the
> >> Python
> >>>>>> process should be managed properly by Flink’s resource management
> >>>>>> framework. FLIP-49[2] has proposed a unified memory management
> >> framework
> >>>>>> and PyFlink user-defined function resource management should be
> >> based on
> >>>>>> it. Jincheng, Hequn, Xintong, GuoWei and I discussed offline about
> >>>> this. I
> >>>>>> draft a design doc[3] and want to start a discussion about PyFlink
> >>>>>> user-defined function resource management.
> >>>>>>
> >>>>>> Welcome any comments on the design doc or giving us feedback on the
> >> ML
> >>>>>> directly.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Dian
> >>>>>>
> >>>>>> [1]
> >>>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
> >>>>>> [2]
> >>>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> >>>>>> [3]
> >>>>>>
> >>>>
> >>
> https://docs.google.com/document/d/1LQP8L66Thu2yVv6YRSfmF9EkkMnwhBHGjcTQ11GUmFc/edit#heading=h.4q4ggaftf78m
> >>>>
> >>>>
> >>
>
>

-- 
Best, Jingsong Lee

Re: [DISCUSS] PyFlink User-Defined Function Resource Management

Posted by Dian Fu <di...@gmail.com>.

Hi Jingsong,

Appreciated for your sharing. It's very helpful as the Python operator will take the similar way.

Thanks,
Dian

> 在 2019年12月6日，上午11:12，Jingsong Li <ji...@gmail.com> 写道：
> 
> Hi Dian,
> 
> After [1] and [2], in the batch sql world, we will:
> - [2] In client/compile side: we use memory weight request memory for
> Transformation.
> - [1] In runtime side: we use memory fraction to compute memory size and
> allocate in StreamOperator.
> For your information.
> 
> [1] https://jira.apache.org/jira/browse/FLINK-14063
> [2] https://jira.apache.org/jira/browse/FLINK-15035
> 
> Best,
> Jingsong Lee
> 
> On Tue, Dec 3, 2019 at 6:07 PM Dian Fu <di...@gmail.com> wrote:
> 
>> Hi Jingsong,
>> 
>> Thanks for your valuable feedback. I have updated the "Example" section
>> describing how to use these options in a Python Table API program.
>> 
>> Thanks,
>> Dian
>> 
>>> 在 2019年12月2日，下午6:12，Jingsong Lee <lz...@apache.org> 写道：
>>> 
>>> Hi Dian:
>>> 
>>> Thanks for you explanation.
>>> If you can update the document to add explanation for the changes to the
>>> table layer,
>>> it might be better. (it's just a suggestion, it depends on you)
>>> About forwardedInputQueue in AbstractPythonScalarFunctionOperator,
>>> Will this queue take up a lot of memory?
>>> Can it also occupy memory as large as buffer.memory?
>>> If so, what we're dealing with now is the silent use of heap memory?
>>> I feel a little strange, because the memory on the python side will
>> reserve,
>>> but the memory on the JVM side is used silently.
>>> 
>>> After carefully seeing your comments on Google doc:
>>>> The memory used by the Java operator is currently accounted as the task
>>> on-heap memory. We can revisit this if we find it's a problem in the
>> future.
>>> I agree that we can ignore it now, But we can add some content to the
>>> document to remind the user, What do you think?
>>> 
>>> Best,
>>> Jingsong Lee
>>> 
>>> On Mon, Dec 2, 2019 at 5:17 PM Dian Fu <di...@gmail.com> wrote:
>>> 
>>>> Hi Jingsong,
>>>> 
>>>> Thanks a lot for your comments. Please see my reply inlined below.
>>>> 
>>>>> 在 2019年12月2日，下午3:47，Jingsong Lee <lz...@apache.org> 写道：
>>>>> 
>>>>> Hi Dian:
>>>>> 
>>>>> 
>>>>> Thanks for your driving. I have some questions:
>>>>> 
>>>>> 
>>>>> - Where should these configurations belong? You have mentioned
>>>> tableApi/SQL,
>>>>> so should in TableConfig?
>>>> 
>>>> All Python related configurations are defined in PythonOptions. User
>> could
>>>> configure these configurations via TableConfig.getConfiguration.setXXX
>> for
>>>> Python Table API programs.
>>>> 
>>>>> 
>>>>> - If just in table/sql, whether it should be called: table.python.****,
>>>>> because in table, all config options are called table.***.
>>>> 
>>>> These configurations are not table specific. They will be used for both
>>>> Python Table API programs and Python DataStream API programs (which is
>>>> planned to be supported in the future). So python.xxx seems more
>>>> appropriate, what do you think?
>>>> 
>>>>> - What should table module do? So in CommonPythonCalc, we should read
>>>>> options from table config, and set resources to OneInputTransformation?
>>>> 
>>>> As described in the design doc, in compilation phase, for batch jobs,
>> the
>>>> required memory of the Python worker will be calculated according to the
>>>> configuration and set as the managed memory for the operator. For stream
>>>> jobs, the resource spec will be unknown(The reason is that currently the
>>>> resources for all the operators in stream jobs are unknown and it
>> doesn’t
>>>> support to configure both known and unknown resources in a single job).
>>>> 
>>>>> - Are all buffer.memory off-heap memory? I took a look
>>>>> to AbstractPythonScalarFunctionOperator, there is a
>> forwardedInputQueue,
>>>> is
>>>>> this one a heap queue? So we need heap memory too?
>>>> 
>>>> Yes, they are all off-heap memory which is supposed to be used by the
>>>> Python process. The forwardedInputQueue is a buffer used in the Java
>>>> operator and its memory is accounted as the on-heap memory.
>>>> 
>>>> Regards,
>>>> Dian
>>>> 
>>>>> 
>>>>> Hope to get your reply.
>>>>> 
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Jingsong Lee
>>>>> 
>>>>> On Tue, Nov 26, 2019 at 12:17 PM Dian Fu <di...@gmail.com>
>> wrote:
>>>>> 
>>>>>> Thanks for your votes and feedbacks. I have discussed with @Zhu Zhu
>>>>>> offline and also on the design doc.
>>>>>> 
>>>>>> It seems that we have reached consensus on the design. I would bring
>> up
>>>>>> the VOTE if there is no other feedbacks.
>>>>>> 
>>>>>> Thanks,
>>>>>> Dian
>>>>>> 
>>>>>>> 在 2019年11月22日，下午2:51，Hequn Cheng <ch...@gmail.com> 写道：
>>>>>>> 
>>>>>>> Thanks a lot for putting this together, Dian! Definitely +1 for this!
>>>>>>> It is great to make sure that the resources used by the Python
>> process
>>>>>> are
>>>>>>> managed properly by Flink’s resource management framework.
>>>>>>> 
>>>>>>> Also, thanks to the guys that working on the unified memory
>> management
>>>>>>> framework.
>>>>>>> 
>>>>>>> Best, Hequn
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo <ka...@gmail.com>
>> wrote:
>>>>>>> 
>>>>>>>> Thanks for driving this discussion, Dian!
>>>>>>>> 
>>>>>>>> +1 for this proposal. It will help to reduce container failure due
>> to
>>>>>>>> the memory overuse.
>>>>>>>> Some comments left in the design doc.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Yangze Guo
>>>>>>>> 
>>>>>>>> On Mon, Nov 18, 2019 at 4:06 PM Xintong Song <tonysong820@gmail.com
>>> 
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Sorry for the late reply.
>>>>>>>>> 
>>>>>>>>> +1 for the general proposal.
>>>>>>>>> 
>>>>>>>>> And one remainder, to use UNKNOWN resource requirement, we need to
>>>> make
>>>>>>>>> sure optimizer knowns which operators use off-heap managed memory,
>>>> and
>>>>>>>>> compute and set a fraction to the operators. See FLIP-53[1] for
>> more
>>>>>>>>> details, and I would suggest you to double check with @Zhu Zhu who
>>>>>> works
>>>>>>>> on
>>>>>>>>> this part.
>>>>>>>>> 
>>>>>>>>> Thank you~
>>>>>>>>> 
>>>>>>>>> Xintong Song
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> [1]
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
>>>>>>>>> 
>>>>>>>>> On Tue, Nov 12, 2019 at 11:53 AM Dian Fu <di...@gmail.com>
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Jincheng,
>>>>>>>>>> 
>>>>>>>>>> Thanks for the reply and also looking forward to the feedback from
>>>> the
>>>>>>>>>> community.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Dian
>>>>>>>>>> 
>>>>>>>>>>> 在 2019年11月11日，下午2:34，jincheng sun <su...@gmail.com> 写道：
>>>>>>>>>>> 
>>>>>>>>>>> Hi all,
>>>>>>>>>>> 
>>>>>>>>>>> +1， Thanks for bring up this discussion Dian!
>>>>>>>>>>> 
>>>>>>>>>>> The Resource Management is very important for PyFlink UDF. So,
>> It's
>>>>>>>> great
>>>>>>>>>>> if anyone can add more comments or inputs in the design doc or
>>>>>>>> feedback
>>>>>>>>>> in
>>>>>>>>>>> ML. :)
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Jincheng
>>>>>>>>>>> 
>>>>>>>>>>> Dian Fu <di...@gmail.com> 于2019年11月5日周二 上午11:32写道：
>>>>>>>>>>> 
>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>> 
>>>>>>>>>>>> In FLIP-58[1] it will add the support of Python user-defined
>>>>>>>> stateless
>>>>>>>>>>>> function for Python Table API. It will launch a separate Python
>>>>>>>> process
>>>>>>>>>> for
>>>>>>>>>>>> Python user-defined function execution. The resources used by
>> the
>>>>>>>> Python
>>>>>>>>>>>> process should be managed properly by Flink’s resource
>> management
>>>>>>>>>>>> framework. FLIP-49[2] has proposed a unified memory management
>>>>>>>> framework
>>>>>>>>>>>> and PyFlink user-defined function resource management should be
>>>>>>>> based on
>>>>>>>>>>>> it. Jincheng, Hequn, Xintong, GuoWei and I discussed offline
>> about
>>>>>>>>>> this. I
>>>>>>>>>>>> draft a design doc[3] and want to start a discussion about
>> PyFlink
>>>>>>>>>>>> user-defined function resource management.
>>>>>>>>>>>> 
>>>>>>>>>>>> Welcome any comments on the design doc or giving us feedback on
>>>> the
>>>>>>>> ML
>>>>>>>>>>>> directly.
>>>>>>>>>>>> 
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Dian
>>>>>>>>>>>> 
>>>>>>>>>>>> [1]
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
>>>>>>>>>>>> [2]
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
>>>>>>>>>>>> [3]
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> https://docs.google.com/document/d/1LQP8L66Thu2yVv6YRSfmF9EkkMnwhBHGjcTQ11GUmFc/edit#heading=h.4q4ggaftf78m
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>>> Best, Jingsong Lee
>>>> 
>>>> 
>>> 
>>> --
>>> Best, Jingsong Lee
>> 
>> 
> 
> -- 
> Best, Jingsong Lee

Re: [DISCUSS] PyFlink User-Defined Function Resource Management

Posted by Jingsong Li <ji...@gmail.com>.

Hi Dian,

After [1] and [2], in the batch sql world, we will:
- [2] In client/compile side: we use memory weight request memory for
Transformation.
- [1] In runtime side: we use memory fraction to compute memory size and
allocate in StreamOperator.
For your information.

[1] https://jira.apache.org/jira/browse/FLINK-14063
[2] https://jira.apache.org/jira/browse/FLINK-15035

Best,
Jingsong Lee

On Tue, Dec 3, 2019 at 6:07 PM Dian Fu <di...@gmail.com> wrote:

> Hi Jingsong,
>
> Thanks for your valuable feedback. I have updated the "Example" section
> describing how to use these options in a Python Table API program.
>
> Thanks,
> Dian
>
> > 在 2019年12月2日，下午6:12，Jingsong Lee <lz...@apache.org> 写道：
> >
> > Hi Dian:
> >
> > Thanks for you explanation.
> > If you can update the document to add explanation for the changes to the
> > table layer,
> > it might be better. (it's just a suggestion, it depends on you)
> > About forwardedInputQueue in AbstractPythonScalarFunctionOperator,
> > Will this queue take up a lot of memory?
> > Can it also occupy memory as large as buffer.memory?
> > If so, what we're dealing with now is the silent use of heap memory?
> > I feel a little strange, because the memory on the python side will
> reserve,
> > but the memory on the JVM side is used silently.
> >
> > After carefully seeing your comments on Google doc:
> >> The memory used by the Java operator is currently accounted as the task
> > on-heap memory. We can revisit this if we find it's a problem in the
> future.
> > I agree that we can ignore it now, But we can add some content to the
> > document to remind the user, What do you think?
> >
> > Best,
> > Jingsong Lee
> >
> > On Mon, Dec 2, 2019 at 5:17 PM Dian Fu <di...@gmail.com> wrote:
> >
> >> Hi Jingsong,
> >>
> >> Thanks a lot for your comments. Please see my reply inlined below.
> >>
> >>> 在 2019年12月2日，下午3:47，Jingsong Lee <lz...@apache.org> 写道：
> >>>
> >>> Hi Dian:
> >>>
> >>>
> >>> Thanks for your driving. I have some questions:
> >>>
> >>>
> >>> - Where should these configurations belong? You have mentioned
> >> tableApi/SQL,
> >>> so should in TableConfig?
> >>
> >> All Python related configurations are defined in PythonOptions. User
> could
> >> configure these configurations via TableConfig.getConfiguration.setXXX
> for
> >> Python Table API programs.
> >>
> >>>
> >>> - If just in table/sql, whether it should be called: table.python.****,
> >>> because in table, all config options are called table.***.
> >>
> >> These configurations are not table specific. They will be used for both
> >> Python Table API programs and Python DataStream API programs (which is
> >> planned to be supported in the future). So python.xxx seems more
> >> appropriate, what do you think?
> >>
> >>> - What should table module do? So in CommonPythonCalc, we should read
> >>> options from table config, and set resources to OneInputTransformation?
> >>
> >> As described in the design doc, in compilation phase, for batch jobs,
> the
> >> required memory of the Python worker will be calculated according to the
> >> configuration and set as the managed memory for the operator. For stream
> >> jobs, the resource spec will be unknown(The reason is that currently the
> >> resources for all the operators in stream jobs are unknown and it
> doesn’t
> >> support to configure both known and unknown resources in a single job).
> >>
> >>> - Are all buffer.memory off-heap memory? I took a look
> >>> to AbstractPythonScalarFunctionOperator, there is a
> forwardedInputQueue,
> >> is
> >>> this one a heap queue? So we need heap memory too?
> >>
> >> Yes, they are all off-heap memory which is supposed to be used by the
> >> Python process. The forwardedInputQueue is a buffer used in the Java
> >> operator and its memory is accounted as the on-heap memory.
> >>
> >> Regards,
> >> Dian
> >>
> >>>
> >>> Hope to get your reply.
> >>>
> >>>
> >>> Best,
> >>>
> >>> Jingsong Lee
> >>>
> >>> On Tue, Nov 26, 2019 at 12:17 PM Dian Fu <di...@gmail.com>
> wrote:
> >>>
> >>>> Thanks for your votes and feedbacks. I have discussed with @Zhu Zhu
> >>>> offline and also on the design doc.
> >>>>
> >>>> It seems that we have reached consensus on the design. I would bring
> up
> >>>> the VOTE if there is no other feedbacks.
> >>>>
> >>>> Thanks,
> >>>> Dian
> >>>>
> >>>>> 在 2019年11月22日，下午2:51，Hequn Cheng <ch...@gmail.com> 写道：
> >>>>>
> >>>>> Thanks a lot for putting this together, Dian! Definitely +1 for this!
> >>>>> It is great to make sure that the resources used by the Python
> process
> >>>> are
> >>>>> managed properly by Flink’s resource management framework.
> >>>>>
> >>>>> Also, thanks to the guys that working on the unified memory
> management
> >>>>> framework.
> >>>>>
> >>>>> Best, Hequn
> >>>>>
> >>>>>
> >>>>> On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo <ka...@gmail.com>
> wrote:
> >>>>>
> >>>>>> Thanks for driving this discussion, Dian!
> >>>>>>
> >>>>>> +1 for this proposal. It will help to reduce container failure due
> to
> >>>>>> the memory overuse.
> >>>>>> Some comments left in the design doc.
> >>>>>>
> >>>>>> Best,
> >>>>>> Yangze Guo
> >>>>>>
> >>>>>> On Mon, Nov 18, 2019 at 4:06 PM Xintong Song <tonysong820@gmail.com
> >
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Sorry for the late reply.
> >>>>>>>
> >>>>>>> +1 for the general proposal.
> >>>>>>>
> >>>>>>> And one remainder, to use UNKNOWN resource requirement, we need to
> >> make
> >>>>>>> sure optimizer knowns which operators use off-heap managed memory,
> >> and
> >>>>>>> compute and set a fraction to the operators. See FLIP-53[1] for
> more
> >>>>>>> details, and I would suggest you to double check with @Zhu Zhu who
> >>>> works
> >>>>>> on
> >>>>>>> this part.
> >>>>>>>
> >>>>>>> Thank you~
> >>>>>>>
> >>>>>>> Xintong Song
> >>>>>>>
> >>>>>>>
> >>>>>>> [1]
> >>>>>>>
> >>>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
> >>>>>>>
> >>>>>>> On Tue, Nov 12, 2019 at 11:53 AM Dian Fu <di...@gmail.com>
> >>>> wrote:
> >>>>>>>
> >>>>>>>> Hi Jincheng,
> >>>>>>>>
> >>>>>>>> Thanks for the reply and also looking forward to the feedback from
> >> the
> >>>>>>>> community.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Dian
> >>>>>>>>
> >>>>>>>>> 在 2019年11月11日，下午2:34，jincheng sun <su...@gmail.com> 写道：
> >>>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> +1， Thanks for bring up this discussion Dian!
> >>>>>>>>>
> >>>>>>>>> The Resource Management is very important for PyFlink UDF. So,
> It's
> >>>>>> great
> >>>>>>>>> if anyone can add more comments or inputs in the design doc or
> >>>>>> feedback
> >>>>>>>> in
> >>>>>>>>> ML. :)
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Jincheng
> >>>>>>>>>
> >>>>>>>>> Dian Fu <di...@gmail.com> 于2019年11月5日周二 上午11:32写道：
> >>>>>>>>>
> >>>>>>>>>> Hi everyone,
> >>>>>>>>>>
> >>>>>>>>>> In FLIP-58[1] it will add the support of Python user-defined
> >>>>>> stateless
> >>>>>>>>>> function for Python Table API. It will launch a separate Python
> >>>>>> process
> >>>>>>>> for
> >>>>>>>>>> Python user-defined function execution. The resources used by
> the
> >>>>>> Python
> >>>>>>>>>> process should be managed properly by Flink’s resource
> management
> >>>>>>>>>> framework. FLIP-49[2] has proposed a unified memory management
> >>>>>> framework
> >>>>>>>>>> and PyFlink user-defined function resource management should be
> >>>>>> based on
> >>>>>>>>>> it. Jincheng, Hequn, Xintong, GuoWei and I discussed offline
> about
> >>>>>>>> this. I
> >>>>>>>>>> draft a design doc[3] and want to start a discussion about
> PyFlink
> >>>>>>>>>> user-defined function resource management.
> >>>>>>>>>>
> >>>>>>>>>> Welcome any comments on the design doc or giving us feedback on
> >> the
> >>>>>> ML
> >>>>>>>>>> directly.
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Dian
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
> >>>>>>>>>> [2]
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> >>>>>>>>>> [3]
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>
> https://docs.google.com/document/d/1LQP8L66Thu2yVv6YRSfmF9EkkMnwhBHGjcTQ11GUmFc/edit#heading=h.4q4ggaftf78m
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>
> >>> --
> >>> Best, Jingsong Lee
> >>
> >>
> >
> > --
> > Best, Jingsong Lee
>
>

-- 
Best, Jingsong Lee

Re: [DISCUSS] PyFlink User-Defined Function Resource Management

Posted by Dian Fu <di...@gmail.com>.

Hi Jingsong,

Thanks for your valuable feedback. I have updated the "Example" section describing how to use these options in a Python Table API program.

Thanks,
Dian

> 在 2019年12月2日，下午6:12，Jingsong Lee <lz...@apache.org> 写道：
> 
> Hi Dian:
> 
> Thanks for you explanation.
> If you can update the document to add explanation for the changes to the
> table layer,
> it might be better. (it's just a suggestion, it depends on you)
> About forwardedInputQueue in AbstractPythonScalarFunctionOperator,
> Will this queue take up a lot of memory?
> Can it also occupy memory as large as buffer.memory?
> If so, what we're dealing with now is the silent use of heap memory?
> I feel a little strange, because the memory on the python side will reserve,
> but the memory on the JVM side is used silently.
> 
> After carefully seeing your comments on Google doc:
>> The memory used by the Java operator is currently accounted as the task
> on-heap memory. We can revisit this if we find it's a problem in the future.
> I agree that we can ignore it now, But we can add some content to the
> document to remind the user, What do you think?
> 
> Best,
> Jingsong Lee
> 
> On Mon, Dec 2, 2019 at 5:17 PM Dian Fu <di...@gmail.com> wrote:
> 
>> Hi Jingsong,
>> 
>> Thanks a lot for your comments. Please see my reply inlined below.
>> 
>>> 在 2019年12月2日，下午3:47，Jingsong Lee <lz...@apache.org> 写道：
>>> 
>>> Hi Dian:
>>> 
>>> 
>>> Thanks for your driving. I have some questions:
>>> 
>>> 
>>> - Where should these configurations belong? You have mentioned
>> tableApi/SQL,
>>> so should in TableConfig?
>> 
>> All Python related configurations are defined in PythonOptions. User could
>> configure these configurations via TableConfig.getConfiguration.setXXX for
>> Python Table API programs.
>> 
>>> 
>>> - If just in table/sql, whether it should be called: table.python.****,
>>> because in table, all config options are called table.***.
>> 
>> These configurations are not table specific. They will be used for both
>> Python Table API programs and Python DataStream API programs (which is
>> planned to be supported in the future). So python.xxx seems more
>> appropriate, what do you think?
>> 
>>> - What should table module do? So in CommonPythonCalc, we should read
>>> options from table config, and set resources to OneInputTransformation?
>> 
>> As described in the design doc, in compilation phase, for batch jobs, the
>> required memory of the Python worker will be calculated according to the
>> configuration and set as the managed memory for the operator. For stream
>> jobs, the resource spec will be unknown(The reason is that currently the
>> resources for all the operators in stream jobs are unknown and it doesn’t
>> support to configure both known and unknown resources in a single job).
>> 
>>> - Are all buffer.memory off-heap memory? I took a look
>>> to AbstractPythonScalarFunctionOperator, there is a forwardedInputQueue,
>> is
>>> this one a heap queue? So we need heap memory too?
>> 
>> Yes, they are all off-heap memory which is supposed to be used by the
>> Python process. The forwardedInputQueue is a buffer used in the Java
>> operator and its memory is accounted as the on-heap memory.
>> 
>> Regards,
>> Dian
>> 
>>> 
>>> Hope to get your reply.
>>> 
>>> 
>>> Best,
>>> 
>>> Jingsong Lee
>>> 
>>> On Tue, Nov 26, 2019 at 12:17 PM Dian Fu <di...@gmail.com> wrote:
>>> 
>>>> Thanks for your votes and feedbacks. I have discussed with @Zhu Zhu
>>>> offline and also on the design doc.
>>>> 
>>>> It seems that we have reached consensus on the design. I would bring up
>>>> the VOTE if there is no other feedbacks.
>>>> 
>>>> Thanks,
>>>> Dian
>>>> 
>>>>> 在 2019年11月22日，下午2:51，Hequn Cheng <ch...@gmail.com> 写道：
>>>>> 
>>>>> Thanks a lot for putting this together, Dian! Definitely +1 for this!
>>>>> It is great to make sure that the resources used by the Python process
>>>> are
>>>>> managed properly by Flink’s resource management framework.
>>>>> 
>>>>> Also, thanks to the guys that working on the unified memory management
>>>>> framework.
>>>>> 
>>>>> Best, Hequn
>>>>> 
>>>>> 
>>>>> On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo <ka...@gmail.com> wrote:
>>>>> 
>>>>>> Thanks for driving this discussion, Dian!
>>>>>> 
>>>>>> +1 for this proposal. It will help to reduce container failure due to
>>>>>> the memory overuse.
>>>>>> Some comments left in the design doc.
>>>>>> 
>>>>>> Best,
>>>>>> Yangze Guo
>>>>>> 
>>>>>> On Mon, Nov 18, 2019 at 4:06 PM Xintong Song <to...@gmail.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>> Sorry for the late reply.
>>>>>>> 
>>>>>>> +1 for the general proposal.
>>>>>>> 
>>>>>>> And one remainder, to use UNKNOWN resource requirement, we need to
>> make
>>>>>>> sure optimizer knowns which operators use off-heap managed memory,
>> and
>>>>>>> compute and set a fraction to the operators. See FLIP-53[1] for more
>>>>>>> details, and I would suggest you to double check with @Zhu Zhu who
>>>> works
>>>>>> on
>>>>>>> this part.
>>>>>>> 
>>>>>>> Thank you~
>>>>>>> 
>>>>>>> Xintong Song
>>>>>>> 
>>>>>>> 
>>>>>>> [1]
>>>>>>> 
>>>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
>>>>>>> 
>>>>>>> On Tue, Nov 12, 2019 at 11:53 AM Dian Fu <di...@gmail.com>
>>>> wrote:
>>>>>>> 
>>>>>>>> Hi Jincheng,
>>>>>>>> 
>>>>>>>> Thanks for the reply and also looking forward to the feedback from
>> the
>>>>>>>> community.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Dian
>>>>>>>> 
>>>>>>>>> 在 2019年11月11日，下午2:34，jincheng sun <su...@gmail.com> 写道：
>>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> +1， Thanks for bring up this discussion Dian!
>>>>>>>>> 
>>>>>>>>> The Resource Management is very important for PyFlink UDF. So, It's
>>>>>> great
>>>>>>>>> if anyone can add more comments or inputs in the design doc or
>>>>>> feedback
>>>>>>>> in
>>>>>>>>> ML. :)
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Jincheng
>>>>>>>>> 
>>>>>>>>> Dian Fu <di...@gmail.com> 于2019年11月5日周二 上午11:32写道：
>>>>>>>>> 
>>>>>>>>>> Hi everyone,
>>>>>>>>>> 
>>>>>>>>>> In FLIP-58[1] it will add the support of Python user-defined
>>>>>> stateless
>>>>>>>>>> function for Python Table API. It will launch a separate Python
>>>>>> process
>>>>>>>> for
>>>>>>>>>> Python user-defined function execution. The resources used by the
>>>>>> Python
>>>>>>>>>> process should be managed properly by Flink’s resource management
>>>>>>>>>> framework. FLIP-49[2] has proposed a unified memory management
>>>>>> framework
>>>>>>>>>> and PyFlink user-defined function resource management should be
>>>>>> based on
>>>>>>>>>> it. Jincheng, Hequn, Xintong, GuoWei and I discussed offline about
>>>>>>>> this. I
>>>>>>>>>> draft a design doc[3] and want to start a discussion about PyFlink
>>>>>>>>>> user-defined function resource management.
>>>>>>>>>> 
>>>>>>>>>> Welcome any comments on the design doc or giving us feedback on
>> the
>>>>>> ML
>>>>>>>>>> directly.
>>>>>>>>>> 
>>>>>>>>>> Regards,
>>>>>>>>>> Dian
>>>>>>>>>> 
>>>>>>>>>> [1]
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
>>>>>>>>>> [2]
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
>>>>>>>>>> [3]
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> https://docs.google.com/document/d/1LQP8L66Thu2yVv6YRSfmF9EkkMnwhBHGjcTQ11GUmFc/edit#heading=h.4q4ggaftf78m
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>> 
>>> --
>>> Best, Jingsong Lee
>> 
>> 
> 
> -- 
> Best, Jingsong Lee

Re: [DISCUSS] PyFlink User-Defined Function Resource Management

Posted by Jingsong Lee <lz...@apache.org>.

Hi Dian:

Thanks for you explanation.
If you can update the document to add explanation for the changes to the
table layer,
 it might be better. (it's just a suggestion, it depends on you)

About forwardedInputQueue in AbstractPythonScalarFunctionOperator,
Will this queue take up a lot of memory?
Can it also occupy memory as large as buffer.memory?
If so, what we're dealing with now is the silent use of heap memory?
I feel a little strange, because the memory on the python side will reserve,
but the memory on the JVM side is used silently.

After carefully seeing your comments on Google doc:
> The memory used by the Java operator is currently accounted as the task
on-heap memory. We can revisit this if we find it's a problem in the future.
I agree that we can ignore it now, But we can add some content to the
document to remind the user, What do you think?

Best,
Jingsong Lee

On Mon, Dec 2, 2019 at 5:17 PM Dian Fu <di...@gmail.com> wrote:

> Hi Jingsong,
>
> Thanks a lot for your comments. Please see my reply inlined below.
>
> > 在 2019年12月2日，下午3:47，Jingsong Lee <lz...@apache.org> 写道：
> >
> > Hi Dian:
> >
> >
> > Thanks for your driving. I have some questions:
> >
> >
> > - Where should these configurations belong? You have mentioned
> tableApi/SQL,
> > so should in TableConfig?
>
> All Python related configurations are defined in PythonOptions. User could
> configure these configurations via TableConfig.getConfiguration.setXXX for
> Python Table API programs.
>
> >
> > - If just in table/sql, whether it should be called: table.python.****,
> > because in table, all config options are called table.***.
>
> These configurations are not table specific. They will be used for both
> Python Table API programs and Python DataStream API programs (which is
> planned to be supported in the future). So python.xxx seems more
> appropriate, what do you think?
>
> > - What should table module do? So in CommonPythonCalc, we should read
> > options from table config, and set resources to OneInputTransformation?
>
> As described in the design doc, in compilation phase, for batch jobs, the
> required memory of the Python worker will be calculated according to the
> configuration and set as the managed memory for the operator. For stream
> jobs, the resource spec will be unknown(The reason is that currently the
> resources for all the operators in stream jobs are unknown and it doesn’t
> support to configure both known and unknown resources in a single job).
>
> > - Are all buffer.memory off-heap memory? I took a look
> > to AbstractPythonScalarFunctionOperator, there is a forwardedInputQueue,
> is
> > this one a heap queue? So we need heap memory too?
>
> Yes, they are all off-heap memory which is supposed to be used by the
> Python process. The forwardedInputQueue is a buffer used in the Java
> operator and its memory is accounted as the on-heap memory.
>
> Regards,
> Dian
>
> >
> > Hope to get your reply.
> >
> >
> > Best,
> >
> > Jingsong Lee
> >
> > On Tue, Nov 26, 2019 at 12:17 PM Dian Fu <di...@gmail.com> wrote:
> >
> >> Thanks for your votes and feedbacks. I have discussed with @Zhu Zhu
> >> offline and also on the design doc.
> >>
> >> It seems that we have reached consensus on the design. I would bring up
> >> the VOTE if there is no other feedbacks.
> >>
> >> Thanks,
> >> Dian
> >>
> >>> 在 2019年11月22日，下午2:51，Hequn Cheng <ch...@gmail.com> 写道：
> >>>
> >>> Thanks a lot for putting this together, Dian! Definitely +1 for this!
> >>> It is great to make sure that the resources used by the Python process
> >> are
> >>> managed properly by Flink’s resource management framework.
> >>>
> >>> Also, thanks to the guys that working on the unified memory management
> >>> framework.
> >>>
> >>> Best, Hequn
> >>>
> >>>
> >>> On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo <ka...@gmail.com> wrote:
> >>>
> >>>> Thanks for driving this discussion, Dian!
> >>>>
> >>>> +1 for this proposal. It will help to reduce container failure due to
> >>>> the memory overuse.
> >>>> Some comments left in the design doc.
> >>>>
> >>>> Best,
> >>>> Yangze Guo
> >>>>
> >>>> On Mon, Nov 18, 2019 at 4:06 PM Xintong Song <to...@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> Sorry for the late reply.
> >>>>>
> >>>>> +1 for the general proposal.
> >>>>>
> >>>>> And one remainder, to use UNKNOWN resource requirement, we need to
> make
> >>>>> sure optimizer knowns which operators use off-heap managed memory,
> and
> >>>>> compute and set a fraction to the operators. See FLIP-53[1] for more
> >>>>> details, and I would suggest you to double check with @Zhu Zhu who
> >> works
> >>>> on
> >>>>> this part.
> >>>>>
> >>>>> Thank you~
> >>>>>
> >>>>> Xintong Song
> >>>>>
> >>>>>
> >>>>> [1]
> >>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
> >>>>>
> >>>>> On Tue, Nov 12, 2019 at 11:53 AM Dian Fu <di...@gmail.com>
> >> wrote:
> >>>>>
> >>>>>> Hi Jincheng,
> >>>>>>
> >>>>>> Thanks for the reply and also looking forward to the feedback from
> the
> >>>>>> community.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Dian
> >>>>>>
> >>>>>>> 在 2019年11月11日，下午2:34，jincheng sun <su...@gmail.com> 写道：
> >>>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> +1， Thanks for bring up this discussion Dian!
> >>>>>>>
> >>>>>>> The Resource Management is very important for PyFlink UDF. So, It's
> >>>> great
> >>>>>>> if anyone can add more comments or inputs in the design doc or
> >>>> feedback
> >>>>>> in
> >>>>>>> ML. :)
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jincheng
> >>>>>>>
> >>>>>>> Dian Fu <di...@gmail.com> 于2019年11月5日周二 上午11:32写道：
> >>>>>>>
> >>>>>>>> Hi everyone,
> >>>>>>>>
> >>>>>>>> In FLIP-58[1] it will add the support of Python user-defined
> >>>> stateless
> >>>>>>>> function for Python Table API. It will launch a separate Python
> >>>> process
> >>>>>> for
> >>>>>>>> Python user-defined function execution. The resources used by the
> >>>> Python
> >>>>>>>> process should be managed properly by Flink’s resource management
> >>>>>>>> framework. FLIP-49[2] has proposed a unified memory management
> >>>> framework
> >>>>>>>> and PyFlink user-defined function resource management should be
> >>>> based on
> >>>>>>>> it. Jincheng, Hequn, Xintong, GuoWei and I discussed offline about
> >>>>>> this. I
> >>>>>>>> draft a design doc[3] and want to start a discussion about PyFlink
> >>>>>>>> user-defined function resource management.
> >>>>>>>>
> >>>>>>>> Welcome any comments on the design doc or giving us feedback on
> the
> >>>> ML
> >>>>>>>> directly.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Dian
> >>>>>>>>
> >>>>>>>> [1]
> >>>>>>>>
> >>>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
> >>>>>>>> [2]
> >>>>>>>>
> >>>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> >>>>>>>> [3]
> >>>>>>>>
> >>>>>>
> >>>>
> >>
> https://docs.google.com/document/d/1LQP8L66Thu2yVv6YRSfmF9EkkMnwhBHGjcTQ11GUmFc/edit#heading=h.4q4ggaftf78m
> >>>>>>
> >>>>>>
> >>>>
> >>
> >>
> >
> > --
> > Best, Jingsong Lee
>
>

-- 
Best, Jingsong Lee

Re: [DISCUSS] PyFlink User-Defined Function Resource Management

Posted by Dian Fu <di...@gmail.com>.

Hi Jingsong,

Thanks a lot for your comments. Please see my reply inlined below.

> 在 2019年12月2日，下午3:47，Jingsong Lee <lz...@apache.org> 写道：
> 
> Hi Dian:
> 
> 
> Thanks for your driving. I have some questions:
> 
> 
> - Where should these configurations belong? You have mentioned tableApi/SQL,
> so should in TableConfig?

All Python related configurations are defined in PythonOptions. User could configure these configurations via TableConfig.getConfiguration.setXXX for Python Table API programs.

> 
> - If just in table/sql, whether it should be called: table.python.****,
> because in table, all config options are called table.***.

These configurations are not table specific. They will be used for both Python Table API programs and Python DataStream API programs (which is planned to be supported in the future). So python.xxx seems more appropriate, what do you think?

> - What should table module do? So in CommonPythonCalc, we should read
> options from table config, and set resources to OneInputTransformation?

As described in the design doc, in compilation phase, for batch jobs, the required memory of the Python worker will be calculated according to the configuration and set as the managed memory for the operator. For stream jobs, the resource spec will be unknown(The reason is that currently the resources for all the operators in stream jobs are unknown and it doesn’t support to configure both known and unknown resources in a single job).

> - Are all buffer.memory off-heap memory? I took a look
> to AbstractPythonScalarFunctionOperator, there is a forwardedInputQueue, is
> this one a heap queue? So we need heap memory too?

Yes, they are all off-heap memory which is supposed to be used by the Python process. The forwardedInputQueue is a buffer used in the Java operator and its memory is accounted as the on-heap memory.

Regards,
Dian

> 
> Hope to get your reply.
> 
> 
> Best,
> 
> Jingsong Lee
> 
> On Tue, Nov 26, 2019 at 12:17 PM Dian Fu <di...@gmail.com> wrote:
> 
>> Thanks for your votes and feedbacks. I have discussed with @Zhu Zhu
>> offline and also on the design doc.
>> 
>> It seems that we have reached consensus on the design. I would bring up
>> the VOTE if there is no other feedbacks.
>> 
>> Thanks,
>> Dian
>> 
>>> 在 2019年11月22日，下午2:51，Hequn Cheng <ch...@gmail.com> 写道：
>>> 
>>> Thanks a lot for putting this together, Dian! Definitely +1 for this!
>>> It is great to make sure that the resources used by the Python process
>> are
>>> managed properly by Flink’s resource management framework.
>>> 
>>> Also, thanks to the guys that working on the unified memory management
>>> framework.
>>> 
>>> Best, Hequn
>>> 
>>> 
>>> On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo <ka...@gmail.com> wrote:
>>> 
>>>> Thanks for driving this discussion, Dian!
>>>> 
>>>> +1 for this proposal. It will help to reduce container failure due to
>>>> the memory overuse.
>>>> Some comments left in the design doc.
>>>> 
>>>> Best,
>>>> Yangze Guo
>>>> 
>>>> On Mon, Nov 18, 2019 at 4:06 PM Xintong Song <to...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> Sorry for the late reply.
>>>>> 
>>>>> +1 for the general proposal.
>>>>> 
>>>>> And one remainder, to use UNKNOWN resource requirement, we need to make
>>>>> sure optimizer knowns which operators use off-heap managed memory, and
>>>>> compute and set a fraction to the operators. See FLIP-53[1] for more
>>>>> details, and I would suggest you to double check with @Zhu Zhu who
>> works
>>>> on
>>>>> this part.
>>>>> 
>>>>> Thank you~
>>>>> 
>>>>> Xintong Song
>>>>> 
>>>>> 
>>>>> [1]
>>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
>>>>> 
>>>>> On Tue, Nov 12, 2019 at 11:53 AM Dian Fu <di...@gmail.com>
>> wrote:
>>>>> 
>>>>>> Hi Jincheng,
>>>>>> 
>>>>>> Thanks for the reply and also looking forward to the feedback from the
>>>>>> community.
>>>>>> 
>>>>>> Thanks,
>>>>>> Dian
>>>>>> 
>>>>>>> 在 2019年11月11日，下午2:34，jincheng sun <su...@gmail.com> 写道：
>>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> +1， Thanks for bring up this discussion Dian!
>>>>>>> 
>>>>>>> The Resource Management is very important for PyFlink UDF. So, It's
>>>> great
>>>>>>> if anyone can add more comments or inputs in the design doc or
>>>> feedback
>>>>>> in
>>>>>>> ML. :)
>>>>>>> 
>>>>>>> Best,
>>>>>>> Jincheng
>>>>>>> 
>>>>>>> Dian Fu <di...@gmail.com> 于2019年11月5日周二 上午11:32写道：
>>>>>>> 
>>>>>>>> Hi everyone,
>>>>>>>> 
>>>>>>>> In FLIP-58[1] it will add the support of Python user-defined
>>>> stateless
>>>>>>>> function for Python Table API. It will launch a separate Python
>>>> process
>>>>>> for
>>>>>>>> Python user-defined function execution. The resources used by the
>>>> Python
>>>>>>>> process should be managed properly by Flink’s resource management
>>>>>>>> framework. FLIP-49[2] has proposed a unified memory management
>>>> framework
>>>>>>>> and PyFlink user-defined function resource management should be
>>>> based on
>>>>>>>> it. Jincheng, Hequn, Xintong, GuoWei and I discussed offline about
>>>>>> this. I
>>>>>>>> draft a design doc[3] and want to start a discussion about PyFlink
>>>>>>>> user-defined function resource management.
>>>>>>>> 
>>>>>>>> Welcome any comments on the design doc or giving us feedback on the
>>>> ML
>>>>>>>> directly.
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Dian
>>>>>>>> 
>>>>>>>> [1]
>>>>>>>> 
>>>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
>>>>>>>> [2]
>>>>>>>> 
>>>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
>>>>>>>> [3]
>>>>>>>> 
>>>>>> 
>>>> 
>> https://docs.google.com/document/d/1LQP8L66Thu2yVv6YRSfmF9EkkMnwhBHGjcTQ11GUmFc/edit#heading=h.4q4ggaftf78m
>>>>>> 
>>>>>> 
>>>> 
>> 
>> 
> 
> -- 
> Best, Jingsong Lee