Posted to dev@flink.apache.org by Wei Zhong <we...@gmail.com> on 2020/08/26 03:27:43 UTC

[DISCUSS] FLIP-139: General Python User-Defined Aggregate Function on Table API

Hi everyone,

I would like to start a discussion about how to support general Python user-defined aggregate functions (UDAFs) in the Table API.

FLIP-58[1] introduced stateless Python UDFs, which have been supported in previous releases. However, stateful Python UDFs, i.e. user-defined aggregate functions, are not yet supported in PyFlink. We want to introduce general Python user-defined aggregate functions for the PyFlink Table API.

Here is the design doc:

https://cwiki.apache.org/confluence/display/FLINK/FLIP-139%3A+General+Python+User-Defined+Aggregate+Function+Support+on+Table+API

Looking forward to your feedback!

Best,
Wei

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table


Re: [DISCUSS] FLIP-139: General Python User-Defined Aggregate Function on Table API

Posted by Wei Zhong <we...@gmail.com>.
Hi everyone,

Are there any more comments on this FLIP? If not, I would like to start the VOTE.

Best,
Wei



Re: [DISCUSS] FLIP-139: General Python User-Defined Aggregate Function on Table API

Posted by Wei Zhong <we...@gmail.com>.
Hi Timo,

Thanks for pointing this out. I'll remove `reset_accumulator` from the design doc.

Best,
Wei



Re: [DISCUSS] FLIP-139: General Python User-Defined Aggregate Function on Table API

Posted by Timo Walther <tw...@apache.org>.
Hi Wei,

Is `reset_accumulator` still necessary? We recently dropped it from the
Java API because the planner no longer used it.
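
To illustrate the question, here is a minimal plain-Python sketch of an aggregate without a reset hook. It is not the PyFlink API and not the interface from the design doc: the class `WeightedAvg` and the driver `aggregate` are hypothetical, and the method names only mirror the Java `AggregateFunction` contract (`createAccumulator`, `accumulate`, `getValue`). The point is that a caller can obtain fresh state from `create_accumulator`, which is one reason a separate `reset_accumulator` may be redundant.

```python
class WeightedAvg:
    """Hypothetical stand-in for a general Python UDAF (not PyFlink code)."""

    def create_accumulator(self):
        # Accumulator layout: [weighted sum, total weight].
        return [0, 0]

    def accumulate(self, acc, value, weight):
        # Fold one input row into the accumulator in place.
        acc[0] += value * weight
        acc[1] += weight

    def get_value(self, acc):
        # Return None for an empty group instead of dividing by zero.
        return acc[0] / acc[1] if acc[1] else None


def aggregate(func, rows):
    """Hypothetical driver: one fresh accumulator per group, no reset call."""
    acc = func.create_accumulator()
    for value, weight in rows:
        func.accumulate(acc, value, weight)
    return func.get_value(acc)


agg = WeightedAvg()
first = aggregate(agg, [(10, 1), (20, 3)])
second = aggregate(agg, [(4, 2)])
empty = aggregate(agg, [])
```

The same `agg` instance serves all three groups because state lives only in the accumulator, not in the function object.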

Regards,
Timo



Re: [DISCUSS] FLIP-139: General Python User-Defined Aggregate Function on Table API

Posted by Wei Zhong <we...@gmail.com>.
Hi Jincheng & Xingbo,

Thanks for your suggestions. 

I agree that we should keep the user interface uniform. I'll adjust the design to allow users to specify the result type and accumulator type via @udaf.
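
As a rough sketch of what declaring both types through a decorator could look like: the code below is plain Python, and `udaf_with_types` is a hypothetical stand-in, not the actual @udaf from FLIP-137 or the final design. The type strings are placeholders as well.

```python
def udaf_with_types(result_type, accumulator_type):
    """Hypothetical decorator that records the declared types on the class."""
    def wrap(cls):
        cls.result_type = result_type
        cls.accumulator_type = accumulator_type
        return cls
    return wrap


# The type strings below are illustrative placeholders only.
@udaf_with_types(result_type="BIGINT", accumulator_type="ROW<cnt BIGINT>")
class CountAgg:
    def create_accumulator(self):
        return [0]

    def accumulate(self, acc, *args):
        # Count every input row, whatever its columns are.
        acc[0] += 1

    def get_value(self, acc):
        return acc[0]


counter = CountAgg()
acc = counter.create_accumulator()
for row in ["a", "b", "c"]:
    counter.accumulate(acc, row)
count = counter.get_value(acc)
```

With this shape, users would not need separate methods on the class to report the result type and accumulator type.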

Best,
Wei




Re: [DISCUSS] FLIP-139: General Python User-Defined Aggregate Function on Table API

Posted by Xingbo Huang <hx...@gmail.com>.
Hi Wei,

Thanks a lot for the discussion.

Thanks also to Jincheng for suggesting that FLIP-137 and FLIP-139 be
discussed together.

One question is whether the @udaf decorator introduced in FLIP-137[1] can
describe both Pandas UDAFs and general Python UDAFs. Looking at Python
user-defined functions as a whole, we would then use @udf for general Python
UDFs and Pandas UDFs, @udtf for Python UDTFs, and @udaf for general Python
UDAFs and Pandas UDAFs, which is more unified.
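
A small plain-Python sketch of the idea, to make it concrete: one decorator covers both kinds and a flag tells them apart. Everything here is an assumption for illustration; `udaf_with_kind`, the `func_type` parameter, and `REGISTRY` are hypothetical and not taken from FLIP-137 or PyFlink, and a plain list stands in for a pandas Series.

```python
REGISTRY = {}  # hypothetical: function name -> declared kind


def udaf_with_kind(func_type="general"):
    """Hypothetical single decorator for both general and pandas UDAFs."""
    if func_type not in ("general", "pandas"):
        raise ValueError("unknown func_type: %r" % (func_type,))

    def wrap(func):
        REGISTRY[func.__name__] = func_type
        return func

    return wrap


@udaf_with_kind()
def row_sum(values):
    # General flavour: conceptually fed row by row.
    return sum(values)


@udaf_with_kind(func_type="pandas")
def batch_mean(series):
    # Pandas flavour: conceptually receives a whole batch at once.
    return sum(series) / len(series)


kinds = dict(REGISTRY)
```

The user sees a single entry point, and the runtime can still dispatch on the recorded kind.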

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink#FLIP137:SupportPandasUDAFinPyFlink-Interfaces

Best,
Xingbo


Re: [DISCUSS] FLIP-139: General Python User-Defined Aggregate Function on Table API

Posted by jincheng sun <su...@gmail.com>.
Hi Wei,
Thanks for the discussion! Overall, +1 for this FLIP.

One question: can the @udaf decorator added by FLIP-137 be used for general
Python UDAFs?
It would be great if we could consider this design in combination with FLIP-137.

What do you think?

Best,
Jincheng
