Posted to dev@flink.apache.org by Xingbo Huang <hx...@gmail.com> on 2020/03/31 11:22:21 UTC

[DISCUSS] FLIP-121: Support Cython Optimizing Python User Defined Function

Hi everyone,

I would like to start a discussion thread on "Support Cython Optimizing
Python User Defined Function".

Scalar Python UDFs (FLIP-58[1]) are already supported in release 1.10, and
Python UDTFs will be supported in the upcoming 1.11 release. In release
1.10, we focused on UDF features and made few performance optimizations.
Although we have since made many optimizations in master[2], Cython can
improve the performance of Python UDFs considerably further.

Robert Metzger, Jincheng Sun and I have discussed this offline and drafted
FLIP-121[3]. It includes the following items:

- Introduce Cython implementations of the coder and operations

- Documentation changes for building sdist and wheel packages from source code

- Solutions for building the packages


Looking forward to your feedback!

Best,

Xingbo

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table

[2] https://issues.apache.org/jira/browse/FLINK-16747

[3]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-121%3A+Support+Cython+Optimizing+Python+User+Defined+Function

Re: [DISCUSS] FLIP-121: Support Cython Optimizing Python User Defined Function

Posted by Hequn Cheng <he...@apache.org>.
Hi,

+1 on integrating with Azure. It is consistent with the long-term goal, and
we are also going to switch from Travis to Azure.
The performance improvement is very impressive. Looking forward to the vote.

Best, Hequn

On Tue, Apr 7, 2020 at 9:10 PM Xingbo Huang <hx...@gmail.com> wrote:

> Hi everyone,
>
> Thanks all of you for the discussion.
> If there are no objections, I would like to start a vote thread tomorrow.
>
> Best,
> Xingbo

Re: [DISCUSS] FLIP-121: Support Cython Optimizing Python User Defined Function

Posted by Xingbo Huang <hx...@gmail.com>.
Hi everyone,

Thanks all of you for the discussion.
If there are no objections, I would like to start a vote thread tomorrow.

Best,
Xingbo

jincheng sun <su...@gmail.com> wrote on Tue, Apr 7, 2020 at 6:22 PM:

> Hi Xingbo,
>
> Thanks for bringing up this discussion!
>
> I agree with Robert, +1 for integration with Azure.
>
> Best,
> Jincheng

Re: [DISCUSS] FLIP-121: Support Cython Optimizing Python User Defined Function

Posted by jincheng sun <su...@gmail.com>.
Hi Xingbo,

Thanks for bringing up this discussion!

I agree with Robert, +1 for integration with Azure.

Best,
Jincheng

Dian Fu <di...@gmail.com> wrote on Tue, Apr 7, 2020 at 2:21 PM:

> Hi Xingbo,
>
> Thanks a lot for the great work. Big +1 to this feature. The performance
> improvement is impressive.
>
> Regards,
> Dian

Re: [DISCUSS] FLIP-121: Support Cython Optimizing Python User Defined Function

Posted by Dian Fu <di...@gmail.com>.
Hi Xingbo,

Thanks a lot for the great work. Big +1 to this feature. The performance improvement is impressive.

Regards,
Dian

> On Apr 7, 2020, at 12:38 PM, Robert Metzger <rm...@apache.org> wrote:
> 
> Thank you for posting the FLIP.
> 
> The proposed integration with Azure Pipelines looks good to me.

Re: [DISCUSS] FLIP-121: Support Cython Optimizing Python User Defined Function

Posted by Robert Metzger <rm...@apache.org>.
Thank you for posting the FLIP.

The proposed integration with Azure Pipelines looks good to me.
