Posted to user@flink.apache.org by vtygoss <vt...@126.com> on 2022/03/04 06:36:28 UTC

Pyflink1.13 or JavaFlink1.13 + Jpython + Python2.7, which way has better performance?

Hi, community!


I am working on optimizing our data processing architecture, moving from a full data pipeline to an incremental data pipeline, and from PySpark with Python code to one of the two options below:


1. PyFlink 1.13 + Python 2.7
2. JavaFlink 1.13 + JPython + Python 2.7 


As far as I know, the Python APIs only provide a subset of about 2/3 of what's available in the Java APIs; the performance of PyFlink is worse than JavaFlink, and some features contributed after 1.10 are not implemented in PyFlink yet.


Also, Python code can be compiled to Java bytecode by ASM carrier and loaded into the JVM, so can I argue that the Python code is not much less efficient than Java code?


So I prefer the second way.
Thanks for any suggestions or replies. 


Best Regards!

Re: Pyflink1.13 or JavaFlink1.13 + Jpython + Python2.7, which way has better performance?

Posted by Dian Fu <di...@gmail.com>.
Hi Vtygoss,

>> As far as I know, the Python APIs only provide a subset of about 2/3 of
what's available in the Java APIs; the performance of PyFlink is worse than
JavaFlink, and some features contributed after 1.10 are not implemented in
PyFlink yet.
There are two levels of API in Flink: Table API and DataStream API.
Regarding the Table API, AFAIK, it is aligned with most of the functionality
provided in the Java Table API. Regarding the DataStream API, there are still
several features not aligned, e.g. side output, broadcast state, join, etc.
However, I guess that the most commonly used features are already
supported. Do you have a clear understanding of the features you want to
use? If so, we could evaluate whether there are problems.
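
To give a feel for the Python Table API side, here is a minimal sketch, not a
recommendation for your job. It assumes the apache-flink package is installed
and that your logic can be expressed as a scalar function; the names
normalize_name and run_table_job are hypothetical, chosen only for
illustration. The plain Python function is kept separate from the Flink wiring
so it can be reused or tested without a Flink installation.

```python
def normalize_name(name):
    """Plain Python logic with no Flink dependency (hypothetical example)."""
    return name.strip().lower()


def run_table_job():
    # Imports are local so the pure-Python function above stays usable
    # on machines where the apache-flink package is not installed.
    from pyflink.table import DataTypes, EnvironmentSettings, TableEnvironment
    from pyflink.table.expressions import col
    from pyflink.table.udf import udf

    # Batch-mode table environment; a streaming job would use
    # in_streaming_mode() instead.
    settings = EnvironmentSettings.new_instance().in_batch_mode().build()
    t_env = TableEnvironment.create(settings)

    # Wrap the plain function as a scalar Python UDF returning STRING.
    normalize = udf(normalize_name, result_type=DataTypes.STRING())

    # Small in-memory table standing in for a real source (assumption).
    table = t_env.from_elements([(1, "  Alice "), (2, "BOB")], ["id", "name"])
    result = table.select(col("id"), normalize(col("name")).alias("name"))
    result.execute().print()
```

Calling run_table_job() on a machine with PyFlink installed would run the
small batch job locally; normalize_name itself can be exercised without Flink.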

Besides, regarding "some features contributed after 1.10 are not
implemented in PyFlink yet", we usually try to avoid this as much as
possible. Could you share the missing features you have in mind? It would
help us improve this.

>> Also, Python code can be compiled to Java bytecode by ASM carrier and
loaded into the JVM, so can I argue that the Python code is not much less
efficient than Java code?
For the JPython solution, there are a few known limitations. If they are not
much of a problem for you, I think you could give it a try.

Regards,
Dian

