Posted to user@flink.apache.org by Paul Lam <pa...@gmail.com> on 2021/12/14 09:43:55 UTC

Java dependency management in PyFlink

Hi!

I’m trying out PyFlink and looking for the best practice to manage Java dependencies. 

The docs recommend using the ‘pipeline.jars’ configuration or command line options to specify jars for a PyFlink job. However, PyFlink users may not know which Java dependencies are required. For example, a user may import the Kafka connector without knowing that the Kafka client needs to be added to the classpath. I think the problem here is the lack of cross-language dependency management, so we have to do it manually.
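
To be concrete, the configuration route looks roughly like this (a minimal sketch; the jar path is a placeholder, and multiple jars would be joined with semicolons):

    from pyflink.table import EnvironmentSettings, TableEnvironment

    # Create a Table API environment in streaming mode.
    table_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

    # Register the connector jar(s) with the job. Paths must be file:// URLs;
    # multiple jars are separated by semicolons.
    table_env.get_config().get_configuration().set_string(
        "pipeline.jars",
        "file:///path/to/flink-sql-connector-kafka_2.11-1.14.0.jar")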

For now, I work around the problem with a tool that extracts the required jars for the Java artifacts corresponding to the imported PyFlink modules via the Maven dependency plugin. But I wonder if there is a best practice to address the problem? Thanks a lot!
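
Roughly, the tool generates a throwaway pom for the mapped Java artifact and lets the Maven dependency plugin pull in the transitive closure, something like this (a simplified sketch; the coordinates and paths are placeholders, not the real mapping):

    import os
    import subprocess
    import tempfile

    # A throwaway pom that declares the connector as a dependency, so that
    # dependency:copy-dependencies can resolve its transitive closure.
    POM_TEMPLATE = """<project>
      <modelVersion>4.0.0</modelVersion>
      <groupId>tmp</groupId>
      <artifactId>deps</artifactId>
      <version>1</version>
      <dependencies>
        <dependency>
          <groupId>{group}</groupId>
          <artifactId>{artifact}</artifactId>
          <version>{version}</version>
        </dependency>
      </dependencies>
    </project>"""

    def fetch_jars(group, artifact, version, output_dir):
        with tempfile.TemporaryDirectory() as tmp:
            pom = os.path.join(tmp, "pom.xml")
            with open(pom, "w") as f:
                f.write(POM_TEMPLATE.format(
                    group=group, artifact=artifact, version=version))
            # Copies the declared dependency plus all transitive jars.
            subprocess.run(
                ["mvn", "-f", pom, "dependency:copy-dependencies",
                 "-DoutputDirectory=" + output_dir],
                check=True)

    # e.g. fetch_jars("org.apache.flink", "flink-connector-kafka_2.11",
    #                 "1.14.0", "./jars")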

Best,
Paul Lam


Re: Java dependency management in PyFlink

Posted by Paul Lam <pa...@gmail.com>.
Hi Dian,

Thanks a lot for your input. That’s a valid solution. We avoid using fat jars with the Java API, because they easily lead to class conflicts. But PyFlink is like the SQL API: user-imported Java dependencies are comparatively rare, so a fat jar is a proper choice.

Best,
Paul Lam

> On 14 Dec 2021, at 19:26, Dian Fu <di...@gmail.com> wrote:
> 
> Hi Paul,
> 
> For connectors (including Kafka), it's recommended to use the fat jar which contains the dependencies. For example, for Kafka, you could use https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-kafka_2.11/1.14.0/flink-sql-connector-kafka_2.11-1.14.0.jar
> 
> Regards,
> Dian
> 
> On Tue, Dec 14, 2021 at 5:44 PM Paul Lam <paullin3280@gmail.com> wrote:
> Hi!
> 
> I’m trying out PyFlink and looking for the best practice to manage Java dependencies. 
> 
> The docs recommend using the ‘pipeline.jars’ configuration or command line options to specify jars for a PyFlink job. However, PyFlink users may not know which Java dependencies are required. For example, a user may import the Kafka connector without knowing that the Kafka client needs to be added to the classpath. I think the problem here is the lack of cross-language dependency management, so we have to do it manually.
> 
> For now, I work around the problem with a tool that extracts the required jars for the Java artifacts corresponding to the imported PyFlink modules via the Maven dependency plugin. But I wonder if there is a best practice to address the problem? Thanks a lot!
> 
> Best,
> Paul Lam
> 


Re: Java dependency management in PyFlink

Posted by Dian Fu <di...@gmail.com>.
Hi Paul,

For connectors (including Kafka), it's recommended to use the fat jar which
contains the dependencies. For example, for Kafka, you could use
https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-kafka_2.11/1.14.0/flink-sql-connector-kafka_2.11-1.14.0.jar
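
With that single jar on pipeline.jars, the Kafka classes it shades are already
available, so a source table can be declared directly (a minimal sketch; the
path, topic, and broker address are placeholders):

    from pyflink.table import EnvironmentSettings, TableEnvironment

    table_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
    table_env.get_config().get_configuration().set_string(
        "pipeline.jars",
        "file:///path/to/flink-sql-connector-kafka_2.11-1.14.0.jar")

    # The fat jar bundles the Kafka client, so no separate kafka-clients
    # jar needs to be added to the classpath.
    table_env.execute_sql("""
        CREATE TABLE kafka_source (
            msg STRING
        ) WITH (
            'connector' = 'kafka',
            'topic' = 'my-topic',
            'properties.bootstrap.servers' = 'localhost:9092',
            'scan.startup.mode' = 'earliest-offset',
            'format' = 'json'
        )
    """)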

Regards,
Dian

On Tue, Dec 14, 2021 at 5:44 PM Paul Lam <pa...@gmail.com> wrote:

> Hi!
>
> I’m trying out PyFlink and looking for the best practice to manage Java
> dependencies.
>
> The docs recommend using the ‘pipeline.jars’ configuration or command line
> options to specify jars for a PyFlink job. However, PyFlink users may not
> know which Java dependencies are required. For example, a user may import
> the Kafka connector without knowing that the Kafka client needs to be added
> to the classpath. I think the problem here is the lack of cross-language
> dependency management, so we have to do it manually.
>
> For now, I work around the problem with a tool that extracts the required
> jars for the Java artifacts corresponding to the imported PyFlink modules
> via the Maven dependency plugin. But I wonder if there is a best practice
> to address the problem? Thanks a lot!
>
> Best,
> Paul Lam
>
>