Posted to user@flink.apache.org by Leon Xu <lx...@attentivemobile.com> on 2022/06/04 19:21:04 UTC

Questions regarding classpath loading order in YarnClusterDescriptor

Hi Flink Community,

We are building on top of *org.apache.flink.yarn.YarnClusterDescriptor* to
submit a Flink application from Java code to a YARN cluster in
application mode. We pass the classpath as the value of the
*yarn.provided.lib.dirs* property in the YARN configuration.

While working with the YarnClusterDescriptor code, I came across two
questions that I hope someone can answer:
1. YarnClusterDescriptor seems to force the classpath into
alphabetical order. See the code here
<https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L966>.
Is there a specific reason for that? Is it currently possible to enforce
my own order?
2. It looks like *flink-dist.jar* is treated separately from the other
classpath entries. In the *YarnApplicationFileUploader* class,
the registerMultipleLocalResources method skips a jar if it is the dist
jar. See the code here
<https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnApplicationFileUploader.java#L283>.
With the current behavior, flink-dist.jar seems to always be placed
at the end of the classpath. Is there a reason why Flink treats
*flink-dist.jar* separately from the other jars?

We want to enforce a particular classpath order because
different jars may bundle the same dependency in different
versions. Controlling the order would let us load the library
version we need.
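
The reason order matters can be shown with a minimal sketch in plain Java, without any Flink code. It assumes two directories standing in for jars that both contain the same resource (a hypothetical version.txt), and relies on the documented behavior of java.net.URLClassLoader, which searches its URLs in the order given. The class and method names are illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FirstEntryWinsSketch {

    // Creates a directory that stands in for a jar and holds a single resource.
    static Path fakeJar(Path root, String name, String version) throws IOException {
        Path dir = Files.createDirectories(root.resolve(name));
        Files.write(dir.resolve("version.txt"), version.getBytes(StandardCharsets.UTF_8));
        return dir;
    }

    // Builds a two-entry search path and returns the version.txt content the loader resolves.
    static String resolveFirst(String firstVersion, String secondVersion) {
        try {
            Path root = Files.createTempDirectory("classpath-order");
            URL[] urls = {
                // toUri() adds a trailing '/' for existing directories, as URLClassLoader expects
                fakeJar(root, "first", firstVersion).toUri().toURL(),
                fakeJar(root, "second", secondVersion).toUri().toURL()
            };
            // Parent is null so that only the two fake jars are searched.
            try (URLClassLoader loader = new URLClassLoader(urls, null);
                 InputStream in = loader.getResourceAsStream("version.txt");
                 BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
                return reader.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveFirst("1.0", "2.0")); // prints 1.0
        System.out.println(resolveFirst("2.0", "1.0")); // prints 2.0
    }
}
```

Whichever entry comes first on the search path supplies the resource, which is why two jars bundling different versions of the same library make the classpath order significant.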


Thanks
Leon

Re: Questions regarding classpath loading order in YarnClusterDescriptor

Posted by Geng Biao <bi...@gmail.com>.
Hi Leon,
You are welcome. ‘Each plugin is loaded through its own classloader’ (see the doc <https://nightlies.apache.org/flink/flink-docs-master/zh/docs/deployment/filesystems/plugins/>), so plugins are not added to the Flink system classpath. If I understand correctly, you do not need to do any extra work as long as they are set up correctly in flink-conf.yaml.
If a specific Flink job needs extra dependency jars, then since 1.15.0 you can put those jars under ‘usrlib’ (if the directory does not exist, you can create it yourself); they will be shipped automatically as well.
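
As a rough illustration of the two locations mentioned above, here is a minimal Java sketch that builds the directory layout on a temporary folder. It assumes the layout described in the Flink plugin documentation (one subdirectory per plugin under plugins/) and assumes that usrlib sits next to plugins in the Flink home directory. The jar names, the plugin directory name, and all class and method names are hypothetical placeholders.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class PluginLayoutSketch {

    // Copies a jar into targetDir, creating the directory if it does not exist yet.
    static Path copyInto(Path jar, Path targetDir) {
        try {
            Files.createDirectories(targetDir);
            return Files.copy(jar, targetDir.resolve(jar.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Plugins go into their own subdirectory: <flinkHome>/plugins/<pluginName>/<jar>
    static Path installPlugin(Path flinkHome, String pluginName, Path jar) {
        return copyInto(jar, flinkHome.resolve("plugins").resolve(pluginName));
    }

    // Job-specific dependencies go into <flinkHome>/usrlib/<jar> (assumed location).
    static Path addJobDependency(Path flinkHome, Path jar) {
        return copyInto(jar, flinkHome.resolve("usrlib"));
    }

    // Builds a throwaway Flink home with empty placeholder jars, for demonstration only.
    static Path demoHome() {
        try {
            Path home = Files.createTempDirectory("flink-home");
            Path opt = Files.createDirectories(home.resolve("opt"));
            Path pluginJar = Files.createFile(opt.resolve("flink-s3-fs-hadoop.jar"));
            Path dependencyJar = Files.createFile(home.resolve("my-job-dependency.jar"));
            installPlugin(home, "s3-fs-hadoop", pluginJar);
            addJobDependency(home, dependencyJar);
            return home;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path home = demoHome();
        System.out.println(home.resolve("plugins").resolve("s3-fs-hadoop"));
        System.out.println(home.resolve("usrlib"));
    }
}
```

The point of the layout is that neither jar ends up in lib/, so neither becomes part of the system classpath ordering discussed in this thread.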

Best,
Biao Geng


Re: Questions regarding classpath loading order in YarnClusterDescriptor

Posted by Leon Xu <lx...@attentivemobile.com>.
Hi Biao,

I really appreciate your thorough answers. And yes, for now I am using the
workaround of manipulating the directory names.
One follow-up question, if you don't mind:
What is the recommended way of managing plugins with YarnClusterDescriptor?
Currently I place the plugins (e.g. flink-s3-fs-hadoop) under the
system jars setting, which works. But I also see this comment in the
code
<https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L908>
so I am a bit confused.


Thanks
Leon


Re: Questions regarding classpath loading order in YarnClusterDescriptor

Posted by Biao Geng <bi...@gmail.com>.
Hi Leon,

For your question 1: the classpath contains two types of jars, user jars
and Flink system jars (i.e. jars in flink/lib). System jars are sorted
alphabetically. For user jars, there are three options for where they are
added in the final classpath: ORDER, FIRST, and LAST (see the doc
<https://nightlies.apache.org/flink/flink-docs-master/zh/docs/deployment/resource-providers/yarn/#user-jars--classpath>
for more details). To the best of my knowledge, there is currently no way
to pass a custom sort function. One workaround is to manage your jar paths:
you can put the jar that you want loaded first in a directory whose name
sorts earlier alphabetically (e.g. a-flink/user-jar).
For your question 2: flink-dist.jar is always at the end of the system jars.
Depending on how you choose to add user jars, it is not always at the end
of the final generated classpath. flink-dist.jar is special and mandatory
because we need it to launch the Java process that runs ClusterEntrypoint
on the cluster side. The other jars in flink/lib can be compromised on to
some extent.
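
The ordering described above can be sketched as a small, simplified model in plain Java. This is not Flink's implementation: it only encodes what this message states (system jars sorted alphabetically, flink-dist.jar after them, user jars placed according to ORDER, FIRST, or LAST), and it assumes that ORDER means the user jars are sorted together with the system jars. The enum, method, and jar names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ClasspathOrderSketch {

    enum UserJarInclusion { ORDER, FIRST, LAST }

    // Simplified model of the final classpath; not taken from the Flink source.
    static List<String> build(List<String> systemJars, List<String> userJars,
                              String distJar, UserJarInclusion mode) {
        List<String> sorted = new ArrayList<>(systemJars);
        if (mode == UserJarInclusion.ORDER) {
            sorted.addAll(userJars); // assumption: user jars are sorted with the system jars
        }
        Collections.sort(sorted);    // alphabetical (lexicographic) order of the path strings

        List<String> classpath = new ArrayList<>();
        if (mode == UserJarInclusion.FIRST) {
            classpath.addAll(userJars);
        }
        classpath.addAll(sorted);
        classpath.add(distJar);      // the dist jar follows the sorted system jars
        if (mode == UserJarInclusion.LAST) {
            classpath.addAll(userJars);
        }
        return classpath;
    }

    public static void main(String[] args) {
        List<String> system = Arrays.asList("lib/log4j-core.jar", "lib/flink-table.jar");
        List<String> user = Arrays.asList("usrlib/job.jar");
        for (UserJarInclusion mode : UserJarInclusion.values()) {
            System.out.println(mode + " -> " + build(system, user, "flink-dist.jar", mode));
        }
        // Directory-name workaround: a path that sorts earlier moves to the front under ORDER.
        System.out.println(build(system, Arrays.asList("a-flink/user-jar/job.jar"),
                "flink-dist.jar", UserJarInclusion.ORDER));
    }
}
```

In this model, flink-dist.jar is the last entry for ORDER and FIRST but not for LAST, and renaming the user jar's directory to a-flink/user-jar moves it ahead of everything in lib/ under ORDER.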

I have run into a similar problem as well. My previous workaround was
managing the directory names, which is not very elegant. It could be useful
to add the ability to customize the loading order of jars in the classpath,
although it is also important to package the jars carefully to avoid
conflicts.

Best,
Biao Geng

