You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by aljoscha <gi...@git.apache.org> on 2017/10/23 15:07:25 UTC

[GitHub] flink pull request #4891: [FLINK-7669] Always load Flink classes via parent ...

GitHub user aljoscha opened a pull request:

    https://github.com/apache/flink/pull/4891

     [FLINK-7669] Always load Flink classes via parent ClassLoader

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aljoscha/flink jira-7669-classloader-fix-flink-always-parent

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4891.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4891
    
----
commit 6ed8bd2b51676bbbd199fcd452136de99445aba4
Author: Aljoscha Krettek <al...@gmail.com>
Date:   2017-10-23T15:03:21Z

    [FLINK-7669] Always load Flink classes via parent ClassLoader
    
    Before, when setting classloader.resolve-order Flink classes where also
    resolved via the child ClassLoader which could lead to problems if the
    user-jar contains copies of Flink classes. This change introduces a new
    setting classloader.always-parent-first-patterns, which is
    "org.apache.flink" by default. Classes that match this pattern will
    always be resolved by the parent first, even when using the child-first
    ClassLoader.

commit dda0dc6247e53724177d7754ae7add91f0697082
Author: Aljoscha Krettek <al...@gmail.com>
Date:   2017-10-23T15:05:57Z

    [FLINK-7669] Add ClassLoader resolution order to classloader doc

----


---

[GitHub] flink pull request #4891: [FLINK-7669] Always load Flink classes via parent ...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4891#discussion_r146336544
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobManagerServices.java ---
    @@ -116,8 +116,15 @@ public static JobManagerServices fromConfiguration(
     		final String classLoaderResolveOrder =
     			config.getString(CoreOptions.CLASSLOADER_RESOLVE_ORDER);
     
    +		final String alwaysParentFirstLoaderString =
    --- End diff --
    
    I think this could use some exception handling, for example when the parameter is null, or empty, or does not split.


---

[GitHub] flink issue #4891: [FLINK-7669] Always load Flink classes via parent ClassLo...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/4891
  
    Looks good to me, +1
    
    When we start introducing even more config options for class loading, it makes sense to introduce a class to bundle those up (like resolve order, parent first patterns, and future strings and arrays).


---

[GitHub] flink issue #4891: [FLINK-7669] Always load Flink classes via parent ClassLo...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/4891
  
    Good fix, with minor comments.
    
    I think we want to have even more default parent-first classes, because they leak through the public API.
      - `org.slf4j`
      - `javax.annotation`


---

[GitHub] flink pull request #4891: [FLINK-7669] Always load Flink classes via parent ...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4891#discussion_r146500656
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoaders.java ---
    @@ -31,24 +31,23 @@
      */
     public class FlinkUserCodeClassLoaders {
     
    -	public static URLClassLoader parentFirst(URL[] urls) {
    -		return new ParentFirstClassLoader(urls);
    -	}
    -
     	public static URLClassLoader parentFirst(URL[] urls, ClassLoader parent) {
     		return new ParentFirstClassLoader(urls, parent);
     	}
     
    -	public static URLClassLoader childFirst(URL[] urls, ClassLoader parent) {
    -		return new ChildFirstClassLoader(urls, parent);
    +	public static URLClassLoader childFirst(
    +		URL[] urls,
    +		ClassLoader parent,
    +		String[] alwaysParentFirstPatterns) {
    +		return new ChildFirstClassLoader(urls, parent, alwaysParentFirstPatterns);
     	}
     
     	public static URLClassLoader create(
    -		ResolveOrder resolveOrder, URL[] urls, ClassLoader parent) {
    +		ResolveOrder resolveOrder, URL[] urls, ClassLoader parent, String[] alwaysParentFirstPatterns) {
    --- End diff --
    
    I specifically didn't make it a vararg parameter to ensure that all call-sites consciously have to deal with this parameter. Making it a varargs makes it easy to forget that you can/should set this.
    
    What do you think? I can still easily make it a vararg.


---

[GitHub] flink pull request #4891: [FLINK-7669] Always load Flink classes via parent ...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4891#discussion_r146573737
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobManagerServices.java ---
    @@ -116,8 +116,15 @@ public static JobManagerServices fromConfiguration(
     		final String classLoaderResolveOrder =
     			config.getString(CoreOptions.CLASSLOADER_RESOLVE_ORDER);
     
    +		final String alwaysParentFirstLoaderString =
    +			config.getString(CoreOptions.ALWAYS_PARENT_FIRST_LOADER);
    +		final String[] alwaysParentFirstLoaderPatterns = alwaysParentFirstLoaderString.split(",");
    +
     		final BlobLibraryCacheManager libraryCacheManager =
    -			new BlobLibraryCacheManager(blobServer, FlinkUserCodeClassLoaders.ResolveOrder.fromString(classLoaderResolveOrder));
    +			new BlobLibraryCacheManager(
    +				blobServer,
    +				FlinkUserCodeClassLoaders.ResolveOrder.fromString(classLoaderResolveOrder),
    --- End diff --
    
    Okay, that works.


---

[GitHub] flink pull request #4891: [FLINK-7669] Always load Flink classes via parent ...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4891#discussion_r146336888
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobManagerServices.java ---
    @@ -116,8 +116,15 @@ public static JobManagerServices fromConfiguration(
     		final String classLoaderResolveOrder =
     			config.getString(CoreOptions.CLASSLOADER_RESOLVE_ORDER);
     
    +		final String alwaysParentFirstLoaderString =
    +			config.getString(CoreOptions.ALWAYS_PARENT_FIRST_LOADER);
    +		final String[] alwaysParentFirstLoaderPatterns = alwaysParentFirstLoaderString.split(",");
    +
     		final BlobLibraryCacheManager libraryCacheManager =
    -			new BlobLibraryCacheManager(blobServer, FlinkUserCodeClassLoaders.ResolveOrder.fromString(classLoaderResolveOrder));
    +			new BlobLibraryCacheManager(
    +				blobServer,
    +				FlinkUserCodeClassLoaders.ResolveOrder.fromString(classLoaderResolveOrder),
    --- End diff --
    
    Does this fail with a very ugly enum exception when someone misspells the enum literal?


---

[GitHub] flink issue #4891: [FLINK-7669] Always load Flink classes via parent ClassLo...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the issue:

    https://github.com/apache/flink/pull/4891
  
    Merged


---

[GitHub] flink pull request #4891: [FLINK-7669] Always load Flink classes via parent ...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4891#discussion_r146500066
  
    --- Diff: docs/ops/config.md ---
    @@ -77,6 +77,10 @@ without explicit scheme definition, such as `/user/USERNAME/in.txt`, is going to
     - `classloader.resolve-order`: Whether Flink should use a child-first `ClassLoader` when loading
     user-code classes or a parent-first `ClassLoader`. Can be one of `parent-first` or `child-first`. (default: `child-first`)
     
    +- `classloader.always-parent-first-patterns`: A (comma-separated) list of patterns that specifies which classes should always be resolved
    --- End diff --
    
    simple prefix, changing doc 👍 


---

[GitHub] flink pull request #4891: [FLINK-7669] Always load Flink classes via parent ...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4891#discussion_r146336120
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoaders.java ---
    @@ -31,24 +31,23 @@
      */
     public class FlinkUserCodeClassLoaders {
     
    -	public static URLClassLoader parentFirst(URL[] urls) {
    -		return new ParentFirstClassLoader(urls);
    -	}
    -
     	public static URLClassLoader parentFirst(URL[] urls, ClassLoader parent) {
     		return new ParentFirstClassLoader(urls, parent);
     	}
     
    -	public static URLClassLoader childFirst(URL[] urls, ClassLoader parent) {
    -		return new ChildFirstClassLoader(urls, parent);
    +	public static URLClassLoader childFirst(
    +		URL[] urls,
    +		ClassLoader parent,
    +		String[] alwaysParentFirstPatterns) {
    +		return new ChildFirstClassLoader(urls, parent, alwaysParentFirstPatterns);
     	}
     
     	public static URLClassLoader create(
    -		ResolveOrder resolveOrder, URL[] urls, ClassLoader parent) {
    +		ResolveOrder resolveOrder, URL[] urls, ClassLoader parent, String[] alwaysParentFirstPatterns) {
    --- End diff --
    
    Make this a vararg parameter for nice APIs during testing ?


---

[GitHub] flink pull request #4891: [FLINK-7669] Always load Flink classes via parent ...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4891#discussion_r146573602
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoaders.java ---
    @@ -31,24 +31,23 @@
      */
     public class FlinkUserCodeClassLoaders {
     
    -	public static URLClassLoader parentFirst(URL[] urls) {
    -		return new ParentFirstClassLoader(urls);
    -	}
    -
     	public static URLClassLoader parentFirst(URL[] urls, ClassLoader parent) {
     		return new ParentFirstClassLoader(urls, parent);
     	}
     
    -	public static URLClassLoader childFirst(URL[] urls, ClassLoader parent) {
    -		return new ChildFirstClassLoader(urls, parent);
    +	public static URLClassLoader childFirst(
    +		URL[] urls,
    +		ClassLoader parent,
    +		String[] alwaysParentFirstPatterns) {
    +		return new ChildFirstClassLoader(urls, parent, alwaysParentFirstPatterns);
     	}
     
     	public static URLClassLoader create(
    -		ResolveOrder resolveOrder, URL[] urls, ClassLoader parent) {
    +		ResolveOrder resolveOrder, URL[] urls, ClassLoader parent, String[] alwaysParentFirstPatterns) {
    --- End diff --
    
    No, I see your argument. Keep it.


---

[GitHub] flink pull request #4891: [FLINK-7669] Always load Flink classes via parent ...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4891#discussion_r146503853
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobManagerServices.java ---
    @@ -116,8 +116,15 @@ public static JobManagerServices fromConfiguration(
     		final String classLoaderResolveOrder =
     			config.getString(CoreOptions.CLASSLOADER_RESOLVE_ORDER);
     
    +		final String alwaysParentFirstLoaderString =
    +			config.getString(CoreOptions.ALWAYS_PARENT_FIRST_LOADER);
    +		final String[] alwaysParentFirstLoaderPatterns = alwaysParentFirstLoaderString.split(",");
    +
     		final BlobLibraryCacheManager libraryCacheManager =
    -			new BlobLibraryCacheManager(blobServer, FlinkUserCodeClassLoaders.ResolveOrder.fromString(classLoaderResolveOrder));
    +			new BlobLibraryCacheManager(
    +				blobServer,
    +				FlinkUserCodeClassLoaders.ResolveOrder.fromString(classLoaderResolveOrder),
    --- End diff --
    
    This will fail with `IllegalArgumentException("Unknown resolve order: " + resolveOrder)`.


---

[GitHub] flink pull request #4891: [FLINK-7669] Always load Flink classes via parent ...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4891#discussion_r146335716
  
    --- Diff: docs/monitoring/debugging_classloading.md ---
    @@ -57,22 +57,34 @@ YARN classloading differs between single job deployments and sessions:
     
     **Mesos**
     
    -Mesos setups following [this documentation](../ops/deployment/mesos.html) currently behave very much like the a 
    +Mesos setups following [this documentation](../ops/deployment/mesos.html) currently behave very much like the a
     YARN session: The TaskManager and JobManager processes are started with the Flink framework classes in classpath, job
     classes are loaded dynamically when the jobs are submitted.
     
    +## Configuring ClassLoader Resolution Order
    +
    +Flink uses a hierarchy of ClassLoaders for loading classes from the user-code jar(s). The user-code
    +ClassLoader has a reference to the parent ClassLoader, which is the default Java ClassLoader in most
    +cases. By default, Java ClassLoaders will first look for classes in the parent ClassLoader and then in
    +the child ClassLoader for cases where we have a hierarchy of ClassLoaders. This is problematic if you
    +have in your user jar a version of a library that conflicts with a version that comes with Flink. You can
    +change this behaviour by configuring the ClassLoader resolution order via
    +`classloader.resolve-order: child-first` in the Flink config. However, Flink classes will still
    +be resolved through the parent ClassLoader first, altough you can also configure this via
    --- End diff --
    
    altough -> although


---

[GitHub] flink pull request #4891: [FLINK-7669] Always load Flink classes via parent ...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4891#discussion_r146500129
  
    --- Diff: docs/monitoring/debugging_classloading.md ---
    @@ -57,22 +57,34 @@ YARN classloading differs between single job deployments and sessions:
     
     **Mesos**
     
    -Mesos setups following [this documentation](../ops/deployment/mesos.html) currently behave very much like the a 
    +Mesos setups following [this documentation](../ops/deployment/mesos.html) currently behave very much like the a
     YARN session: The TaskManager and JobManager processes are started with the Flink framework classes in classpath, job
     classes are loaded dynamically when the jobs are submitted.
     
    +## Configuring ClassLoader Resolution Order
    +
    +Flink uses a hierarchy of ClassLoaders for loading classes from the user-code jar(s). The user-code
    +ClassLoader has a reference to the parent ClassLoader, which is the default Java ClassLoader in most
    +cases. By default, Java ClassLoaders will first look for classes in the parent ClassLoader and then in
    +the child ClassLoader for cases where we have a hierarchy of ClassLoaders. This is problematic if you
    +have in your user jar a version of a library that conflicts with a version that comes with Flink. You can
    +change this behaviour by configuring the ClassLoader resolution order via
    +`classloader.resolve-order: child-first` in the Flink config. However, Flink classes will still
    +be resolved through the parent ClassLoader first, altough you can also configure this via
    +`classloader.always-parent-first-patterns`, which is `org.apache.flink`, by default.
    --- End diff --
    
    Fixing, I'm very open to suggestions here because I don't like the name much.


---

[GitHub] flink issue #4891: [FLINK-7669] Always load Flink classes via parent ClassLo...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/4891
  
    Concerning loggers: I think for logging frameworks, they should rarely bleed through, but if you instantiate multiple ones, you get problems. It is more serious for the logging backend (where having multiple ones may mean logging does not happen at all or get at least crazy warnings), but preventing duplication in the logging frontend cannot hurt I believe.
    
    Regarding non-primitive config options: Nice idea, but I would suggest to do it without for now, because it does not seem crucial enough for me to slow down a release for that.
    
    How about configuring the default parent first options as:
    `java.;org.apache.flink.;javax.annotation;org.slf4j;org.apache.log4j;org.apache.logging.log4j;ch.qos.logback`


---

[GitHub] flink issue #4891: [FLINK-7669] Always load Flink classes via parent ClassLo...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the issue:

    https://github.com/apache/flink/pull/4891
  
    Thanks for the reviews, @zentol and @StephanEwen?
    
    @StephanEwen Regarding your last two comments, it would be nice if the `ConfigOption` itself did the parsing, i.e. if I could have a `ConfigOption<String, String[]>` or some such that parses a string to a list of strings and deals with all exceptions. Same would be true for the resolve order, where I would like to have `ConfigOption<String, ResolveOrder>` and it internally does the enum parsing. As it is now I have to do the parsing and exception handling in all places where the config is used. What do you think?
    
    Regarding the list of parent-first patterns, `javax.annotation` I can see, but where do loggers bleed through the API?


---

[GitHub] flink issue #4891: [FLINK-7669] Always load Flink classes via parent ClassLo...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the issue:

    https://github.com/apache/flink/pull/4891
  
    @zentol That would be excellent to have!


---

[GitHub] flink pull request #4891: [FLINK-7669] Always load Flink classes via parent ...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4891#discussion_r146338877
  
    --- Diff: docs/monitoring/debugging_classloading.md ---
    @@ -57,22 +57,34 @@ YARN classloading differs between single job deployments and sessions:
     
     **Mesos**
     
    -Mesos setups following [this documentation](../ops/deployment/mesos.html) currently behave very much like the a 
    +Mesos setups following [this documentation](../ops/deployment/mesos.html) currently behave very much like the a
     YARN session: The TaskManager and JobManager processes are started with the Flink framework classes in classpath, job
     classes are loaded dynamically when the jobs are submitted.
     
    +## Configuring ClassLoader Resolution Order
    +
    +Flink uses a hierarchy of ClassLoaders for loading classes from the user-code jar(s). The user-code
    +ClassLoader has a reference to the parent ClassLoader, which is the default Java ClassLoader in most
    +cases. By default, Java ClassLoaders will first look for classes in the parent ClassLoader and then in
    +the child ClassLoader for cases where we have a hierarchy of ClassLoaders. This is problematic if you
    +have in your user jar a version of a library that conflicts with a version that comes with Flink. You can
    +change this behaviour by configuring the ClassLoader resolution order via
    +`classloader.resolve-order: child-first` in the Flink config. However, Flink classes will still
    +be resolved through the parent ClassLoader first, altough you can also configure this via
    +`classloader.always-parent-first-patterns`, which is `org.apache.flink`, by default.
    --- End diff --
    
    change `always-parent-first-patterns` to `parent-first-patterns`?


---

[GitHub] flink pull request #4891: [FLINK-7669] Always load Flink classes via parent ...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4891#discussion_r146499992
  
    --- Diff: docs/monitoring/debugging_classloading.md ---
    @@ -57,22 +57,34 @@ YARN classloading differs between single job deployments and sessions:
     
     **Mesos**
     
    -Mesos setups following [this documentation](../ops/deployment/mesos.html) currently behave very much like the a 
    +Mesos setups following [this documentation](../ops/deployment/mesos.html) currently behave very much like the a
     YARN session: The TaskManager and JobManager processes are started with the Flink framework classes in classpath, job
     classes are loaded dynamically when the jobs are submitted.
     
    +## Configuring ClassLoader Resolution Order
    +
    +Flink uses a hierarchy of ClassLoaders for loading classes from the user-code jar(s). The user-code
    +ClassLoader has a reference to the parent ClassLoader, which is the default Java ClassLoader in most
    +cases. By default, Java ClassLoaders will first look for classes in the parent ClassLoader and then in
    +the child ClassLoader for cases where we have a hierarchy of ClassLoaders. This is problematic if you
    +have in your user jar a version of a library that conflicts with a version that comes with Flink. You can
    +change this behaviour by configuring the ClassLoader resolution order via
    +`classloader.resolve-order: child-first` in the Flink config. However, Flink classes will still
    +be resolved through the parent ClassLoader first, altough you can also configure this via
    --- End diff --
    
    fixing


---

[GitHub] flink issue #4891: [FLINK-7669] Always load Flink classes via parent ClassLo...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/4891
  
    @aljoscha I've suggested non-primitive/string ConfigOptions in https://issues.apache.org/jira/browse/FLINK-6577, along with moving the validation into the ConfigOptions in https://issues.apache.org/jira/browse/FLINK-6576.


---

[GitHub] flink pull request #4891: [FLINK-7669] Always load Flink classes via parent ...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha closed the pull request at:

    https://github.com/apache/flink/pull/4891


---

[GitHub] flink issue #4891: [FLINK-7669] Always load Flink classes via parent ClassLo...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the issue:

    https://github.com/apache/flink/pull/4891
  
    Yes, I wasn't expecting to do that for the release. 😅
    
    The defaults make sense, I'll push a fixup commit.


---

[GitHub] flink pull request #4891: [FLINK-7669] Always load Flink classes via parent ...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4891#discussion_r146504815
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobManagerServices.java ---
    @@ -116,8 +116,15 @@ public static JobManagerServices fromConfiguration(
     		final String classLoaderResolveOrder =
     			config.getString(CoreOptions.CLASSLOADER_RESOLVE_ORDER);
     
    +		final String alwaysParentFirstLoaderString =
    --- End diff --
    
    The parameter cannot be `null`, since there is a default value. If it is empty, won't `split` simply return an empty array?
    
    I made a comment about this below, i.e. where we should handle parsing of config options.


---

[GitHub] flink pull request #4891: [FLINK-7669] Always load Flink classes via parent ...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4891#discussion_r146336395
  
    --- Diff: docs/ops/config.md ---
    @@ -77,6 +77,10 @@ without explicit scheme definition, such as `/user/USERNAME/in.txt`, is going to
     - `classloader.resolve-order`: Whether Flink should use a child-first `ClassLoader` when loading
     user-code classes or a parent-first `ClassLoader`. Can be one of `parent-first` or `child-first`. (default: `child-first`)
     
    +- `classloader.always-parent-first-patterns`: A (comma-separated) list of patterns that specifies which classes should always be resolved
    --- End diff --
    
    are these prefixes; does it support regex?


---