You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Sangjin Lee (JIRA)" <ji...@apache.org> on 2016/12/12 20:14:59 UTC

[jira] [Comment Edited] (HADOOP-13070) classloading isolation improvements for stricter dependencies

    [ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742855#comment-15742855 ] 

Sangjin Lee edited comment on HADOOP-13070 at 12/12/16 8:14 PM:
----------------------------------------------------------------

I’ve been testing the latest patch (HADOOP-13398) and there seems to be one interesting issue with the stricter classpath isolation, and that has to do with the {{ServiceLoader}}.

(1) {{ServiceLoader}}
The {{ServiceLoader}} essentially uses the following pattern to load service classes dynamically:
{code}
Enumeration<URL> defs = classloader.getResources(“META-INF/services/org.foo.ServiceInterface”);
while (defs.hasMoreElements()) {
  URL def = defs.nextElement();
  Iterator<String> names = parse(def);
  for (String name : names) {
    ServiceInterface si = (ServiceInterface)Class.forName(name, classloader);
  }
}
{code}
First off, {{ClassLoader.getResources()}} will return all service files, *regardless of* where the service file is, either in the user classpath or the parent classpath (bit more discussion on {{ClassLoader.getResources()}} below).

Since all service files have been located and the calling class of {{Class.forName()}} is {{ServiceLoader}} which is a system class, all service classes will be successfully loaded, *regardless of* where the service class is, either in the user classpath or the parent classpath.

Technically this would represent an opportunity to circumvent the isolation and load stuff from the parent classpath. That said, we could still regard this as a variation of a “system facility providing a way to load things from both classpaths” case mentioned in the proposal (section 2-1).

I thought about plugging this possibility, but there doesn’t seem to be an unambiguous way to do this.

One approach I considered is to go above the call stack to identify who’s calling {{ServiceLoader.load()}}. Suppose we use the calling class to enforce stricter loading. If a user class is a calling class, it would load service files from both the user classpath and the parent classpath. However, as it iterates over the classes, it will fail to load a non-system parent class. This causes a *hard* failure on the iteration on {{ServiceLoader}}.

On the other hand, we could try to determine somehow a certain service file is a “non-system parent service file” and not return that service file resource with {{ClassLoader.getResources()}} to begin with. However, the notion of a “non-system parent service file” is not well defined, and I don’t think there is a way to define this clearly.

I think the best way forward is to allow {{ServiceLoader}} to load services from both the user and the parent classpath. I’d love to hear your thoughts on this.

(2) {{ClassLoader.getResources()}}
Currently {{ApplicationClassLoader}} does not override this. The javadoc for {{ClassLoader.getResources()}} states:
{noformat}
…
The search order is described in the documentation for getResource(String).
{noformat}
Since we do not override this today, we return the resources from the parent first and then from the child, which is not quite the same as what the javadoc indicates. So it seems to me that at minimum we want to change the order of resources so that it returns the child resources first.

The next question is whether it should return a (non-system) parent resource if a user class calls this method. We could tighten this to filter out non-system parent resource. I am leaning towards making that change.

Thoughts? Feedback? Concerns?
cc [~busbey]



was (Author: sjlee0):
I’ve been testing the latest patch (HADOOP-13998) and there seems to be one interesting issue with the stricter classpath isolation, and that has to do with the {{ServiceLoader}}.

(1) {{ServiceLoader}}
The {{ServiceLoader}} essentially uses the following pattern to load service classes dynamically:
{code}
Enumeration<URL> defs = classloader.getResources(“META-INF/services/org.foo.ServiceInterface”);
while (defs.hasMoreElements()) {
  URL def = defs.nextElement();
  Iterator<String> names = parse(def);
  for (String name : names) {
    ServiceInterface si = (ServiceInterface)Class.forName(name, classloader);
  }
}
{code}
First off, {{ClassLoader.getResources()}} will return all service files, *regardless of* where the service file is, either in the user classpath or the parent classpath (bit more discussion on {{ClassLoader.getResources()}} below).

Since all service files have been located and the calling class of {{Class.forName()}} is {{ServiceLoader}} which is a system class, all service classes will be successfully loaded, *regardless of* where the service class is, either in the user classpath or the parent classpath.

Technically this would represent an opportunity to circumvent the isolation and load stuff from the parent classpath. That said, we could still regard this as a variation of a “system facility providing a way to load things from both classpaths” case mentioned in the proposal (section 2-1).

I thought about plugging this possibility, but there doesn’t seem to be an unambiguous way to do this.

One approach I considered is to go above the call stack to identify who’s calling {{ServiceLoader.load()}}. Suppose we use the calling class to enforce stricter loading. If a user class is a calling class, it would load service files from both the user classpath and the parent classpath. However, as it iterates over the classes, it will fail to load a non-system parent class. This causes a *hard* failure on the iteration on {{ServiceLoader}}.

On the other hand, we could try to determine somehow a certain service file is a “non-system parent service file” and not return that service file resource with {{ClassLoader.getResources()}} to begin with. However, the notion of a “non-system parent service file” is not well defined, and I don’t think there is a way to define this clearly.

I think the best way forward is to allow {{ServiceLoader}} to load services from both the user and the parent classpath. I’d love to hear your thoughts on this.

(2) {{ClassLoader.getResources()}}
Currently {{ApplicationClassLoader}} does not override this. The javadoc for {{ClassLoader.getResources()}} states:
{noformat}
…
The search order is described in the documentation for getResource(String).
{noformat}
Since we do not override this today, we return the resources from the parent first and then from the child, which is not quite the same as what the javadoc indicates. So it seems to me that at minimum we want to change the order of resources so that it returns the child resources first.

The next question is whether it should return a (non-system) parent resource if a user class calls this method. We could tighten this to filter out non-system parent resource. I am leaning towards making that change.

Thoughts? Feedback? Concerns?
cc [~busbey]


> classloading isolation improvements for stricter dependencies
> -------------------------------------------------------------
>
>                 Key: HADOOP-13070
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13070
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>            Priority: Critical
>         Attachments: HADOOP-13070.poc.01.patch, Test.java, TestDriver.java, classloading-improvements-ideas-v.3.pdf, classloading-improvements-ideas.pdf, classloading-improvements-ideas.v.2.pdf, lib.jar
>
>
> Related to HADOOP-11656, we would like to make a number of improvements in terms of classloading isolation so that user-code can run safely without worrying about dependency collisions with the Hadoop dependencies.
> By the same token, it should raised the quality of the user code and its specified classpath so that users get clear signals if they specify incorrect classpaths.
> This will contain a proposal that will include several improvements some of which may not be backward compatible. As such, it should be targeted to the next major revision of Hadoop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org