Posted to user@hadoop.apache.org by John Lilley <jo...@redpointglobal.com> on 2019/02/01 21:26:19 UTC

Multiple classloaders and Hadoop APIs

I realize this is something of a broad question, but I hope that someone has already had to deal with it and can share some experience.  Our application has multiple classloaders in order to avoid dependency conflicts between the various APIs that our software accesses, including but not limited to Hadoop.  Without a multi-classloader strategy (or Java 9 modules, which we cannot yet support) we run into a lot of dependency conflicts between the Hadoop jars and everything else.  Suffice it to say that we have a URLClassLoader for all of the Hadoop jars, and other classloaders for other sets of libraries.
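
Concretely, the setup looks roughly like the sketch below; the lib directory, parent choice, and class name are illustrative rather than our exact code:

import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Sketch only: build an isolated classloader over the Hadoop jars so their
// dependencies do not leak into (or collide with) the rest of the application.
public class HadoopLoaderFactory {
    public static URLClassLoader newHadoopLoader(Path hadoopLibDir) throws Exception {
        try (Stream<Path> entries = Files.list(hadoopLibDir)) {
            URL[] jars = entries
                .filter(p -> p.toString().endsWith(".jar"))
                .map(HadoopLoaderFactory::toUrl)
                .toArray(URL[]::new);
            // Parent is the system classloader; other library stacks live in
            // sibling URLClassLoaders and never see these jars.
            return new URLClassLoader(jars, ClassLoader.getSystemClassLoader());
        }
    }

    private static URL toUrl(Path p) {
        try {
            return p.toUri().toURL();
        } catch (java.net.MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }
}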

The problem is this: the system classloader (the one the JVM starts with) does not have the Hadoop jars.  This means that the thread-context classloader, which is the system classloader by default, does not have them either.  Unfortunately, Hadoop APIs of all sorts seem to use the thread-context classloader implicitly, which leads to all manner of ClassNotFoundExceptions, ServiceLoaders returning empty lists, factories not finding implementations, and so on.  This is driving us slowly mad, because new "not found" issues keep cropping up in the field as our customers turn on new security options and other configuration variants, which exercise code paths that fail to find classes or interface implementations.
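
To illustrate the failure mode: the single-argument ServiceLoader.load(...) uses the thread-context classloader, so the same lookup can find providers in one thread and come back silently empty in another.  A contrived sketch (the service interface is made up, purely to show the two overloads):

import java.util.ServiceLoader;

// Contrived sketch of the failure mode: the one-argument load() consults the
// thread-context classloader, so a thread whose TCCL cannot see the provider
// jars simply gets an empty iterator -- no exception, nothing found.
public class TcclLookupDemo {
    public interface SomeSpi {}  // stand-in for whatever SPI a library looks up

    public static void lookup(ClassLoader hadoopLoader) {
        ServiceLoader<SomeSpi> viaTccl = ServiceLoader.load(SomeSpi.class);
        ServiceLoader<SomeSpi> viaExplicit = ServiceLoader.load(SomeSpi.class, hadoopLoader);

        System.out.println("providers via TCCL:            " + viaTccl.iterator().hasNext());
        System.out.println("providers via explicit loader: " + viaExplicit.iterator().hasNext());
    }
}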

You may suggest the obvious: always set the thread-context classloader to the Hadoop classloader.  Unfortunately that doesn't seem to be a solution.  First, threads get spawned that have the default system classloader as their thread-context classloader, and I don't have control over all of those threads.  Second, it is error-prone and probably too expensive to save and restore the thread-context classloader at every possible Hadoop entry point (the usual wrapping idiom is sketched below).
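
For reference, the wrapping idiom I mean is the usual save/restore below.  It is correct where we can apply it, but it has to be repeated at every entry point and it does not cover threads created outside the wrapped call:

import java.util.concurrent.Callable;

// The usual per-call save/restore idiom.  Correct, but it must wrap every
// Hadoop entry point, and threads created elsewhere (e.g. worker pools set up
// before this runs) still carry whatever context classloader they inherited.
public final class HadoopCalls {
    public static <T> T callWithHadoopTccl(ClassLoader hadoopLoader, Callable<T> body) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader saved = current.getContextClassLoader();
        current.setContextClassLoader(hadoopLoader);
        try {
            return body.call();
        } finally {
            current.setContextClassLoader(saved);
        }
    }
}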

The only answer I've found is to learn by trial and error which classes get loaded through the thread-context classloader (or through ServiceLoader), and to force them to pre-load at a point where I've set the thread-context classloader appropriately, roughly as sketched below.  Is there a better way?  If not, is there a list of such classes?
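
Concretely, the workaround looks something like the sketch below; the list of class names is just whatever has turned up in the field so far, not anything authoritative:

import java.util.List;

// Sketch of the trial-and-error workaround: classes that turned out to be
// resolved through the thread-context classloader are force-initialized up
// front, while the TCCL temporarily points at the Hadoop loader.
public final class HadoopPreloader {
    public static void preload(ClassLoader hadoopLoader, List<String> discoveredClassNames)
            throws ClassNotFoundException {
        Thread current = Thread.currentThread();
        ClassLoader saved = current.getContextClassLoader();
        current.setContextClassLoader(hadoopLoader);
        try {
            for (String name : discoveredClassNames) {
                // initialize = true so static initializers (and any ServiceLoader
                // lookups they trigger) run while the right TCCL is in place.
                Class.forName(name, true, hadoopLoader);
            }
        } finally {
            current.setContextClassLoader(saved);
        }
    }
}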

Thanks,
John Lilley


RE: Multiple classloaders and Hadoop APIs

Posted by John Lilley <jo...@redpointglobal.com>.
Sean,
Thanks for the reply.  We are using a lot of interfaces from Hadoop:
- Security via UserGroupInformation 
- HDFS
- YARN
- HBase
- Hive (hiveserver2 + metastore)
- Avro
- Parquet

I'm not familiar with hadoop-client; does it cover all these interfaces?  Even if it does... I'm not sure it would help.  The issue is not with my direct dependencies per se, but rather with various side effects that eventually call into ServiceLoader or other factory-like mechanisms that use the thread-context classloader.  I find it rather ... frustrating to find all the places where this happens by trial and error, and then to figure out how to get the thread-context classloader set at the appropriate place and time.
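
To give a flavor of the wrapping at just one of those entry points (UserGroupInformation login, say), it ends up looking roughly like this; the principal and keytab path are of course made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Illustrative only: one entry point, wrapped by hand.  Multiply this by every
// place that touches HDFS, YARN, HBase, Hive, Avro, or Parquet.
public final class SecureLoginExample {
    public static void login(ClassLoader hadoopLoader) throws Exception {
        Thread t = Thread.currentThread();
        ClassLoader saved = t.getContextClassLoader();
        t.setContextClassLoader(hadoopLoader);
        try {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);
            UserGroupInformation.loginUserFromKeytab(
                "svc_example@EXAMPLE.COM",
                "/etc/security/keytabs/svc_example.keytab");
        } finally {
            t.setContextClassLoader(saved);
        }
    }
}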

John Lilley

-----Original Message-----
From: Sean Busbey <bu...@cloudera.com.INVALID> 
Sent: Tuesday, February 5, 2019 10:42 AM
To: John Lilley <jo...@redpointglobal.com>
Cc: user@hadoop.apache.org
Subject: Re: Multiple classloaders and Hadoop APIs

What version of Hadoop are y'all using? Which parts of Hadoop are you using?

Can you try relying on the hadoop-client stuff from Hadoop 3? It won't be any better about use of the thread-context classloader (unfortunately), but it does do a fair job of cutting down on the number of third-party dependencies present out of the box.




--
busbey


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
For additional commands, e-mail: user-help@hadoop.apache.org

Re: Multiple classloaders and Hadoop APIs

Posted by Sean Busbey <bu...@cloudera.com.INVALID>.
What version of Hadoop are y'all using? Which parts of Hadoop are you using?

Can you try relying on the hadoop-client stuff from Hadoop 3? It won't be any better about use of the thread-context classloader (unfortunately), but it does do a fair job of cutting down on the number of third-party dependencies present out of the box.




-- 
busbey

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
For additional commands, e-mail: user-help@hadoop.apache.org