You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Andrew Pilloud <ap...@google.com> on 2018/06/04 20:57:26 UTC

Beam breaks when it isn't loaded via the Thread Context Class Loader

I'm having class loading issues that go away when I revert the changes in
our use of Class.forName added in https://github.com/apache/beam/pull/4674.
The problem I'm having is that the typical JDBC GUI
(SqlWorkbench/J, SQuirreL SQL) creates an isolated class loader to load our
library. Things work if we call Class.forName with the default class loader
[getClass().getClassLoader() or no argument] but not if we use the thread
context class loader [Thread.currentThread().getContextClassLoader() or
ReflectHelpers.findClassLoader()]. Why is using the default class loader
not the right thing to do? How can I fix this problem?

See this integration test for an example:
https://github.com/apilloud/beam/blob/directrunner/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/JdbcIT.java#L44
https://scans.gradle.com/s/iquqinhns2ymi/tests/slmg6ytuuqlus-akh5xpgshj32k

Andrew

Re: Beam breaks when it isn't loaded via the Thread Context Class Loader

Posted by Romain Manni-Bucau <rm...@gmail.com>.
if you have a javaagent you can otherwise you can't but beam can proxy all
instances it sees which would be enough while the transforms* don't create
their own classloaders without reusing the TCCL. On the deserialization
side it is easy since it is beam land and it "just "needs to find back or
create the right classloader and when the last instance using this
classloader disappear it destroys it.

Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> |  Blog
<https://rmannibucau.metawerx.net/> | Old Blog
<http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book
<https://www.packtpub.com/application-development/java-ee-8-high-performance>


Le mer. 13 juin 2018 à 20:32, Lukasz Cwik <lc...@google.com> a écrit :

> I'm assuming that you have control over the application environment. Would
> it be possible to replace the ObjectInputStream that the JVM provides with
> your own version that uses the thread context class loader and manage the
> classloader per thread depending on what "application" owns that thread?
> (You would need to add a bunch of logic to correctly associate each
> created thread/thread factory with the correct application but most
> application containers already need to do this).
>
> On Wed, Jun 13, 2018 at 11:12 AM Romain Manni-Bucau <rm...@gmail.com>
> wrote:
>
>> (answered inline)
>>
>>
>> Le mer. 13 juin 2018 à 18:42, Lukasz Cwik <lc...@google.com> a écrit :
>>
>>> Thanks for the example Romain.
>>>
>>> I took a look through it and was wondering whether it is only the root
>>> objects in the deserialization tree that need to implement
>>> SerializableService?
>>> Do lots of things need to implement SerializableService typically?
>>>
>>
>> yes, on the one entering the (de)serialization process must handle that
>>
>>
>>> What do you do with types that you don't control (for example do you
>>> create wrapper types)?
>>>
>>
>> Like beam classes? ;)
>>
>> You can instrument their bytecode like in
>> https://github.com/Talend/component-runtime/blob/master/component-runtime-beam/src/main/java/org/talend/sdk/component/runtime/beam/transformer/BeamIOTransformer.java#L208
>> (sorry i dont use bytebuddy but directly asm). This is quite easy to do as
>> soon as you have a classloader for these classes or - if you reuse the jvm
>> classloader - a javaagent or equivalent.
>>
>> If you don't want/can't change the bytecode then you manipulate a proxy
>> instead of the real instance (a wrapper):
>> https://github.com/Talend/component-runtime/blob/master/component-runtime-manager/src/main/java/org/talend/sdk/component/runtime/manager/asm/ProxyGenerator.java#L118
>> .
>>
>>
>>>
>>> On Wed, Jun 6, 2018 at 9:56 PM Romain Manni-Bucau <rm...@gmail.com>
>>> wrote:
>>>
>>>> Note sure the example is atomic enough but in
>>>> https://github.com/Talend/component-runtime/blob/master/component-runtime-manager/src/main/java/org/talend/sdk/component/runtime/manager/finder/StandaloneContainerFinder.java#L40
>>>> the "instance()" is a singleton used by all the runtime of the framework.
>>>>
>>>> Deserialization happens in
>>>> https://github.com/Talend/component-runtime/blob/master/component-runtime-impl/src/main/java/org/talend/sdk/component/runtime/serialization/SerializableService.java#L26
>>>> and serialization is about creating this object in a write replace. Then
>>>> the runtime is switching its classloader (runner for beam) as in
>>>> https://github.com/Talend/component-runtime/blob/master/component-runtime-impl/src/main/java/org/talend/sdk/component/runtime/base/LifecycleImpl.java#L60
>>>> asap and resets it once done to not break its environment for reused jvms
>>>> case.
>>>>
>>>> If we take the case of an IO, the io would lazily creates its defined
>>>> classloader from its spec and use some reference counting logic to destroy
>>>> it when needed in its teardown for instance. The io then does the
>>>> classloader switch in its callbacks (setup/teardown/process/bundle hooks
>>>> etc).
>>>>
>>>>
>>>> Le mer. 6 juin 2018 23:33, Lukasz Cwik <lc...@google.com> a écrit :
>>>>
>>>>> Romain, can you point to an example of a global singleton registry
>>>>> that does this right for class loading (it may allow people to work towards
>>>>> such an effort)?
>>>>>
>>>>> On Tue, Jun 5, 2018 at 10:06 PM Romain Manni-Bucau <
>>>>> rmannibucau@gmail.com> wrote:
>>>>>
>>>>>> It is actually very localised in runner code where beam should reset
>>>>>> the classloader when the deserialization happens and then the runner owns
>>>>>> the classloader all the way in evaluators.
>>>>>>
>>>>>> If IO change the classloader they must indeed handle it too and patch
>>>>>> the deserialization too.
>>>>>>
>>>>>> Here again (we mentionned it multiple times in other threads) beam
>>>>>> misses a global singleton registry where you can register a "service" to
>>>>>> look it up based of a serialization configuration and a lifecycle allowing
>>>>>> to close the classloader in all instances without hacks.
>>>>>>
>>>>>>
>>>>>> Le mar. 5 juin 2018 23:50, Kenneth Knowles <kl...@google.com> a écrit :
>>>>>>
>>>>>>> Perhaps we can also adopt a practice of making our own APIs
>>>>>>> explicitly pass a Classloader when appropriate so we only have to set this
>>>>>>> when we are entering code that does not have good hygiene. It might
>>>>>>> actually be nice to have a lightweight static analysis to forbid bad
>>>>>>> methods in our code.
>>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>> On Mon, Jun 4, 2018 at 3:43 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>>>>
>>>>>>>> I totally agree, but there are so many Java APIs (including ours)
>>>>>>>> that messed this up so everyone lives with the same hack.
>>>>>>>>
>>>>>>>> On Mon, Jun 4, 2018 at 3:41 PM Andrew Pilloud <ap...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> It seems like a terribly fragile way to pass arguments but my
>>>>>>>>> tests pass when I wrap the JDBC path into Beam pipeline execution with that
>>>>>>>>> pattern.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> Andrew
>>>>>>>>>
>>>>>>>>> On Mon, Jun 4, 2018 at 3:20 PM Lukasz Cwik <lc...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> It is a common mistake for APIs to not include a way to specify
>>>>>>>>>> which class loader to use when doing something like deserializing an
>>>>>>>>>> instance of a class via the ObjectInputStream. This common issue also
>>>>>>>>>> affects Apache Beam (SerializableCoder, PipelineOptionsFactory, ...) and
>>>>>>>>>> the way that typical Java APIs have gotten around this is to use to the
>>>>>>>>>> thread context class loader (TCCL) as the way to plumb this additional
>>>>>>>>>> attribute through. So Apache Beam is meant to in all places honor the TCCL
>>>>>>>>>> if it has been set as most Java libraries (not all) do the same hack.
>>>>>>>>>>
>>>>>>>>>> In most environments the TCCL is not set and we are working with
>>>>>>>>>> a single class loader. It turns out that in more complicated environments
>>>>>>>>>> (like when loading a JDBC driver, or JNDI, or an application server, ...)
>>>>>>>>>> this usually doesn't work without each caller knowing what class loading
>>>>>>>>>> context they should be in. A common work around for most scenarios is to
>>>>>>>>>> always set the TCCL to the current classes class loader like so before
>>>>>>>>>> invoking any APIs that do class loading so you don't propagate the TCCL of
>>>>>>>>>> the caller along since they may have set it for some other reason:
>>>>>>>>>>
>>>>>>>>>> ClassLoader originalClassLoader = Thread.currentThread().getContextClassLoader();try {
>>>>>>>>>>     Thread.currentThread().setContextClassLoader(getClass().getClassLoader());
>>>>>>>>>>     // call some API that uses reflection without taking ClassLoader param} finally {
>>>>>>>>>>     Thread.currentThread().setContextClassLoader(originalClassLoader);}
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 4, 2018 at 1:57 PM Andrew Pilloud <
>>>>>>>>>> apilloud@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm having class loading issues that go away when I revert the
>>>>>>>>>>> changes in our use of Class.forName added in
>>>>>>>>>>> https://github.com/apache/beam/pull/4674. The problem I'm
>>>>>>>>>>> having is that the typical JDBC GUI (SqlWorkbench/J, SQuirreL SQL) creates
>>>>>>>>>>> an isolated class loader to load our library. Things work if we call
>>>>>>>>>>> Class.forName with the default class loader [getClass().getClassLoader() or
>>>>>>>>>>> no argument] but not if we use the thread context class loader
>>>>>>>>>>> [Thread.currentThread().getContextClassLoader() or
>>>>>>>>>>> ReflectHelpers.findClassLoader()]. Why is using the default class loader
>>>>>>>>>>> not the right thing to do? How can I fix this problem?
>>>>>>>>>>>
>>>>>>>>>>> See this integration test for an example:
>>>>>>>>>>> https://github.com/apilloud/beam/blob/directrunner/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/JdbcIT.java#L44
>>>>>>>>>>>
>>>>>>>>>>> https://scans.gradle.com/s/iquqinhns2ymi/tests/slmg6ytuuqlus-akh5xpgshj32k
>>>>>>>>>>>
>>>>>>>>>>> Andrew
>>>>>>>>>>>
>>>>>>>>>>

Re: Beam breaks when it isn't loaded via the Thread Context Class Loader

Posted by Lukasz Cwik <lc...@google.com>.
I'm assuming that you have control over the application environment. Would
it be possible to replace the ObjectInputStream that the JVM provides with
your own version that uses the thread context class loader and manage the
classloader per thread depending on what "application" owns that thread?
(You would need to add a bunch of logic to correctly associate each created
thread/thread factory with the correct application but most application
containers already need to do this).

On Wed, Jun 13, 2018 at 11:12 AM Romain Manni-Bucau <rm...@gmail.com>
wrote:

> (answered inline)
>
>
> Le mer. 13 juin 2018 à 18:42, Lukasz Cwik <lc...@google.com> a écrit :
>
>> Thanks for the example Romain.
>>
>> I took a look through it and was wondering whether it is only the root
>> objects in the deserialization tree that need to implement
>> SerializableService?
>> Do lots of things need to implement SerializableService typically?
>>
>
> yes, on the one entering the (de)serialization process must handle that
>
>
>> What do you do with types that you don't control (for example do you
>> create wrapper types)?
>>
>
> Like beam classes? ;)
>
> You can instrument their bytecode like in
> https://github.com/Talend/component-runtime/blob/master/component-runtime-beam/src/main/java/org/talend/sdk/component/runtime/beam/transformer/BeamIOTransformer.java#L208
> (sorry i dont use bytebuddy but directly asm). This is quite easy to do as
> soon as you have a classloader for these classes or - if you reuse the jvm
> classloader - a javaagent or equivalent.
>
> If you don't want/can't change the bytecode then you manipulate a proxy
> instead of the real instance (a wrapper):
> https://github.com/Talend/component-runtime/blob/master/component-runtime-manager/src/main/java/org/talend/sdk/component/runtime/manager/asm/ProxyGenerator.java#L118
> .
>
>
>>
>> On Wed, Jun 6, 2018 at 9:56 PM Romain Manni-Bucau <rm...@gmail.com>
>> wrote:
>>
>>> Note sure the example is atomic enough but in
>>> https://github.com/Talend/component-runtime/blob/master/component-runtime-manager/src/main/java/org/talend/sdk/component/runtime/manager/finder/StandaloneContainerFinder.java#L40
>>> the "instance()" is a singleton used by all the runtime of the framework.
>>>
>>> Deserialization happens in
>>> https://github.com/Talend/component-runtime/blob/master/component-runtime-impl/src/main/java/org/talend/sdk/component/runtime/serialization/SerializableService.java#L26
>>> and serialization is about creating this object in a write replace. Then
>>> the runtime is switching its classloader (runner for beam) as in
>>> https://github.com/Talend/component-runtime/blob/master/component-runtime-impl/src/main/java/org/talend/sdk/component/runtime/base/LifecycleImpl.java#L60
>>> asap and resets it once done to not break its environment for reused jvms
>>> case.
>>>
>>> If we take the case of an IO, the io would lazily creates its defined
>>> classloader from its spec and use some reference counting logic to destroy
>>> it when needed in its teardown for instance. The io then does the
>>> classloader switch in its callbacks (setup/teardown/process/bundle hooks
>>> etc).
>>>
>>>
>>> Le mer. 6 juin 2018 23:33, Lukasz Cwik <lc...@google.com> a écrit :
>>>
>>>> Romain, can you point to an example of a global singleton registry that
>>>> does this right for class loading (it may allow people to work towards such
>>>> an effort)?
>>>>
>>>> On Tue, Jun 5, 2018 at 10:06 PM Romain Manni-Bucau <
>>>> rmannibucau@gmail.com> wrote:
>>>>
>>>>> It is actually very localised in runner code where beam should reset
>>>>> the classloader when the deserialization happens and then the runner owns
>>>>> the classloader all the way in evaluators.
>>>>>
>>>>> If IO change the classloader they must indeed handle it too and patch
>>>>> the deserialization too.
>>>>>
>>>>> Here again (we mentionned it multiple times in other threads) beam
>>>>> misses a global singleton registry where you can register a "service" to
>>>>> look it up based of a serialization configuration and a lifecycle allowing
>>>>> to close the classloader in all instances without hacks.
>>>>>
>>>>>
>>>>> Le mar. 5 juin 2018 23:50, Kenneth Knowles <kl...@google.com> a écrit :
>>>>>
>>>>>> Perhaps we can also adopt a practice of making our own APIs
>>>>>> explicitly pass a Classloader when appropriate so we only have to set this
>>>>>> when we are entering code that does not have good hygiene. It might
>>>>>> actually be nice to have a lightweight static analysis to forbid bad
>>>>>> methods in our code.
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>> On Mon, Jun 4, 2018 at 3:43 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>>>
>>>>>>> I totally agree, but there are so many Java APIs (including ours)
>>>>>>> that messed this up so everyone lives with the same hack.
>>>>>>>
>>>>>>> On Mon, Jun 4, 2018 at 3:41 PM Andrew Pilloud <ap...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> It seems like a terribly fragile way to pass arguments but my
>>>>>>>> tests pass when I wrap the JDBC path into Beam pipeline execution with that
>>>>>>>> pattern.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Andrew
>>>>>>>>
>>>>>>>> On Mon, Jun 4, 2018 at 3:20 PM Lukasz Cwik <lc...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> It is a common mistake for APIs to not include a way to specify
>>>>>>>>> which class loader to use when doing something like deserializing an
>>>>>>>>> instance of a class via the ObjectInputStream. This common issue also
>>>>>>>>> affects Apache Beam (SerializableCoder, PipelineOptionsFactory, ...) and
>>>>>>>>> the way that typical Java APIs have gotten around this is to use to the
>>>>>>>>> thread context class loader (TCCL) as the way to plumb this additional
>>>>>>>>> attribute through. So Apache Beam is meant to in all places honor the TCCL
>>>>>>>>> if it has been set as most Java libraries (not all) do the same hack.
>>>>>>>>>
>>>>>>>>> In most environments the TCCL is not set and we are working with a
>>>>>>>>> single class loader. It turns out that in more complicated environments
>>>>>>>>> (like when loading a JDBC driver, or JNDI, or an application server, ...)
>>>>>>>>> this usually doesn't work without each caller knowing what class loading
>>>>>>>>> context they should be in. A common work around for most scenarios is to
>>>>>>>>> always set the TCCL to the current classes class loader like so before
>>>>>>>>> invoking any APIs that do class loading so you don't propagate the TCCL of
>>>>>>>>> the caller along since they may have set it for some other reason:
>>>>>>>>>
>>>>>>>>> ClassLoader originalClassLoader = Thread.currentThread().getContextClassLoader();try {
>>>>>>>>>     Thread.currentThread().setContextClassLoader(getClass().getClassLoader());
>>>>>>>>>     // call some API that uses reflection without taking ClassLoader param} finally {
>>>>>>>>>     Thread.currentThread().setContextClassLoader(originalClassLoader);}
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jun 4, 2018 at 1:57 PM Andrew Pilloud <ap...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I'm having class loading issues that go away when I revert the
>>>>>>>>>> changes in our use of Class.forName added in
>>>>>>>>>> https://github.com/apache/beam/pull/4674. The problem I'm having
>>>>>>>>>> is that the typical JDBC GUI (SqlWorkbench/J, SQuirreL SQL) creates an
>>>>>>>>>> isolated class loader to load our library. Things work if we call
>>>>>>>>>> Class.forName with the default class loader [getClass().getClassLoader() or
>>>>>>>>>> no argument] but not if we use the thread context class loader
>>>>>>>>>> [Thread.currentThread().getContextClassLoader() or
>>>>>>>>>> ReflectHelpers.findClassLoader()]. Why is using the default class loader
>>>>>>>>>> not the right thing to do? How can I fix this problem?
>>>>>>>>>>
>>>>>>>>>> See this integration test for an example:
>>>>>>>>>> https://github.com/apilloud/beam/blob/directrunner/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/JdbcIT.java#L44
>>>>>>>>>>
>>>>>>>>>> https://scans.gradle.com/s/iquqinhns2ymi/tests/slmg6ytuuqlus-akh5xpgshj32k
>>>>>>>>>>
>>>>>>>>>> Andrew
>>>>>>>>>>
>>>>>>>>>

Re: Beam breaks when it isn't loaded via the Thread Context Class Loader

Posted by Romain Manni-Bucau <rm...@gmail.com>.
(answered inline)


Le mer. 13 juin 2018 à 18:42, Lukasz Cwik <lc...@google.com> a écrit :

> Thanks for the example Romain.
>
> I took a look through it and was wondering whether it is only the root
> objects in the deserialization tree that need to implement
> SerializableService?
> Do lots of things need to implement SerializableService typically?
>

yes, on the one entering the (de)serialization process must handle that


> What do you do with types that you don't control (for example do you
> create wrapper types)?
>

Like beam classes? ;)

You can instrument their bytecode like in
https://github.com/Talend/component-runtime/blob/master/component-runtime-beam/src/main/java/org/talend/sdk/component/runtime/beam/transformer/BeamIOTransformer.java#L208
(sorry i dont use bytebuddy but directly asm). This is quite easy to do as
soon as you have a classloader for these classes or - if you reuse the jvm
classloader - a javaagent or equivalent.

If you don't want/can't change the bytecode then you manipulate a proxy
instead of the real instance (a wrapper):
https://github.com/Talend/component-runtime/blob/master/component-runtime-manager/src/main/java/org/talend/sdk/component/runtime/manager/asm/ProxyGenerator.java#L118
.


>
> On Wed, Jun 6, 2018 at 9:56 PM Romain Manni-Bucau <rm...@gmail.com>
> wrote:
>
>> Note sure the example is atomic enough but in
>> https://github.com/Talend/component-runtime/blob/master/component-runtime-manager/src/main/java/org/talend/sdk/component/runtime/manager/finder/StandaloneContainerFinder.java#L40
>> the "instance()" is a singleton used by all the runtime of the framework.
>>
>> Deserialization happens in
>> https://github.com/Talend/component-runtime/blob/master/component-runtime-impl/src/main/java/org/talend/sdk/component/runtime/serialization/SerializableService.java#L26
>> and serialization is about creating this object in a write replace. Then
>> the runtime is switching its classloader (runner for beam) as in
>> https://github.com/Talend/component-runtime/blob/master/component-runtime-impl/src/main/java/org/talend/sdk/component/runtime/base/LifecycleImpl.java#L60
>> asap and resets it once done to not break its environment for reused jvms
>> case.
>>
>> If we take the case of an IO, the io would lazily creates its defined
>> classloader from its spec and use some reference counting logic to destroy
>> it when needed in its teardown for instance. The io then does the
>> classloader switch in its callbacks (setup/teardown/process/bundle hooks
>> etc).
>>
>>
>> Le mer. 6 juin 2018 23:33, Lukasz Cwik <lc...@google.com> a écrit :
>>
>>> Romain, can you point to an example of a global singleton registry that
>>> does this right for class loading (it may allow people to work towards such
>>> an effort)?
>>>
>>> On Tue, Jun 5, 2018 at 10:06 PM Romain Manni-Bucau <
>>> rmannibucau@gmail.com> wrote:
>>>
>>>> It is actually very localised in runner code where beam should reset
>>>> the classloader when the deserialization happens and then the runner owns
>>>> the classloader all the way in evaluators.
>>>>
>>>> If IO change the classloader they must indeed handle it too and patch
>>>> the deserialization too.
>>>>
>>>> Here again (we mentionned it multiple times in other threads) beam
>>>> misses a global singleton registry where you can register a "service" to
>>>> look it up based of a serialization configuration and a lifecycle allowing
>>>> to close the classloader in all instances without hacks.
>>>>
>>>>
>>>> Le mar. 5 juin 2018 23:50, Kenneth Knowles <kl...@google.com> a écrit :
>>>>
>>>>> Perhaps we can also adopt a practice of making our own APIs explicitly
>>>>> pass a Classloader when appropriate so we only have to set this when we are
>>>>> entering code that does not have good hygiene. It might actually be nice to
>>>>> have a lightweight static analysis to forbid bad methods in our code.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Mon, Jun 4, 2018 at 3:43 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>>
>>>>>> I totally agree, but there are so many Java APIs (including ours)
>>>>>> that messed this up so everyone lives with the same hack.
>>>>>>
>>>>>> On Mon, Jun 4, 2018 at 3:41 PM Andrew Pilloud <ap...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> It seems like a terribly fragile way to pass arguments but my tests
>>>>>>> pass when I wrap the JDBC path into Beam pipeline execution with that
>>>>>>> pattern.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Andrew
>>>>>>>
>>>>>>> On Mon, Jun 4, 2018 at 3:20 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>>>>
>>>>>>>> It is a common mistake for APIs to not include a way to specify
>>>>>>>> which class loader to use when doing something like deserializing an
>>>>>>>> instance of a class via the ObjectInputStream. This common issue also
>>>>>>>> affects Apache Beam (SerializableCoder, PipelineOptionsFactory, ...) and
>>>>>>>> the way that typical Java APIs have gotten around this is to use to the
>>>>>>>> thread context class loader (TCCL) as the way to plumb this additional
>>>>>>>> attribute through. So Apache Beam is meant to in all places honor the TCCL
>>>>>>>> if it has been set as most Java libraries (not all) do the same hack.
>>>>>>>>
>>>>>>>> In most environments the TCCL is not set and we are working with a
>>>>>>>> single class loader. It turns out that in more complicated environments
>>>>>>>> (like when loading a JDBC driver, or JNDI, or an application server, ...)
>>>>>>>> this usually doesn't work without each caller knowing what class loading
>>>>>>>> context they should be in. A common work around for most scenarios is to
>>>>>>>> always set the TCCL to the current classes class loader like so before
>>>>>>>> invoking any APIs that do class loading so you don't propagate the TCCL of
>>>>>>>> the caller along since they may have set it for some other reason:
>>>>>>>>
>>>>>>>> ClassLoader originalClassLoader = Thread.currentThread().getContextClassLoader();try {
>>>>>>>>     Thread.currentThread().setContextClassLoader(getClass().getClassLoader());
>>>>>>>>     // call some API that uses reflection without taking ClassLoader param} finally {
>>>>>>>>     Thread.currentThread().setContextClassLoader(originalClassLoader);}
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jun 4, 2018 at 1:57 PM Andrew Pilloud <ap...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I'm having class loading issues that go away when I revert the
>>>>>>>>> changes in our use of Class.forName added in
>>>>>>>>> https://github.com/apache/beam/pull/4674. The problem I'm having
>>>>>>>>> is that the typical JDBC GUI (SqlWorkbench/J, SQuirreL SQL) creates an
>>>>>>>>> isolated class loader to load our library. Things work if we call
>>>>>>>>> Class.forName with the default class loader [getClass().getClassLoader() or
>>>>>>>>> no argument] but not if we use the thread context class loader
>>>>>>>>> [Thread.currentThread().getContextClassLoader() or
>>>>>>>>> ReflectHelpers.findClassLoader()]. Why is using the default class loader
>>>>>>>>> not the right thing to do? How can I fix this problem?
>>>>>>>>>
>>>>>>>>> See this integration test for an example:
>>>>>>>>> https://github.com/apilloud/beam/blob/directrunner/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/JdbcIT.java#L44
>>>>>>>>>
>>>>>>>>> https://scans.gradle.com/s/iquqinhns2ymi/tests/slmg6ytuuqlus-akh5xpgshj32k
>>>>>>>>>
>>>>>>>>> Andrew
>>>>>>>>>
>>>>>>>>

Re: Beam breaks when it isn't loaded via the Thread Context Class Loader

Posted by Lukasz Cwik <lc...@google.com>.
Thanks for the example Romain.

I took a look through it and was wondering whether it is only the root
objects in the deserialization tree that need to implement
SerializableService?
Do lots of things need to implement SerializableService typically?
What do you do with types that you don't control (for example do you create
wrapper types)?

On Wed, Jun 6, 2018 at 9:56 PM Romain Manni-Bucau <rm...@gmail.com>
wrote:

> Note sure the example is atomic enough but in
> https://github.com/Talend/component-runtime/blob/master/component-runtime-manager/src/main/java/org/talend/sdk/component/runtime/manager/finder/StandaloneContainerFinder.java#L40
> the "instance()" is a singleton used by all the runtime of the framework.
>
> Deserialization happens in
> https://github.com/Talend/component-runtime/blob/master/component-runtime-impl/src/main/java/org/talend/sdk/component/runtime/serialization/SerializableService.java#L26
> and serialization is about creating this object in a write replace. Then
> the runtime is switching its classloader (runner for beam) as in
> https://github.com/Talend/component-runtime/blob/master/component-runtime-impl/src/main/java/org/talend/sdk/component/runtime/base/LifecycleImpl.java#L60
> asap and resets it once done to not break its environment for reused jvms
> case.
>
> If we take the case of an IO, the io would lazily creates its defined
> classloader from its spec and use some reference counting logic to destroy
> it when needed in its teardown for instance. The io then does the
> classloader switch in its callbacks (setup/teardown/process/bundle hooks
> etc).
>
>
> Le mer. 6 juin 2018 23:33, Lukasz Cwik <lc...@google.com> a écrit :
>
>> Romain, can you point to an example of a global singleton registry that
>> does this right for class loading (it may allow people to work towards such
>> an effort)?
>>
>> On Tue, Jun 5, 2018 at 10:06 PM Romain Manni-Bucau <rm...@gmail.com>
>> wrote:
>>
>>> It is actually very localised in runner code where beam should reset the
>>> classloader when the deserialization happens and then the runner owns the
>>> classloader all the way in evaluators.
>>>
>>> If IO change the classloader they must indeed handle it too and patch
>>> the deserialization too.
>>>
>>> Here again (we mentionned it multiple times in other threads) beam
>>> misses a global singleton registry where you can register a "service" to
>>> look it up based of a serialization configuration and a lifecycle allowing
>>> to close the classloader in all instances without hacks.
>>>
>>>
>>> Le mar. 5 juin 2018 23:50, Kenneth Knowles <kl...@google.com> a écrit :
>>>
>>>> Perhaps we can also adopt a practice of making our own APIs explicitly
>>>> pass a Classloader when appropriate so we only have to set this when we are
>>>> entering code that does not have good hygiene. It might actually be nice to
>>>> have a lightweight static analysis to forbid bad methods in our code.
>>>>
>>>> Kenn
>>>>
>>>> On Mon, Jun 4, 2018 at 3:43 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>
>>>>> I totally agree, but there are so many Java APIs (including ours) that
>>>>> messed this up so everyone lives with the same hack.
>>>>>
>>>>> On Mon, Jun 4, 2018 at 3:41 PM Andrew Pilloud <ap...@google.com>
>>>>> wrote:
>>>>>
>>>>>> It seems like a terribly fragile way to pass arguments but my tests
>>>>>> pass when I wrap the JDBC path into Beam pipeline execution with that
>>>>>> pattern.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>> On Mon, Jun 4, 2018 at 3:20 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>>>
>>>>>>> It is a common mistake for APIs to not include a way to specify
>>>>>>> which class loader to use when doing something like deserializing an
>>>>>>> instance of a class via the ObjectInputStream. This common issue also
>>>>>>> affects Apache Beam (SerializableCoder, PipelineOptionsFactory, ...) and
>>>>>>> the way that typical Java APIs have gotten around this is to use to the
>>>>>>> thread context class loader (TCCL) as the way to plumb this additional
>>>>>>> attribute through. So Apache Beam is meant to in all places honor the TCCL
>>>>>>> if it has been set as most Java libraries (not all) do the same hack.
>>>>>>>
>>>>>>> In most environments the TCCL is not set and we are working with a
>>>>>>> single class loader. It turns out that in more complicated environments
>>>>>>> (like when loading a JDBC driver, or JNDI, or an application server, ...)
>>>>>>> this usually doesn't work without each caller knowing what class loading
>>>>>>> context they should be in. A common work around for most scenarios is to
>>>>>>> always set the TCCL to the current classes class loader like so before
>>>>>>> invoking any APIs that do class loading so you don't propagate the TCCL of
>>>>>>> the caller along since they may have set it for some other reason:
>>>>>>>
>>>>>>> ClassLoader originalClassLoader = Thread.currentThread().getContextClassLoader();try {
>>>>>>>     Thread.currentThread().setContextClassLoader(getClass().getClassLoader());
>>>>>>>     // call some API that uses reflection without taking ClassLoader param} finally {
>>>>>>>     Thread.currentThread().setContextClassLoader(originalClassLoader);}
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jun 4, 2018 at 1:57 PM Andrew Pilloud <ap...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I'm having class loading issues that go away when I revert the
>>>>>>>> changes in our use of Class.forName added in
>>>>>>>> https://github.com/apache/beam/pull/4674. The problem I'm having
>>>>>>>> is that the typical JDBC GUI (SqlWorkbench/J, SQuirreL SQL) creates an
>>>>>>>> isolated class loader to load our library. Things work if we call
>>>>>>>> Class.forName with the default class loader [getClass().getClassLoader() or
>>>>>>>> no argument] but not if we use the thread context class loader
>>>>>>>> [Thread.currentThread().getContextClassLoader() or
>>>>>>>> ReflectHelpers.findClassLoader()]. Why is using the default class loader
>>>>>>>> not the right thing to do? How can I fix this problem?
>>>>>>>>
>>>>>>>> See this integration test for an example:
>>>>>>>> https://github.com/apilloud/beam/blob/directrunner/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/JdbcIT.java#L44
>>>>>>>>
>>>>>>>> https://scans.gradle.com/s/iquqinhns2ymi/tests/slmg6ytuuqlus-akh5xpgshj32k
>>>>>>>>
>>>>>>>> Andrew
>>>>>>>>
>>>>>>>

Re: Beam breaks when it isn't loaded via the Thread Context Class Loader

Posted by Romain Manni-Bucau <rm...@gmail.com>.
Note sure the example is atomic enough but in
https://github.com/Talend/component-runtime/blob/master/component-runtime-manager/src/main/java/org/talend/sdk/component/runtime/manager/finder/StandaloneContainerFinder.java#L40
the "instance()" is a singleton used by all the runtime of the framework.

Deserialization happens in
https://github.com/Talend/component-runtime/blob/master/component-runtime-impl/src/main/java/org/talend/sdk/component/runtime/serialization/SerializableService.java#L26
and serialization is about creating this object in a write replace. Then
the runtime is switching its classloader (runner for beam) as in
https://github.com/Talend/component-runtime/blob/master/component-runtime-impl/src/main/java/org/talend/sdk/component/runtime/base/LifecycleImpl.java#L60
asap and resets it once done to not break its environment for reused jvms
case.

If we take the case of an IO, the io would lazily creates its defined
classloader from its spec and use some reference counting logic to destroy
it when needed in its teardown for instance. The io then does the
classloader switch in its callbacks (setup/teardown/process/bundle hooks
etc).


Le mer. 6 juin 2018 23:33, Lukasz Cwik <lc...@google.com> a écrit :

> Romain, can you point to an example of a global singleton registry that
> does this right for class loading (it may allow people to work towards such
> an effort)?
>
> On Tue, Jun 5, 2018 at 10:06 PM Romain Manni-Bucau <rm...@gmail.com>
> wrote:
>
>> It is actually very localised in runner code where beam should reset the
>> classloader when the deserialization happens and then the runner owns the
>> classloader all the way in evaluators.
>>
>> If IO change the classloader they must indeed handle it too and patch the
>> deserialization too.
>>
>> Here again (we mentionned it multiple times in other threads) beam misses
>> a global singleton registry where you can register a "service" to look it
>> up based of a serialization configuration and a lifecycle allowing to close
>> the classloader in all instances without hacks.
>>
>>
>> Le mar. 5 juin 2018 23:50, Kenneth Knowles <kl...@google.com> a écrit :
>>
>>> Perhaps we can also adopt a practice of making our own APIs explicitly
>>> pass a Classloader when appropriate so we only have to set this when we are
>>> entering code that does not have good hygiene. It might actually be nice to
>>> have a lightweight static analysis to forbid bad methods in our code.
>>>
>>> Kenn
>>>
>>> On Mon, Jun 4, 2018 at 3:43 PM Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> I totally agree, but there are so many Java APIs (including ours) that
>>>> messed this up so everyone lives with the same hack.
>>>>
>>>> On Mon, Jun 4, 2018 at 3:41 PM Andrew Pilloud <ap...@google.com>
>>>> wrote:
>>>>
>>>>> It seems like a terribly fragile way to pass arguments but my tests
>>>>> pass when I wrap the JDBC path into Beam pipeline execution with that
>>>>> pattern.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Andrew
>>>>>
>>>>> On Mon, Jun 4, 2018 at 3:20 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>>
>>>>>> It is a common mistake for APIs to not include a way to specify which
>>>>>> class loader to use when doing something like deserializing an instance of
>>>>>> a class via the ObjectInputStream. This common issue also affects Apache
>>>>>> Beam (SerializableCoder, PipelineOptionsFactory, ...) and the way that
>>>>>> typical Java APIs have gotten around this is to use to the thread context
>>>>>> class loader (TCCL) as the way to plumb this additional attribute through.
>>>>>> So Apache Beam is meant to in all places honor the TCCL if it has been set
>>>>>> as most Java libraries (not all) do the same hack.
>>>>>>
>>>>>> In most environments the TCCL is not set and we are working with a
>>>>>> single class loader. It turns out that in more complicated environments
>>>>>> (like when loading a JDBC driver, or JNDI, or an application server, ...)
>>>>>> this usually doesn't work without each caller knowing what class loading
>>>>>> context they should be in. A common work around for most scenarios is to
>>>>>> always set the TCCL to the current classes class loader like so before
>>>>>> invoking any APIs that do class loading so you don't propagate the TCCL of
>>>>>> the caller along since they may have set it for some other reason:
>>>>>>
>>>>>> ClassLoader originalClassLoader = Thread.currentThread().getContextClassLoader();try {
>>>>>>     Thread.currentThread().setContextClassLoader(getClass().getClassLoader());
>>>>>>     // call some API that uses reflection without taking ClassLoader param} finally {
>>>>>>     Thread.currentThread().setContextClassLoader(originalClassLoader);}
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jun 4, 2018 at 1:57 PM Andrew Pilloud <ap...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I'm having class loading issues that go away when I revert the
>>>>>>> changes in our use of Class.forName added in
>>>>>>> https://github.com/apache/beam/pull/4674. The problem I'm having is
>>>>>>> that the typical JDBC GUI (SqlWorkbench/J, SQuirreL SQL) creates an
>>>>>>> isolated class loader to load our library. Things work if we call
>>>>>>> Class.forName with the default class loader [getClass().getClassLoader() or
>>>>>>> no argument] but not if we use the thread context class loader
>>>>>>> [Thread.currentThread().getContextClassLoader() or
>>>>>>> ReflectHelpers.findClassLoader()]. Why is using the default class loader
>>>>>>> not the right thing to do? How can I fix this problem?
>>>>>>>
>>>>>>> See this integration test for an example:
>>>>>>> https://github.com/apilloud/beam/blob/directrunner/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/JdbcIT.java#L44
>>>>>>>
>>>>>>> https://scans.gradle.com/s/iquqinhns2ymi/tests/slmg6ytuuqlus-akh5xpgshj32k
>>>>>>>
>>>>>>> Andrew
>>>>>>>
>>>>>>

Re: Beam breaks when it isn't loaded via the Thread Context Class Loader

Posted by Lukasz Cwik <lc...@google.com>.
Romain, can you point to an example of a global singleton registry that
does this right for class loading (it may allow people to work towards such
an effort)?

On Tue, Jun 5, 2018 at 10:06 PM Romain Manni-Bucau <rm...@gmail.com>
wrote:

> It is actually very localised in runner code where beam should reset the
> classloader when the deserialization happens and then the runner owns the
> classloader all the way in evaluators.
>
> If IO change the classloader they must indeed handle it too and patch the
> deserialization too.
>
> Here again (we mentionned it multiple times in other threads) beam misses
> a global singleton registry where you can register a "service" to look it
> up based of a serialization configuration and a lifecycle allowing to close
> the classloader in all instances without hacks.
>
>
> Le mar. 5 juin 2018 23:50, Kenneth Knowles <kl...@google.com> a écrit :
>
>> Perhaps we can also adopt a practice of making our own APIs explicitly
>> pass a Classloader when appropriate so we only have to set this when we are
>> entering code that does not have good hygiene. It might actually be nice to
>> have a lightweight static analysis to forbid bad methods in our code.
>>
>> Kenn
>>
>> On Mon, Jun 4, 2018 at 3:43 PM Lukasz Cwik <lc...@google.com> wrote:
>>
>>> I totally agree, but there are so many Java APIs (including ours) that
>>> messed this up so everyone lives with the same hack.
>>>
>>> On Mon, Jun 4, 2018 at 3:41 PM Andrew Pilloud <ap...@google.com>
>>> wrote:
>>>
>>>> It seems like a terribly fragile way to pass arguments but my tests
>>>> pass when I wrap the JDBC path into Beam pipeline execution with that
>>>> pattern.
>>>>
>>>> Thanks!
>>>>
>>>> Andrew
>>>>
>>>> On Mon, Jun 4, 2018 at 3:20 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>
>>>>> It is a common mistake for APIs to not include a way to specify which
>>>>> class loader to use when doing something like deserializing an instance of
>>>>> a class via the ObjectInputStream. This common issue also affects Apache
>>>>> Beam (SerializableCoder, PipelineOptionsFactory, ...) and the way that
>>>>> typical Java APIs have gotten around this is to use to the thread context
>>>>> class loader (TCCL) as the way to plumb this additional attribute through.
>>>>> So Apache Beam is meant to in all places honor the TCCL if it has been set
>>>>> as most Java libraries (not all) do the same hack.
>>>>>
>>>>> In most environments the TCCL is not set and we are working with a
>>>>> single class loader. It turns out that in more complicated environments
>>>>> (like when loading a JDBC driver, or JNDI, or an application server, ...)
>>>>> this usually doesn't work without each caller knowing what class loading
>>>>> context they should be in. A common work around for most scenarios is to
>>>>> always set the TCCL to the current classes class loader like so before
>>>>> invoking any APIs that do class loading so you don't propagate the TCCL of
>>>>> the caller along since they may have set it for some other reason:
>>>>>
>>>>> ClassLoader originalClassLoader = Thread.currentThread().getContextClassLoader();try {
>>>>>     Thread.currentThread().setContextClassLoader(getClass().getClassLoader());
>>>>>     // call some API that uses reflection without taking ClassLoader param} finally {
>>>>>     Thread.currentThread().setContextClassLoader(originalClassLoader);}
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jun 4, 2018 at 1:57 PM Andrew Pilloud <ap...@google.com>
>>>>> wrote:
>>>>>
>>>>>> I'm having class loading issues that go away when I revert the
>>>>>> changes in our use of Class.forName added in
>>>>>> https://github.com/apache/beam/pull/4674. The problem I'm having is
>>>>>> that the typical JDBC GUI (SqlWorkbench/J, SQuirreL SQL) creates an
>>>>>> isolated class loader to load our library. Things work if we call
>>>>>> Class.forName with the default class loader [getClass().getClassLoader() or
>>>>>> no argument] but not if we use the thread context class loader
>>>>>> [Thread.currentThread().getContextClassLoader() or
>>>>>> ReflectHelpers.findClassLoader()]. Why is using the default class loader
>>>>>> not the right thing to do? How can I fix this problem?
>>>>>>
>>>>>> See this integration test for an example:
>>>>>> https://github.com/apilloud/beam/blob/directrunner/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/JdbcIT.java#L44
>>>>>>
>>>>>> https://scans.gradle.com/s/iquqinhns2ymi/tests/slmg6ytuuqlus-akh5xpgshj32k
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>

Re: Beam breaks when it isn't loaded via the Thread Context Class Loader

Posted by Romain Manni-Bucau <rm...@gmail.com>.
It is actually very localised in runner code where beam should reset the
classloader when the deserialization happens and then the runner owns the
classloader all the way in evaluators.

If IO change the classloader they must indeed handle it too and patch the
deserialization too.

Here again (we mentionned it multiple times in other threads) beam misses a
global singleton registry where you can register a "service" to look it up
based of a serialization configuration and a lifecycle allowing to close
the classloader in all instances without hacks.


Le mar. 5 juin 2018 23:50, Kenneth Knowles <kl...@google.com> a écrit :

> Perhaps we can also adopt a practice of making our own APIs explicitly
> pass a Classloader when appropriate so we only have to set this when we are
> entering code that does not have good hygiene. It might actually be nice to
> have a lightweight static analysis to forbid bad methods in our code.
>
> Kenn
>
> On Mon, Jun 4, 2018 at 3:43 PM Lukasz Cwik <lc...@google.com> wrote:
>
>> I totally agree, but there are so many Java APIs (including ours) that
>> messed this up so everyone lives with the same hack.
>>
>> On Mon, Jun 4, 2018 at 3:41 PM Andrew Pilloud <ap...@google.com>
>> wrote:
>>
>>> It seems like a terribly fragile way to pass arguments but my tests
>>> pass when I wrap the JDBC path into Beam pipeline execution with that
>>> pattern.
>>>
>>> Thanks!
>>>
>>> Andrew
>>>
>>> On Mon, Jun 4, 2018 at 3:20 PM Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> It is a common mistake for APIs to not include a way to specify which
>>>> class loader to use when doing something like deserializing an instance of
>>>> a class via the ObjectInputStream. This common issue also affects Apache
>>>> Beam (SerializableCoder, PipelineOptionsFactory, ...) and the way that
>>>> typical Java APIs have gotten around this is to use to the thread context
>>>> class loader (TCCL) as the way to plumb this additional attribute through.
>>>> So Apache Beam is meant to in all places honor the TCCL if it has been set
>>>> as most Java libraries (not all) do the same hack.
>>>>
>>>> In most environments the TCCL is not set and we are working with a
>>>> single class loader. It turns out that in more complicated environments
>>>> (like when loading a JDBC driver, or JNDI, or an application server, ...)
>>>> this usually doesn't work without each caller knowing what class loading
>>>> context they should be in. A common work around for most scenarios is to
>>>> always set the TCCL to the current classes class loader like so before
>>>> invoking any APIs that do class loading so you don't propagate the TCCL of
>>>> the caller along since they may have set it for some other reason:
>>>>
>>>> ClassLoader originalClassLoader = Thread.currentThread().getContextClassLoader();try {
>>>>     Thread.currentThread().setContextClassLoader(getClass().getClassLoader());
>>>>     // call some API that uses reflection without taking ClassLoader param} finally {
>>>>     Thread.currentThread().setContextClassLoader(originalClassLoader);}
>>>>
>>>>
>>>>
>>>> On Mon, Jun 4, 2018 at 1:57 PM Andrew Pilloud <ap...@google.com>
>>>> wrote:
>>>>
>>>>> I'm having class loading issues that go away when I revert the changes
>>>>> in our use of Class.forName added in
>>>>> https://github.com/apache/beam/pull/4674. The problem I'm having is
>>>>> that the typical JDBC GUI (SqlWorkbench/J, SQuirreL SQL) creates an
>>>>> isolated class loader to load our library. Things work if we call
>>>>> Class.forName with the default class loader [getClass().getClassLoader() or
>>>>> no argument] but not if we use the thread context class loader
>>>>> [Thread.currentThread().getContextClassLoader() or
>>>>> ReflectHelpers.findClassLoader()]. Why is using the default class loader
>>>>> not the right thing to do? How can I fix this problem?
>>>>>
>>>>> See this integration test for an example:
>>>>> https://github.com/apilloud/beam/blob/directrunner/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/JdbcIT.java#L44
>>>>>
>>>>> https://scans.gradle.com/s/iquqinhns2ymi/tests/slmg6ytuuqlus-akh5xpgshj32k
>>>>>
>>>>> Andrew
>>>>>
>>>>

Re: Beam breaks when it isn't loaded via the Thread Context Class Loader

Posted by Kenneth Knowles <kl...@google.com>.
Perhaps we can also adopt a practice of making our own APIs explicitly pass
a Classloader when appropriate so we only have to set this when we are
entering code that does not have good hygiene. It might actually be nice to
have a lightweight static analysis to forbid bad methods in our code.

Kenn

On Mon, Jun 4, 2018 at 3:43 PM Lukasz Cwik <lc...@google.com> wrote:

> I totally agree, but there are so many Java APIs (including ours) that
> messed this up so everyone lives with the same hack.
>
> On Mon, Jun 4, 2018 at 3:41 PM Andrew Pilloud <ap...@google.com> wrote:
>
>> It seems like a terribly fragile way to pass arguments but my tests pass
>> when I wrap the JDBC path into Beam pipeline execution with that pattern.
>>
>> Thanks!
>>
>> Andrew
>>
>> On Mon, Jun 4, 2018 at 3:20 PM Lukasz Cwik <lc...@google.com> wrote:
>>
>>> It is a common mistake for APIs to not include a way to specify which
>>> class loader to use when doing something like deserializing an instance of
>>> a class via the ObjectInputStream. This common issue also affects Apache
>>> Beam (SerializableCoder, PipelineOptionsFactory, ...) and the way that
>>> typical Java APIs have gotten around this is to use to the thread context
>>> class loader (TCCL) as the way to plumb this additional attribute through.
>>> So Apache Beam is meant to in all places honor the TCCL if it has been set
>>> as most Java libraries (not all) do the same hack.
>>>
>>> In most environments the TCCL is not set and we are working with a
>>> single class loader. It turns out that in more complicated environments
>>> (like when loading a JDBC driver, or JNDI, or an application server, ...)
>>> this usually doesn't work without each caller knowing what class loading
>>> context they should be in. A common work around for most scenarios is to
>>> always set the TCCL to the current classes class loader like so before
>>> invoking any APIs that do class loading so you don't propagate the TCCL of
>>> the caller along since they may have set it for some other reason:
>>>
>>> ClassLoader originalClassLoader = Thread.currentThread().getContextClassLoader();try {
>>>     Thread.currentThread().setContextClassLoader(getClass().getClassLoader());
>>>     // call some API that uses reflection without taking ClassLoader param} finally {
>>>     Thread.currentThread().setContextClassLoader(originalClassLoader);}
>>>
>>>
>>>
>>> On Mon, Jun 4, 2018 at 1:57 PM Andrew Pilloud <ap...@google.com>
>>> wrote:
>>>
>>>> I'm having class loading issues that go away when I revert the changes
>>>> in our use of Class.forName added in
>>>> https://github.com/apache/beam/pull/4674. The problem I'm having is
>>>> that the typical JDBC GUI (SqlWorkbench/J, SQuirreL SQL) creates an
>>>> isolated class loader to load our library. Things work if we call
>>>> Class.forName with the default class loader [getClass().getClassLoader() or
>>>> no argument] but not if we use the thread context class loader
>>>> [Thread.currentThread().getContextClassLoader() or
>>>> ReflectHelpers.findClassLoader()]. Why is using the default class loader
>>>> not the right thing to do? How can I fix this problem?
>>>>
>>>> See this integration test for an example:
>>>> https://github.com/apilloud/beam/blob/directrunner/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/JdbcIT.java#L44
>>>>
>>>> https://scans.gradle.com/s/iquqinhns2ymi/tests/slmg6ytuuqlus-akh5xpgshj32k
>>>>
>>>> Andrew
>>>>
>>>

Re: Beam breaks when it isn't loaded via the Thread Context Class Loader

Posted by Lukasz Cwik <lc...@google.com>.
I totally agree, but there are so many Java APIs (including ours) that
messed this up so everyone lives with the same hack.

On Mon, Jun 4, 2018 at 3:41 PM Andrew Pilloud <ap...@google.com> wrote:

> It seems like a terribly fragile way to pass arguments but my tests pass
> when I wrap the JDBC path into Beam pipeline execution with that pattern.
>
> Thanks!
>
> Andrew
>
> On Mon, Jun 4, 2018 at 3:20 PM Lukasz Cwik <lc...@google.com> wrote:
>
>> It is a common mistake for APIs to not include a way to specify which
>> class loader to use when doing something like deserializing an instance of
>> a class via the ObjectInputStream. This common issue also affects Apache
>> Beam (SerializableCoder, PipelineOptionsFactory, ...) and the way that
>> typical Java APIs have gotten around this is to use to the thread context
>> class loader (TCCL) as the way to plumb this additional attribute through.
>> So Apache Beam is meant to in all places honor the TCCL if it has been set
>> as most Java libraries (not all) do the same hack.
>>
>> In most environments the TCCL is not set and we are working with a single
>> class loader. It turns out that in more complicated environments (like when
>> loading a JDBC driver, or JNDI, or an application server, ...) this usually
>> doesn't work without each caller knowing what class loading context they
>> should be in. A common work around for most scenarios is to always set the
>> TCCL to the current classes class loader like so before invoking any APIs
>> that do class loading so you don't propagate the TCCL of the caller along
>> since they may have set it for some other reason:
>>
>> ClassLoader originalClassLoader = Thread.currentThread().getContextClassLoader();try {
>>     Thread.currentThread().setContextClassLoader(getClass().getClassLoader());
>>     // call some API that uses reflection without taking ClassLoader param} finally {
>>     Thread.currentThread().setContextClassLoader(originalClassLoader);}
>>
>>
>>
>> On Mon, Jun 4, 2018 at 1:57 PM Andrew Pilloud <ap...@google.com>
>> wrote:
>>
>>> I'm having class loading issues that go away when I revert the changes
>>> in our use of Class.forName added in
>>> https://github.com/apache/beam/pull/4674. The problem I'm having is
>>> that the typical JDBC GUI (SqlWorkbench/J, SQuirreL SQL) creates an
>>> isolated class loader to load our library. Things work if we call
>>> Class.forName with the default class loader [getClass().getClassLoader() or
>>> no argument] but not if we use the thread context class loader
>>> [Thread.currentThread().getContextClassLoader() or
>>> ReflectHelpers.findClassLoader()]. Why is using the default class loader
>>> not the right thing to do? How can I fix this problem?
>>>
>>> See this integration test for an example:
>>> https://github.com/apilloud/beam/blob/directrunner/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/JdbcIT.java#L44
>>>
>>> https://scans.gradle.com/s/iquqinhns2ymi/tests/slmg6ytuuqlus-akh5xpgshj32k
>>>
>>> Andrew
>>>
>>

Re: Beam breaks when it isn't loaded via the Thread Context Class Loader

Posted by Andrew Pilloud <ap...@google.com>.
It seems like a terribly fragile way to pass arguments but my tests pass
when I wrap the JDBC path into Beam pipeline execution with that pattern.

Thanks!

Andrew

On Mon, Jun 4, 2018 at 3:20 PM Lukasz Cwik <lc...@google.com> wrote:

> It is a common mistake for APIs to not include a way to specify which
> class loader to use when doing something like deserializing an instance of
> a class via the ObjectInputStream. This common issue also affects Apache
> Beam (SerializableCoder, PipelineOptionsFactory, ...) and the way that
> typical Java APIs have gotten around this is to use to the thread context
> class loader (TCCL) as the way to plumb this additional attribute through.
> So Apache Beam is meant to in all places honor the TCCL if it has been set
> as most Java libraries (not all) do the same hack.
>
> In most environments the TCCL is not set and we are working with a single
> class loader. It turns out that in more complicated environments (like when
> loading a JDBC driver, or JNDI, or an application server, ...) this usually
> doesn't work without each caller knowing what class loading context they
> should be in. A common work around for most scenarios is to always set the
> TCCL to the current classes class loader like so before invoking any APIs
> that do class loading so you don't propagate the TCCL of the caller along
> since they may have set it for some other reason:
>
> ClassLoader originalClassLoader = Thread.currentThread().getContextClassLoader();try {
>     Thread.currentThread().setContextClassLoader(getClass().getClassLoader());
>     // call some API that uses reflection without taking ClassLoader param} finally {
>     Thread.currentThread().setContextClassLoader(originalClassLoader);}
>
>
>
> On Mon, Jun 4, 2018 at 1:57 PM Andrew Pilloud <ap...@google.com> wrote:
>
>> I'm having class loading issues that go away when I revert the changes in
>> our use of Class.forName added in
>> https://github.com/apache/beam/pull/4674. The problem I'm having is that
>> the typical JDBC GUI (SqlWorkbench/J, SQuirreL SQL) creates an isolated
>> class loader to load our library. Things work if we call Class.forName with
>> the default class loader [getClass().getClassLoader() or no argument] but
>> not if we use the thread context class loader
>> [Thread.currentThread().getContextClassLoader() or
>> ReflectHelpers.findClassLoader()]. Why is using the default class loader
>> not the right thing to do? How can I fix this problem?
>>
>> See this integration test for an example:
>> https://github.com/apilloud/beam/blob/directrunner/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/JdbcIT.java#L44
>> https://scans.gradle.com/s/iquqinhns2ymi/tests/slmg6ytuuqlus-akh5xpgshj32k
>>
>> Andrew
>>
>

Re: Beam breaks when it isn't loaded via the Thread Context Class Loader

Posted by Lukasz Cwik <lc...@google.com>.
It is a common mistake for APIs to not include a way to specify which class
loader to use when doing something like deserializing an instance of a
class via the ObjectInputStream. This common issue also affects Apache Beam
(SerializableCoder, PipelineOptionsFactory, ...) and the way that typical
Java APIs have gotten around this is to use to the thread context class
loader (TCCL) as the way to plumb this additional attribute through. So
Apache Beam is meant to in all places honor the TCCL if it has been set as
most Java libraries (not all) do the same hack.

In most environments the TCCL is not set and we are working with a single
class loader. It turns out that in more complicated environments (like when
loading a JDBC driver, or JNDI, or an application server, ...) this usually
doesn't work without each caller knowing what class loading context they
should be in. A common work around for most scenarios is to always set the
TCCL to the current classes class loader like so before invoking any APIs
that do class loading so you don't propagate the TCCL of the caller along
since they may have set it for some other reason:

ClassLoader originalClassLoader =
Thread.currentThread().getContextClassLoader();try {
    Thread.currentThread().setContextClassLoader(getClass().getClassLoader());
    // call some API that uses reflection without taking ClassLoader
param} finally {
    Thread.currentThread().setContextClassLoader(originalClassLoader);}



On Mon, Jun 4, 2018 at 1:57 PM Andrew Pilloud <ap...@google.com> wrote:

> I'm having class loading issues that go away when I revert the changes in
> our use of Class.forName added in https://github.com/apache/beam/pull/4674.
> The problem I'm having is that the typical JDBC GUI
> (SqlWorkbench/J, SQuirreL SQL) creates an isolated class loader to load our
> library. Things work if we call Class.forName with the default class loader
> [getClass().getClassLoader() or no argument] but not if we use the thread
> context class loader [Thread.currentThread().getContextClassLoader() or
> ReflectHelpers.findClassLoader()]. Why is using the default class loader
> not the right thing to do? How can I fix this problem?
>
> See this integration test for an example:
> https://github.com/apilloud/beam/blob/directrunner/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/JdbcIT.java#L44
> https://scans.gradle.com/s/iquqinhns2ymi/tests/slmg6ytuuqlus-akh5xpgshj32k
>
> Andrew
>