Posted to dev@geode.apache.org by Anthony Baker <ab...@pivotal.io> on 2015/08/13 19:57:14 UTC

Dynamic classloading in Geode

Vito and I spent some time hacking up a prototype for dynamic and distributed classloading of Geode functions.  Currently a user has to compile a function into a jar and deploy it using gfsh before it can be executed.  If we could enable automatic deployment of functions across a running cluster, it would speed up the development cycle for Geode applications and pave the way for other interesting features (like Java 8 lambdas).

Here’s how it works:

A function wrapper (DynamicFunction) serializes the original function object and captures dependent classes as byte arrays.  We generate an MD5 hash over the bytecode and use that as the key for storing the bytecode in a replicated region (“hackday”) within the cache.  When the function is invoked, we call putIfAbsent() to distribute the bytecode before executing the function across the cluster.  During execution, while the original function is being deserialized, we extend the thread context class loader (TCCL) with a new class loader that loads classes from our region.  The original function is then executed in parallel on the cluster members.  This allows an application developer to iteratively modify and test function code without any manual steps to upload class files.
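
The flow described above can be sketched with plain JDK classes.  This is only an illustration under stated assumptions, not the code from the gist: a ConcurrentMap stands in for the replicated region, and the names BytecodeStore, RegionClassLoader, publish and withContextLoader are hypothetical.  The extra class-name index is also an assumption, added so the loader can find bytes for a requested class.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/** Hypothetical stand-in for the bytecode cache; a map plays the role of the region. */
final class BytecodeStore {
    // MD5 hex digest -> class bytes (the "replicated region" in this sketch).
    final ConcurrentMap<String, byte[]> region = new ConcurrentHashMap<>();
    // Class name -> digest, so a class loader can look up bytes by name (assumption).
    final ConcurrentMap<String, String> index = new ConcurrentHashMap<>();

    /** Returns the MD5 digest of the bytecode as 32 lowercase hex characters. */
    static String md5Hex(byte[] bytecode) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(bytecode);
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every JDK", e);
        }
    }

    /** Stores the bytecode under its hash; identical bytecode is stored only once. */
    String publish(String className, byte[] bytecode) {
        String key = md5Hex(bytecode);
        region.putIfAbsent(key, bytecode);
        index.put(className, key);
        return key;
    }

    /** Runs body with the given loader as the TCCL, then restores the previous one. */
    static <T> T withContextLoader(ClassLoader loader, Supplier<T> body) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return body.get();
        } finally {
            thread.setContextClassLoader(previous);
        }
    }
}

/** Defines classes from the store when the parent loader cannot find them. */
final class RegionClassLoader extends ClassLoader {
    private final BytecodeStore store;

    RegionClassLoader(ClassLoader parent, BytecodeStore store) {
        super(parent);
        this.store = store;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        String key = store.index.get(name);
        byte[] bytes = (key == null) ? null : store.region.get(key);
        if (bytes == null) {
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, bytes, 0, bytes.length);
    }
}
```

Because the key is derived from the content, publishing unchanged bytecode again is a no-op, while a recompiled class gets a new key.  Deserialization of the function would run inside withContextLoader so that classes missing from the parent loader are resolved from the store.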

Obviously, there is a lot more thinking and design work to do around these ideas.  Here’s our super-hacky code if you’re interested:
https://gist.github.com/metatype/9b1f39a24e52f5c6f3e1

Caveats:

1) Currently we only capture static class dependencies.  Any class dependencies present during method invocations are ignored.  This could be addressed by doing bytecode inspection (using ASM, Javassist, etc).

2) The region we use to cache class bytecode should be automatically recreated as a metadata region, similar to how we store PDX types.  We also need to configure eviction and expiration attributes to control resource usage and remove garbage.

3) We only injected the bytecode caching hack into the code path for FunctionService.onServers(pool).  Also, the putIfAbsent() call adds another network round trip.
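
Caveat 1 can be made concrete with a small sketch.  Assuming, purely for illustration, that "static" dependencies are the types visible in a class's signatures (superclass, interfaces, fields, method parameters and return types), a reflection walk finds those but misses a class that is only instantiated inside a method body.  The names StaticDeps, Dep, Helper and SampleFn are hypothetical, and the gist may collect dependencies differently.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical walker over signature-level (static) dependencies of a class. */
final class StaticDeps {
    static Set<Class<?>> of(Class<?> root) {
        Set<Class<?>> found = new LinkedHashSet<>();
        Deque<Class<?>> todo = new ArrayDeque<>();
        todo.add(root);
        while (!todo.isEmpty()) {
            Class<?> type = todo.poll();
            while (type.isArray()) {
                type = type.getComponentType();
            }
            // Skip primitives, classes from the bootstrap loader, and classes already seen.
            if (type.isPrimitive() || type.getClassLoader() == null || !found.add(type)) {
                continue;
            }
            if (type.getSuperclass() != null) {
                todo.add(type.getSuperclass());
            }
            todo.addAll(Arrays.asList(type.getInterfaces()));
            for (Field field : type.getDeclaredFields()) {
                todo.add(field.getType());
            }
            for (Method method : type.getDeclaredMethods()) {
                todo.add(method.getReturnType());
                todo.addAll(Arrays.asList(method.getParameterTypes()));
            }
        }
        return found;
    }
}

/** Appears as a field type, so the walk finds it. */
final class Dep {}

/** Appears only inside a method body, so the walk misses it. */
final class Helper {}

final class SampleFn {
    Dep dep = new Dep();

    void run() {
        new Helper();
    }
}
```

Here StaticDeps.of(SampleFn.class) contains SampleFn and Dep but not Helper.  Finding Helper requires reading the method bodies, which is where bytecode inspection comes in.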


Anthony & Vito

Re: Dynamic classloading in Geode

Posted by Tushar Khairnar <tk...@pivotal.io>.
Is this going to be the implementation for "deploy jars"? If yes, then GEODE-17
(integrated security) will subject it to authorization scrutiny.

On Fri, Aug 14, 2015 at 1:50 AM, Anthony Baker <ab...@pivotal.io> wrote:

> Thanks for the suggestions Mike.  At this point we are just exploring
> ideas and putting them out for discussion.  Regarding restricting access to
> this feature, we used the Geode Java client so standard security and
> authorizations would apply.
>
> Anthony
>
> > On Aug 13, 2015, at 12:24 PM, Michael Stolz <ms...@pivotal.io> wrote:
> >
> > If this feature makes it into an actual release please make sure this
> > option is not enabled by default and is securely turned off for
> > environments where there are strong controls around releasing software
> > into production.
> > Also make sure that it is secured in terms of Authentication and
> > Authorization via the Geode security framework when it is enabled, so
> > that not just anyone can push code.
> >
> > --
> > Mike Stolz
> > Principal Technical Account Manager
> > Mobile: 631-835-4771

Re: Dynamic classloading in Geode

Posted by Yogesh Mahajan <ym...@pivotal.io>.
A good hack, Anthony and Vito. Short and simple.



Re: Dynamic classloading in Geode

Posted by Anthony Baker <ab...@pivotal.io>.
Thanks for the suggestions Mike.  At this point we are just exploring ideas and putting them out for discussion.  Regarding restricting access to this feature, we used the Geode Java client so standard security and authorizations would apply.

Anthony

> On Aug 13, 2015, at 12:24 PM, Michael Stolz <ms...@pivotal.io> wrote:
> 
> If this feature makes it into an actual release please make sure this
> option is not enabled by default and is securely turned off for
> environments where there are strong controls around releasing software into
> production.
> Also make sure that it is secured in terms of Authentication and
> Authorization via the Geode security framework when it is enabled, so that
> not just anyone can push code.
> 
> --
> Mike Stolz
> Principal Technical Account Manager
> Mobile: 631-835-4771
> 


Re: Dynamic classloading in Geode

Posted by Michael Stolz <ms...@pivotal.io>.
If this feature makes it into an actual release please make sure this
option is not enabled by default and is securely turned off for
environments where there are strong controls around releasing software into
production.
Also make sure that it is secured in terms of Authentication and
Authorization via the Geode security framework when it is enabled, so that
not just anyone can push code.

--
Mike Stolz
Principal Technical Account Manager
Mobile: 631-835-4771
