Posted to dev@cloudstack.apache.org by John Burwell <jb...@basho.com> on 2013/08/20 23:43:17 UTC

[DISCUSS/PROPOSAL] Upgrading Driver Model

All,

In capturing my thoughts on storage, my thinking backed into the driver model.  While we have the beginnings of such a model today, I see the following deficiencies:

Multiple Models: The Storage, Hypervisor, and Security layers each have a slightly different model for allowing system functionality to be extended/substituted.  These differences increase the barrier of entry for vendors seeking to extend CloudStack and accrete code paths to be maintained and verified.
Leaky Abstraction:  Plugins are registered through a Spring configuration file.  In addition to being operator unfriendly (most sysadmins are not Spring experts nor do they want to be), we expose the core bootstrapping mechanism to operators.  Therefore, a misconfiguration could negatively impact the injection/configuration of internal management server components.  Essentially handing them a loaded shotgun pointed at our right foot.
Nondeterministic Load/Unload Model:  Because the core loading mechanism is Spring, the management server has little control over the timing and order of component loading/unloading.  Changes to the Management Server's component dependency graph could break a driver by causing it to be started at an unexpected time.
Lack of Execution Isolation: As a Spring component, plugins are loaded into the same execution context as core management server components.  Therefore, an errant plugin can corrupt the entire management server.  

For the next revision of the plugin/driver mechanism, I would like to see us migrate towards a standard pluggable driver model that supports all of the management server's extension points (e.g. network devices, storage devices, hypervisors, etc) with the following capabilities:

Consolidated Lifecycle and Startup Procedure:  Drivers share a common state machine and categorization (e.g. network, storage, hypervisor, etc) that permits the deterministic calculation of initialization and destruction order (i.e. network layer drivers -> storage layer drivers -> hypervisor drivers).  Plugin inter-dependencies would be supported between plugins sharing the same category.
In-process Installation and Upgrade: Adding or upgrading a driver does not require the management server to be restarted.  This capability implies a system that supports the simultaneous execution of multiple driver versions and the ability to suspend the execution of work on a resource while the underlying driver instance is replaced.
Execution Isolation: The deployment packaging and execution environment allows different (and potentially conflicting) versions of dependencies to be used simultaneously.  Additionally, plugins would be sufficiently sandboxed to protect the management server against driver instability. 
Extension Data Model: Drivers provide a property bag with a metadata descriptor to validate and render vendor-specific data.  The contents of this property bag will be provided to every driver operation invocation at runtime.  The metadata descriptor would be a lightweight description that provides a label resource key, a description resource key, a data type (string, date, number, boolean), a required flag, and an optional length limit (a rough sketch follows this list).
Introspection: Administrative APIs/UIs allow operators to understand which drivers are present in the system, their configuration, and their current state.
Discoverability: Optionally, drivers can be discovered via a project repository definition (similar to Yum), allowing drivers to be remotely acquired and operators to be notified regarding update availability.  The project would also provide, free of charge, certificates to sign plugins.  This mechanism would support local mirroring for air-gapped management networks.
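
As a rough illustration of the Extension Data Model item above, here is a minimal Java sketch of what such a field descriptor could look like; the type and member names are hypothetical, not an existing CloudStack API:

    // Hypothetical sketch of a per-field metadata descriptor for a driver property bag.
    public final class DriverFieldDescriptor {

        public enum FieldType { STRING, DATE, NUMBER, BOOLEAN }

        private final String labelKey;        // resource bundle key for the field label
        private final String descriptionKey;  // resource bundle key for the field description
        private final FieldType type;         // string, date, number, or boolean
        private final boolean required;       // must the operator supply a value?
        private final Integer maxLength;      // optional length limit, null if unbounded

        public DriverFieldDescriptor(String labelKey, String descriptionKey, FieldType type,
                                     boolean required, Integer maxLength) {
            this.labelKey = labelKey;
            this.descriptionKey = descriptionKey;
            this.type = type;
            this.required = required;
            this.maxLength = maxLength;
        }

        // The property bag handed to every driver operation would be validated against
        // a collection of these descriptors before the invocation is dispatched.
        public boolean isValid(String value) {
            if (required && (value == null || value.isEmpty())) {
                return false;
            }
            return value == null || maxLength == null || value.length() <= maxLength;
        }
    }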

Fundamentally, I do not want to turn CloudStack into an erector set with more screws than nuts, which is a risk with highly pluggable architectures.  As such, I think we would need to tightly bound the scope of drivers and their behaviors to prevent the loss of system usability and stability.  My thinking is that drivers would be packaged into a custom JAR, CAR (CloudStack ARchive), that would be structured as follows:

META-INF
    MANIFEST.MF
    driver.yaml (driver metadata (e.g. version, name, description, etc) serialized in YAML format)
    LICENSE (a text file containing the driver's license)
lib (driver dependencies)
classes (driver implementation)
resources (driver message files and potentially JS resources)
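
As an illustration of how the driver.yaml metadata might be consumed, a minimal Java sketch follows; it assumes SnakeYAML is available, and the entry path and field names are assumptions rather than an existing CloudStack format:

    import java.io.InputStream;
    import java.util.Map;
    import java.util.jar.JarFile;
    import java.util.zip.ZipEntry;

    import org.yaml.snakeyaml.Yaml;

    // Minimal sketch: read the hypothetical META-INF/driver.yaml descriptor out of a CAR.
    public final class CarMetadataReader {

        public static Map<String, Object> readDescriptor(String carPath) throws Exception {
            try (JarFile car = new JarFile(carPath)) {
                ZipEntry entry = car.getEntry("META-INF/driver.yaml");
                if (entry == null) {
                    throw new IllegalArgumentException("CAR is missing META-INF/driver.yaml: " + carPath);
                }
                try (InputStream in = car.getInputStream(entry)) {
                    // e.g. {name: example-storage-driver, version: 1.0.0, category: storage}
                    Object data = new Yaml().load(in);
                    @SuppressWarnings("unchecked")
                    Map<String, Object> descriptor = (Map<String, Object>) data;
                    return descriptor;
                }
            }
        }
    }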

The management server would acquire drivers through a simple scan of a URL (e.g. file directory, S3 bucket, etc).  For every CAR object found, the management server would create an execution environment (likely a dedicated ExecutorService and Classloader), and transition the state of the driver to Running (the exact state model would need to be worked out).  To be really nice, we could develop a custom Ant task/Maven plugin/Gradle plugin to create CARs.  I can also imagine opportunities to add hooks to this model to register instrumentation information with JMX and authorization.
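
A minimal, purely illustrative sketch of that loading idea; the state model, registration bookkeeping, and how work is routed to the executor are all hand waved here:

    import java.io.File;
    import java.net.URL;
    import java.net.URLClassLoader;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Rough sketch: scan a location for CAR files and give each driver its own
    // classloader and executor.  Everything here is hypothetical shape, not a design.
    public final class CarScanner {

        public void scanDirectory(File driverDir) throws Exception {
            File[] cars = driverDir.listFiles((dir, name) -> name.endsWith(".car"));
            if (cars == null) {
                return;
            }
            for (File car : cars) {
                // Dedicated classloader isolates the driver's dependencies from the core
                // management server (and from other drivers).
                URLClassLoader loader = new URLClassLoader(
                        new URL[] { car.toURI().toURL() },
                        getClass().getClassLoader());

                // Dedicated executor: work bound for this driver runs in its own context,
                // which is what later makes suspend-and-swap upgrades conceivable.
                ExecutorService executor = Executors.newSingleThreadExecutor();

                // The (hypothetical) driver state machine would transition to Running here.
                register(car.getName(), loader, executor);
            }
        }

        private void register(String name, URLClassLoader loader, ExecutorService executor) {
            // Registration/bookkeeping intentionally omitted from this sketch.
        }
    }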

To keep the scope of this email confined, we would introduce the general notion of a Resource, and (hand wave hand wave) eventually compartmentalize the execution of work around a resource [1].  This (hand waved) compartmentalization would allow us the controls necessary to safely and reliably perform in-place driver upgrades.  For an initial release, I would recommend implementing the abstractions, loading mechanism, extension data model, and discovery features.  With these capabilities in place, we could attack the in-place upgrade model.

If we were to adopt such a pluggable capability, we would have the opportunity to decouple the vendor and CloudStack release schedules.  For example, if a vendor were introducing a new product that required a new or updated driver, they would no longer need to wait for a CloudStack release to support it.  They would also gain the ability to fix high priority defects in the same manner. 

I have hand waved a number of issues that would need to be resolved before such an approach could be implemented.  However, I think we need to decide, as a community, that it is worth devoting energy and effort to enhancing the plugin/driver model, and agree on the goals of that effort, before diving head first into the deep rabbit hole of design/implementation.  

Thoughts? (/me ducks)
-John

[1]: My opinions on the matter from CloudStack Collab 2013 -> http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cloud-stack-distributed-process-management

Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by John Burwell <jb...@basho.com>.
Prasanna,

Generally, Spring configuration files should be packaged in their associated JARs with property substitution for configurable items (e.g. connection pool min and max sizes).  Unfortunately, Spring does not allow component wiring to be modified through property files.  Since plugins are new components, we have to expose the underlying Spring configuration files to allow plugins to be loaded.  I think our current approach was a solid pragmatic step forward -- a nice midpoint between nothing and a complete driver/plugin model.

Thanks,
-John

On Aug 21, 2013, at 9:51 AM, Prasanna Santhanam <ts...@apache.org> wrote:

> On Tue, Aug 20, 2013 at 05:43:17PM -0400, John Burwell wrote:
>> Leaky Abstraction:  Plugins are registered through a Spring
>> configuration file.  In addition to being operator unfriendly (most
>> sysadmins are not Spring experts nor do they want to be), we expose
>> the core bootstrapping mechanism to operators.  Therefore, a
>> misconfiguration could negatively impact the injection/configuration
>> of internal management server components.  Essentially handing them
>> a loaded shotgun pointed at our right foot.
> 
> This has been my pet-peeve too and I was told you can write properties files
> above the spring contexts to make it simpler for operators to look at.
> 
> Overall a great proposal and look forward to see more concrete steps
> that follow on the implementation details.
> 
> -- 
> Prasanna.,
> 
> ------------------------
> Powered by BigRock.com
> 


Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by John Burwell <jb...@basho.com>.
Daan,

I think I mentioned in my proposal to defer hot loading/unloading to a later release.  It is a hard issue, and not required to address the current pain points.

Thanks,
-John

On Aug 25, 2013, at 7:43 AM, Daan Hoogland <da...@gmail.com> wrote:

> It seems I am the only one not sharing your reservations regarding
> OSGi, so let's go for it, John.
> 
> I would personally  try to not bother with the hot-loading and
> -unloading of drivers and create a install and a drivers directory for
> all running processes, where these will be checked upon starting to
> update or install any new stuff. If a real life-cycle management is
> needed on run-time I would once again urge to go with OSGi.
> 
> I would love to help on this not withstanding any objection I have on
> the way to go. It seems like fun to implement:)
> Daan
> 
> On Fri, Aug 23, 2013 at 1:44 AM, Kelven Yang <ke...@citrix.com> wrote:
>> Spring is not meant to be used as a solution for run-time "plug-ins".
>> Darren is correct that Spring XML should be treated as code (ideal place
>> for it is the resource section inside the jar). Why we end up the way now
>> is mainly for practical reason. Since most of our current pluggable
>> features are not yet designed to be fully run-time loadable, most of them
>> have compile time linkage to other framework components that are solved at
>> loading time by Spring.
>> 
>> Only after we have cleaned up all these tightly coupled loading time
>> bindings, can we have a much simpler plugin configuration. And this
>> run-time loadable framework does not necessary to be based on any complex
>> ones (i.e., OSGi).
>> 
>> Kelven
>> 
>> On 8/21/13 8:42 AM, "Darren Shepherd" <da...@gmail.com> wrote:
>> 
>>> I also agree with this.  Spring XML should always be treated as code not
>>> really configuration.  It's not good to have a sysadmin touch spring
>>> config and frankly it's just mean to force them to.
>>> 
>>> I would ideally like to see that registering a module is as simple as
>>> putting a jar in a directory.  If its in the directory it gets loaded.
>>> Then additionally you should have a way such that you can explicitly tell
>>> it not to load modules based on some configuration.  That way, if for
>>> some reason moving the jar is not possible, you can still disallow it.
>>> 
>>> So for example the directory based approach works well with rpm/deb's so
>>> "yum install mycoolplugin" will just place jar somewhere.  But say your
>>> troubleshooting or whatever, you don't really want to have to do "yum
>>> remove..." just to troubleshoot.  It would be nice to just edit some file
>>> and say "plugin.mycoolplugin.load=false" (or env variable or whatever)
>>> 
>>> Darren
>>> 
>>> On Aug 21, 2013, at 6:51 AM, Prasanna Santhanam <ts...@apache.org> wrote:
>>> 
>>>> On Tue, Aug 20, 2013 at 05:43:17PM -0400, John Burwell wrote:
>>>>> Leaky Abstraction:  Plugins are registered through a Spring
>>>>> configuration file.  In addition to being operator unfriendly (most
>>>>> sysadmins are not Spring experts nor do they want to be), we expose
>>>>> the core bootstrapping mechanism to operators.  Therefore, a
>>>>> misconfiguration could negatively impact the injection/configuration
>>>>> of internal management server components.  Essentially handing them
>>>>> a loaded shotgun pointed at our right foot.
>>>> 
>>>> This has been my pet-peeve too and I was told you can write properties
>>>> files
>>>> above the spring contexts to make it simpler for operators to look at.
>>>> 
>>>> Overall a great proposal and look forward to see more concrete steps
>>>> that follow on the implementation details.
>>>> 
>>>> --
>>>> Prasanna.,
>>>> 
>>>> ------------------------
>>>> Powered by BigRock.com
>>>> 
>> 


Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by Daan Hoogland <da...@gmail.com>.
It seems I am the only one not sharing your reservations regarding
OSGi, so let's go for it, John.

I would personally try not to bother with the hot-loading and
-unloading of drivers and create an install and a drivers directory for
all running processes, where these will be checked upon starting to
update or install any new stuff. If real life-cycle management is
needed at run-time, I would once again urge going with OSGi.

I would love to help on this, notwithstanding any objections I have on
the way to go. It seems like fun to implement :)
Daan

On Fri, Aug 23, 2013 at 1:44 AM, Kelven Yang <ke...@citrix.com> wrote:
> Spring is not meant to be used as a solution for run-time "plug-ins".
> Darren is correct that Spring XML should be treated as code (ideal place
> for it is the resource section inside the jar). Why we end up the way now
> is mainly for practical reason. Since most of our current pluggable
> features are not yet designed to be fully run-time loadable, most of them
> have compile time linkage to other framework components that are solved at
> loading time by Spring.
>
> Only after we have cleaned up all these tightly coupled loading time
> bindings, can we have a much simpler plugin configuration. And this
> run-time loadable framework does not necessary to be based on any complex
> ones (i.e., OSGi).
>
> Kelven
>
> On 8/21/13 8:42 AM, "Darren Shepherd" <da...@gmail.com> wrote:
>
>>I also agree with this.  Spring XML should always be treated as code not
>>really configuration.  It's not good to have a sysadmin touch spring
>>config and frankly it's just mean to force them to.
>>
>>I would ideally like to see that registering a module is as simple as
>>putting a jar in a directory.  If its in the directory it gets loaded.
>>Then additionally you should have a way such that you can explicitly tell
>>it not to load modules based on some configuration.  That way, if for
>>some reason moving the jar is not possible, you can still disallow it.
>>
>>So for example the directory based approach works well with rpm/deb's so
>>"yum install mycoolplugin" will just place jar somewhere.  But say your
>>troubleshooting or whatever, you don't really want to have to do "yum
>>remove..." just to troubleshoot.  It would be nice to just edit some file
>>and say "plugin.mycoolplugin.load=false" (or env variable or whatever)
>>
>>Darren
>>
>>On Aug 21, 2013, at 6:51 AM, Prasanna Santhanam <ts...@apache.org> wrote:
>>
>>> On Tue, Aug 20, 2013 at 05:43:17PM -0400, John Burwell wrote:
>>>> Leaky Abstraction:  Plugins are registered through a Spring
>>>> configuration file.  In addition to being operator unfriendly (most
>>>> sysadmins are not Spring experts nor do they want to be), we expose
>>>> the core bootstrapping mechanism to operators.  Therefore, a
>>>> misconfiguration could negatively impact the injection/configuration
>>>> of internal management server components.  Essentially handing them
>>>> a loaded shotgun pointed at our right foot.
>>>
>>> This has been my pet-peeve too and I was told you can write properties
>>>files
>>> above the spring contexts to make it simpler for operators to look at.
>>>
>>> Overall a great proposal and look forward to see more concrete steps
>>> that follow on the implementation details.
>>>
>>> --
>>> Prasanna.,
>>>
>>> ------------------------
>>> Powered by BigRock.com
>>>
>

Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by Marcus Sorensen <sh...@gmail.com>.
There has been much discussion recently on the board about
refactoring/rearchitecting.  There are some great ideas being thrown
around, but I would like to see a focus on testing prior to any of
this work, so we can see what breaks after the major changes. It would
be nice if we could dedicate a whole release to testing and bugfixing,
I feel like the testing coverage is never going to catch up at the
current rate. As a bonus, such a release might actually hit its
release goal date.

On Mon, Aug 26, 2013 at 2:55 PM, Darren Shepherd
<da...@gmail.com> wrote:
> John,
>
> I mentioned before I'd been thinking up some ideas that go along with some
> of the things you've proposed here.  I'm working through a lot of different
> ideas right now, but I've thrown up some notes in a totally random part of
> the CloudStack wiki.  Take a look, this is a complete work in progress, but
> you can maybe get an idea of where my head is going (complete with really
> bad grammar and typos).
>
> Wiki:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Nothing+to+see+here...#Nothingtoseehere...-ModuleSystem
>
> Some code too:  https://github.com/ibuildthecloud/cloudstack-modules
>
> This is based off of stuff I've done in the past.  So nothing is too hand
> wavy.
>
> Darren

Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by Darren Shepherd <da...@gmail.com>.
John,

I mentioned before I'd been thinking up some ideas that go along with 
some of the things you've proposed here.  I'm working through a lot of 
different ideas right now, but I've thrown up some notes in a totally 
random part of the CloudStack wiki.  Take a look, this is a complete 
work in progress, but you can maybe get an idea of where my head is 
going (complete with really bad grammar and typos).

Wiki: 
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Nothing+to+see+here...#Nothingtoseehere...-ModuleSystem

Some code too:  https://github.com/ibuildthecloud/cloudstack-modules

This is based off of stuff I've done in the past.  So nothing is too 
hand wavy.

Darren

Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by John Burwell <jb...@basho.com>.
Darren,

Please see my responses in-line below.

Thanks,
-John

On Aug 21, 2013, at 11:42 AM, Darren Shepherd <da...@gmail.com> wrote:

> I also agree with this.  Spring XML should always be treated as code not really configuration.  It's not good to have a sysadmin touch spring config and frankly it's just mean to force them to.

+1.  I will take it a step further: with Spring 3, I don't even want to see a Spring configuration file.  The @Configuration facility allows all wiring to be programmatic with no Spring dependencies in actual domain objects or service code (previous rant on this subject [1]).

[1]: http://markmail.org/thread/2b2egdruxvcognsz
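
For reference, a minimal sketch of that @Configuration style of wiring; the driver class and property name below are made up purely for illustration:

    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    // Minimal sketch of programmatic Spring wiring: no XML involved, and the driver
    // class itself carries no Spring annotations.  Names are hypothetical.
    @Configuration
    public class ExampleDriverConfig {

        // Configurable items come from property files, not hand-edited wiring.
        @Value("${example.driver.pool.size:4}")
        private int poolSize;

        @Bean
        public ExampleStorageDriver exampleStorageDriver() {
            return new ExampleStorageDriver(poolSize);
        }

        // Plain Java driver implementation with no Spring dependencies of its own.
        public static class ExampleStorageDriver {
            private final int poolSize;

            public ExampleStorageDriver(int poolSize) {
                this.poolSize = poolSize;
            }
        }
    }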

> 
> I would ideally like to see that registering a module is as simple as putting a jar in a directory.  If its in the directory it gets loaded.  Then additionally you should have a way such that you can explicitly tell it not to load modules based on some configuration.  That way, if for some reason moving the jar is not possible, you can still disallow it.

I largely agree (as I laid out in my original proposal).  However, I would like to extend the scan to a URL, not just a filesystem path.  In a clustered environment, operators may want to put their drivers in an S3/Swift bucket or simply deploy them as static assets on an HTTP server.  Generally, we need to break CloudStack of the assumption that everything is stored in a filesystem.  I don't see a need to complicate the mechanism with an exclusion list.  If the file is present, it will be used.  

I also believe that we will need our own archive format to support the deployment of additional capabilities such as UI plugins to actually configure/control a plugin, provide internationalization resources, and bundle up dependencies.  Finally, by default, CloudStack should only accept signed components.  We can provide a configuration option to disable this requirement, but I would like to see such a mechanism start on the proper security footing by requiring it by default.
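
A rough sketch of that "signed by default" check, using only standard java.util.jar verification; trust-chain validation against project-issued certificates is deliberately left out, so this only shows the basic shape:

    import java.io.InputStream;
    import java.security.CodeSigner;
    import java.util.Enumeration;
    import java.util.jar.JarEntry;
    import java.util.jar.JarFile;

    // Sketch: open the archive with verification enabled and refuse any entry
    // that carries no signer.
    public final class SignedCarCheck {

        public static void requireSigned(String carPath) throws Exception {
            try (JarFile car = new JarFile(carPath, true)) {   // true => verify signatures
                Enumeration<JarEntry> entries = car.entries();
                byte[] buffer = new byte[8192];
                while (entries.hasMoreElements()) {
                    JarEntry entry = entries.nextElement();
                    if (entry.isDirectory() || entry.getName().startsWith("META-INF/")) {
                        continue;
                    }
                    // An entry must be fully read before getCodeSigners() is populated.
                    try (InputStream in = car.getInputStream(entry)) {
                        while (in.read(buffer) != -1) { /* drain */ }
                    }
                    CodeSigner[] signers = entry.getCodeSigners();
                    if (signers == null || signers.length == 0) {
                        throw new SecurityException("Unsigned entry in CAR: " + entry.getName());
                    }
                }
            }
        }
    }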

> 
> So for example the directory based approach works well with rpm/deb's so "yum install mycoolplugin" will just place jar somewhere.  But say your troubleshooting or whatever, you don't really want to have to do "yum remove..." just to troubleshoot.  It would be nice to just edit some file and say "plugin.mycoolplugin.load=false" (or env variable or whatever)

I agree regarding the repository model.  I would like a simple, decentralized repository mechanism such as Yum (apt is more powerful but also more difficult to configure).  Vendors publish their repositories and operators point to them.  We could make the discovery of vendor repositories a little easier by publishing the repository definition along with each GPG key issued to vendors.  As a project, we don't want to get near driver distribution.  We only want to define a common repository structure and, possibly, provide pointers to vendor repos.

> 
> Darren
> 
> On Aug 21, 2013, at 6:51 AM, Prasanna Santhanam <ts...@apache.org> wrote:
> 
>> On Tue, Aug 20, 2013 at 05:43:17PM -0400, John Burwell wrote:
>>> Leaky Abstraction:  Plugins are registered through a Spring
>>> configuration file.  In addition to being operator unfriendly (most
>>> sysadmins are not Spring experts nor do they want to be), we expose
>>> the core bootstrapping mechanism to operators.  Therefore, a
>>> misconfiguration could negatively impact the injection/configuration
>>> of internal management server components.  Essentially handing them
>>> a loaded shotgun pointed at our right foot.
>> 
>> This has been my pet-peeve too and I was told you can write properties files
>> above the spring contexts to make it simpler for operators to look at.
>> 
>> Overall a great proposal and look forward to see more concrete steps
>> that follow on the implementation details.
>> 
>> -- 
>> Prasanna.,
>> 
>> ------------------------
>> Powered by BigRock.com
>> 


Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by "SuichII, Christopher" <Ch...@netapp.com>.
Interesting. I'm not sure how I missed this thread... I'll try to chime in where I can, then. However, everything going on in here sounds like work for post-4.3, but if we are adding revert volume snapshot to 4.3, we will need a solution to that before then. It seems like the idea I've got for this is fairly lightweight and could be either extended or removed depending on what comes out of that discussion. If there are other ideas, I'm more than happy to continue discussions.

-Chris
-- 
Chris Suich
chris.suich@netapp.com
NetApp Software Engineer
Data Center Platforms – Cloud Solutions
Citrix, Cisco & Red Hat

On Oct 9, 2013, at 4:39 PM, John Burwell <jb...@basho.com> wrote:

> Kelven,
> 
> As I stated in my proposal, I think it is important to recognize the distinction between components that control/interact with infrastructure and components that represent orchestration abstractions/mechanisms within the management server.  Currently, these two concepts are conflated -- complicating the effort to modularize the system.  Therefore, in my view, any effort going forward must make this important distinction.
> 
> The first type, in my vocabulary, are device drivers.  In my view, these are essential system extension points that require greater modularity and isolation due to their potential to require conflicting dependencies and their external QA.  My proposal pertains only to these types of components, and I think it important to continue the discussion as it has far reaching implications to both the system architecture and our release process.  IN particular, I think it is possible for us to achieve the ability to have completely segregated device drivers shipped separately from CloudStack releases.  
> 
> Thanks,
> -John
> 
> On Sep 9, 2013, at 1:49 PM, Kelven Yang <ke...@citrix.com> wrote:
> 
>> John,
>> 
>> I understand. The effort we did in 4.1 was mainly to free developers from
>> the needs to work at low-level plumbing layer, prior to 4.1, not every
>> developer knows how to modify ComponentLocator safely, switching to a
>> standard framework can let us focus on Cloud operating business logic.
>> 
>> Breaking CloudStack into a more modularized architecture is a long journey
>> which we are still striving to get there, Daren's work will again bring us
>> one step closer, I think this incremental refactoring approach can help
>> reduce the turbulence during the flight and ensure smoother releases along
>> the way.
>> 
>> kelven
>> 
>> 
>> On 8/25/13 8:35 PM, "John Burwell" <jb...@basho.com> wrote:
>> 
>>> Kelven,
>>> 
>>> Please don't take my proposal as a criticism of the approach taken in
>>> 4.1.  I think the current model is a big improvement over the previous
>>> approach.  Given the time constraints and ambitions of that work, I think
>>> it was a solid, pragmatic first step.  I believe we are at a point to
>>> assess our needs, and determine a good next step that (hopefully) further
>>> improves the model.
>>> 
>>> Thanks,
>>> -John
>>> 
>>> On Aug 22, 2013, at 7:44 PM, Kelven Yang <ke...@citrix.com> wrote:
>>> 
>>>> Spring is not meant to be used as a solution for run-time "plug-ins".
>>>> Darren is correct that Spring XML should be treated as code (ideal place
>>>> for it is the resource section inside the jar). Why we end up the way
>>>> now
>>>> is mainly for practical reason. Since most of our current pluggable
>>>> features are not yet designed to be fully run-time loadable, most of
>>>> them
>>>> have compile time linkage to other framework components that are solved
>>>> at
>>>> loading time by Spring.
>>>> 
>>>> Only after we have cleaned up all these tightly coupled loading time
>>>> bindings, can we have a much simpler plugin configuration. And this
>>>> run-time loadable framework does not necessary to be based on any
>>>> complex
>>>> ones (i.e., OSGi).
>>>> 
>>>> Kelven 
>>>> 
>>>> On 8/21/13 8:42 AM, "Darren Shepherd" <da...@gmail.com>
>>>> wrote:
>>>> 
>>>>> I also agree with this.  Spring XML should always be treated as code
>>>>> not
>>>>> really configuration.  It's not good to have a sysadmin touch spring
>>>>> config and frankly it's just mean to force them to.
>>>>> 
>>>>> I would ideally like to see that registering a module is as simple as
>>>>> putting a jar in a directory.  If its in the directory it gets loaded.
>>>>> Then additionally you should have a way such that you can explicitly
>>>>> tell
>>>>> it not to load modules based on some configuration.  That way, if for
>>>>> some reason moving the jar is not possible, you can still disallow it.
>>>>> 
>>>>> So for example the directory based approach works well with rpm/deb's
>>>>> so
>>>>> "yum install mycoolplugin" will just place jar somewhere.  But say your
>>>>> troubleshooting or whatever, you don't really want to have to do "yum
>>>>> remove..." just to troubleshoot.  It would be nice to just edit some
>>>>> file
>>>>> and say "plugin.mycoolplugin.load=false" (or env variable or whatever)
>>>>> 
>>>>> Darren
>>>>> 
>>>>> On Aug 21, 2013, at 6:51 AM, Prasanna Santhanam <ts...@apache.org> wrote:
>>>>> 
>>>>>> On Tue, Aug 20, 2013 at 05:43:17PM -0400, John Burwell wrote:
>>>>>>> Leaky Abstraction:  Plugins are registered through a Spring
>>>>>>> configuration file.  In addition to being operator unfriendly (most
>>>>>>> sysadmins are not Spring experts nor do they want to be), we expose
>>>>>>> the core bootstrapping mechanism to operators.  Therefore, a
>>>>>>> misconfiguration could negatively impact the injection/configuration
>>>>>>> of internal management server components.  Essentially handing them
>>>>>>> a loaded shotgun pointed at our right foot.
>>>>>> 
>>>>>> This has been my pet-peeve too and I was told you can write properties
>>>>>> files
>>>>>> above the spring contexts to make it simpler for operators to look at.
>>>>>> 
>>>>>> Overall a great proposal and look forward to see more concrete steps
>>>>>> that follow on the implementation details.
>>>>>> 
>>>>>> -- 
>>>>>> Prasanna.,
>>>>>> 
>>>>>> ------------------------
>>>>>> Powered by BigRock.com
>>>>>> 
>>>> 
>>> 
>> 
> 


Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by John Burwell <jb...@basho.com>.
Kelven,

As I stated in my proposal, I think it is important to recognize the distinction between components that control/interact with infrastructure and components that represent orchestration abstractions/mechanisms within the management server.  Currently, these two concepts are conflated -- complicating the effort to modularize the system.  Therefore, in my view, any effort going forward must make this important distinction.

The first type, in my vocabulary, are device drivers.  In my view, these are essential system extension points that require greater modularity and isolation due to their potential to require conflicting dependencies and their external QA.  My proposal pertains only to these types of components, and I think it important to continue the discussion as it has far-reaching implications for both the system architecture and our release process.  In particular, I think it is possible for us to achieve the ability to have completely segregated device drivers shipped separately from CloudStack releases.  

Thanks,
-John

On Sep 9, 2013, at 1:49 PM, Kelven Yang <ke...@citrix.com> wrote:

> John,
> 
> I understand. The effort we did in 4.1 was mainly to free developers from
> the needs to work at low-level plumbing layer, prior to 4.1, not every
> developer knows how to modify ComponentLocator safely, switching to a
> standard framework can let us focus on Cloud operating business logic.
> 
> Breaking CloudStack into a more modularized architecture is a long journey
> which we are still striving to get there, Daren's work will again bring us
> one step closer, I think this incremental refactoring approach can help
> reduce the turbulence during the flight and ensure smoother releases along
> the way.
> 
> kelven
> 
> 
> On 8/25/13 8:35 PM, "John Burwell" <jb...@basho.com> wrote:
> 
>> Kelven,
>> 
>> Please don't take my proposal as a criticism of the approach taken in
>> 4.1.  I think the current model is a big improvement over the previous
>> approach.  Given the time constraints and ambitions of that work, I think
>> it was a solid, pragmatic first step.  I believe we are at a point to
>> assess our needs, and determine a good next step that (hopefully) further
>> improves the model.
>> 
>> Thanks,
>> -John
>> 
>> On Aug 22, 2013, at 7:44 PM, Kelven Yang <ke...@citrix.com> wrote:
>> 
>>> Spring is not meant to be used as a solution for run-time "plug-ins".
>>> Darren is correct that Spring XML should be treated as code (ideal place
>>> for it is the resource section inside the jar). Why we end up the way
>>> now
>>> is mainly for practical reason. Since most of our current pluggable
>>> features are not yet designed to be fully run-time loadable, most of
>>> them
>>> have compile time linkage to other framework components that are solved
>>> at
>>> loading time by Spring.
>>> 
>>> Only after we have cleaned up all these tightly coupled loading time
>>> bindings, can we have a much simpler plugin configuration. And this
>>> run-time loadable framework does not necessary to be based on any
>>> complex
>>> ones (i.e., OSGi).
>>> 
>>> Kelven 
>>> 
>>> On 8/21/13 8:42 AM, "Darren Shepherd" <da...@gmail.com>
>>> wrote:
>>> 
>>>> I also agree with this.  Spring XML should always be treated as code
>>>> not
>>>> really configuration.  It's not good to have a sysadmin touch spring
>>>> config and frankly it's just mean to force them to.
>>>> 
>>>> I would ideally like to see that registering a module is as simple as
>>>> putting a jar in a directory.  If its in the directory it gets loaded.
>>>> Then additionally you should have a way such that you can explicitly
>>>> tell
>>>> it not to load modules based on some configuration.  That way, if for
>>>> some reason moving the jar is not possible, you can still disallow it.
>>>> 
>>>> So for example the directory based approach works well with rpm/deb's
>>>> so
>>>> "yum install mycoolplugin" will just place jar somewhere.  But say your
>>>> troubleshooting or whatever, you don't really want to have to do "yum
>>>> remove..." just to troubleshoot.  It would be nice to just edit some
>>>> file
>>>> and say "plugin.mycoolplugin.load=false" (or env variable or whatever)
>>>> 
>>>> Darren
>>>> 
>>>> On Aug 21, 2013, at 6:51 AM, Prasanna Santhanam <ts...@apache.org> wrote:
>>>> 
>>>>> On Tue, Aug 20, 2013 at 05:43:17PM -0400, John Burwell wrote:
>>>>>> Leaky Abstraction:  Plugins are registered through a Spring
>>>>>> configuration file.  In addition to being operator unfriendly (most
>>>>>> sysadmins are not Spring experts nor do they want to be), we expose
>>>>>> the core bootstrapping mechanism to operators.  Therefore, a
>>>>>> misconfiguration could negatively impact the injection/configuration
>>>>>> of internal management server components.  Essentially handing them
>>>>>> a loaded shotgun pointed at our right foot.
>>>>> 
>>>>> This has been my pet-peeve too and I was told you can write properties
>>>>> files
>>>>> above the spring contexts to make it simpler for operators to look at.
>>>>> 
>>>>> Overall a great proposal and look forward to see more concrete steps
>>>>> that follow on the implementation details.
>>>>> 
>>>>> -- 
>>>>> Prasanna.,
>>>>> 
>>>>> ------------------------
>>>>> Powered by BigRock.com
>>>>> 
>>> 
>> 
> 


Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by Kelven Yang <ke...@citrix.com>.
John,

I understand. The effort we did in 4.1 was mainly to free developers from
the need to work at the low-level plumbing layer; prior to 4.1, not every
developer knew how to modify ComponentLocator safely, and switching to a
standard framework lets us focus on cloud operating business logic.

Breaking CloudStack into a more modularized architecture is a long journey
that we are still striving to complete. Darren's work will again bring us
one step closer, and I think this incremental refactoring approach can help
reduce the turbulence during the flight and ensure smoother releases along
the way.

kelven


On 8/25/13 8:35 PM, "John Burwell" <jb...@basho.com> wrote:

>Kelven,
>
>Please don't take my proposal as a criticism of the approach taken in
>4.1.  I think the current model is a big improvement over the previous
>approach.  Given the time constraints and ambitions of that work, I think
>it was a solid, pragmatic first step.  I believe we are at a point to
>assess our needs, and determine a good next step that (hopefully) further
>improves the model.
>
>Thanks,
>-John
>
>On Aug 22, 2013, at 7:44 PM, Kelven Yang <ke...@citrix.com> wrote:
>
>> Spring is not meant to be used as a solution for run-time "plug-ins".
>> Darren is correct that Spring XML should be treated as code (ideal place
>> for it is the resource section inside the jar). Why we end up the way
>>now
>> is mainly for practical reason. Since most of our current pluggable
>> features are not yet designed to be fully run-time loadable, most of
>>them
>> have compile time linkage to other framework components that are solved
>>at
>> loading time by Spring.
>> 
>> Only after we have cleaned up all these tightly coupled loading time
>> bindings, can we have a much simpler plugin configuration. And this
>> run-time loadable framework does not necessary to be based on any
>>complex
>> ones (i.e., OSGi).
>> 
>> Kelven 
>> 
>> On 8/21/13 8:42 AM, "Darren Shepherd" <da...@gmail.com>
>>wrote:
>> 
>>> I also agree with this.  Spring XML should always be treated as code
>>>not
>>> really configuration.  It's not good to have a sysadmin touch spring
>>> config and frankly it's just mean to force them to.
>>> 
>>> I would ideally like to see that registering a module is as simple as
>>> putting a jar in a directory.  If its in the directory it gets loaded.
>>> Then additionally you should have a way such that you can explicitly
>>>tell
>>> it not to load modules based on some configuration.  That way, if for
>>> some reason moving the jar is not possible, you can still disallow it.
>>> 
>>> So for example the directory based approach works well with rpm/deb's
>>>so
>>> "yum install mycoolplugin" will just place jar somewhere.  But say your
>>> troubleshooting or whatever, you don't really want to have to do "yum
>>> remove..." just to troubleshoot.  It would be nice to just edit some
>>>file
>>> and say "plugin.mycoolplugin.load=false" (or env variable or whatever)
>>> 
>>> Darren
>>> 
>>> On Aug 21, 2013, at 6:51 AM, Prasanna Santhanam <ts...@apache.org> wrote:
>>> 
>>>> On Tue, Aug 20, 2013 at 05:43:17PM -0400, John Burwell wrote:
>>>>> Leaky Abstraction:  Plugins are registered through a Spring
>>>>> configuration file.  In addition to being operator unfriendly (most
>>>>> sysadmins are not Spring experts nor do they want to be), we expose
>>>>> the core bootstrapping mechanism to operators.  Therefore, a
>>>>> misconfiguration could negatively impact the injection/configuration
>>>>> of internal management server components.  Essentially handing them
>>>>> a loaded shotgun pointed at our right foot.
>>>> 
>>>> This has been my pet-peeve too and I was told you can write properties
>>>> files
>>>> above the spring contexts to make it simpler for operators to look at.
>>>> 
>>>> Overall a great proposal and look forward to see more concrete steps
>>>> that follow on the implementation details.
>>>> 
>>>> -- 
>>>> Prasanna.,
>>>> 
>>>> ------------------------
>>>> Powered by BigRock.com
>>>> 
>> 
>


Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by John Burwell <jb...@basho.com>.
Kelven,

Please don't take my proposal as a criticism of the approach taken in 4.1.  I think the current model is a big improvement over the previous approach.  Given the time constraints and ambitions of that work, I think it was a solid, pragmatic first step.  I believe we are at a point to assess our needs, and determine a good next step that (hopefully) further improves the model.

Thanks,
-John

On Aug 22, 2013, at 7:44 PM, Kelven Yang <ke...@citrix.com> wrote:

> Spring is not meant to be used as a solution for run-time "plug-ins".
> Darren is correct that Spring XML should be treated as code (ideal place
> for it is the resource section inside the jar). Why we end up the way now
> is mainly for practical reason. Since most of our current pluggable
> features are not yet designed to be fully run-time loadable, most of them
> have compile time linkage to other framework components that are solved at
> loading time by Spring.
> 
> Only after we have cleaned up all these tightly coupled loading time
> bindings, can we have a much simpler plugin configuration. And this
> run-time loadable framework does not necessary to be based on any complex
> ones (i.e., OSGi).
> 
> Kelven 
> 
> On 8/21/13 8:42 AM, "Darren Shepherd" <da...@gmail.com> wrote:
> 
>> I also agree with this.  Spring XML should always be treated as code not
>> really configuration.  It's not good to have a sysadmin touch spring
>> config and frankly it's just mean to force them to.
>> 
>> I would ideally like to see that registering a module is as simple as
>> putting a jar in a directory.  If its in the directory it gets loaded.
>> Then additionally you should have a way such that you can explicitly tell
>> it not to load modules based on some configuration.  That way, if for
>> some reason moving the jar is not possible, you can still disallow it.
>> 
>> So for example the directory based approach works well with rpm/deb's so
>> "yum install mycoolplugin" will just place jar somewhere.  But say your
>> troubleshooting or whatever, you don't really want to have to do "yum
>> remove..." just to troubleshoot.  It would be nice to just edit some file
>> and say "plugin.mycoolplugin.load=false" (or env variable or whatever)
>> 
>> Darren
>> 
>> On Aug 21, 2013, at 6:51 AM, Prasanna Santhanam <ts...@apache.org> wrote:
>> 
>>> On Tue, Aug 20, 2013 at 05:43:17PM -0400, John Burwell wrote:
>>>> Leaky Abstraction:  Plugins are registered through a Spring
>>>> configuration file.  In addition to being operator unfriendly (most
>>>> sysadmins are not Spring experts nor do they want to be), we expose
>>>> the core bootstrapping mechanism to operators.  Therefore, a
>>>> misconfiguration could negatively impact the injection/configuration
>>>> of internal management server components.  Essentially handing them
>>>> a loaded shotgun pointed at our right foot.
>>> 
>>> This has been my pet-peeve too and I was told you can write properties
>>> files
>>> above the spring contexts to make it simpler for operators to look at.
>>> 
>>> Overall a great proposal and look forward to see more concrete steps
>>> that follow on the implementation details.
>>> 
>>> -- 
>>> Prasanna.,
>>> 
>>> ------------------------
>>> Powered by BigRock.com
>>> 
> 


Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by Kelven Yang <ke...@citrix.com>.
Spring is not meant to be used as a solution for run-time "plug-ins".
Darren is correct that Spring XML should be treated as code (the ideal place
for it is the resource section inside the jar). Why we ended up the way we
are now is mainly for practical reasons. Most of our current pluggable
features are not yet designed to be fully run-time loadable, so most of them
have compile-time linkage to other framework components that is resolved at
loading time by Spring.

Only after we have cleaned up all these tightly coupled loading-time
bindings can we have a much simpler plugin configuration. And this
run-time loadable framework does not necessarily need to be based on any
complex ones (e.g., OSGi).

Kelven 

On 8/21/13 8:42 AM, "Darren Shepherd" <da...@gmail.com> wrote:

>I also agree with this.  Spring XML should always be treated as code not
>really configuration.  It's not good to have a sysadmin touch spring
>config and frankly it's just mean to force them to.
>
>I would ideally like to see that registering a module is as simple as
>putting a jar in a directory.  If its in the directory it gets loaded.
>Then additionally you should have a way such that you can explicitly tell
>it not to load modules based on some configuration.  That way, if for
>some reason moving the jar is not possible, you can still disallow it.
>
>So for example the directory based approach works well with rpm/deb's so
>"yum install mycoolplugin" will just place jar somewhere.  But say your
>troubleshooting or whatever, you don't really want to have to do "yum
>remove..." just to troubleshoot.  It would be nice to just edit some file
>and say "plugin.mycoolplugin.load=false" (or env variable or whatever)
>
>Darren
>
>On Aug 21, 2013, at 6:51 AM, Prasanna Santhanam <ts...@apache.org> wrote:
>
>> On Tue, Aug 20, 2013 at 05:43:17PM -0400, John Burwell wrote:
>>> Leaky Abstraction:  Plugins are registered through a Spring
>>> configuration file.  In addition to being operator unfriendly (most
>>> sysadmins are not Spring experts nor do they want to be), we expose
>>> the core bootstrapping mechanism to operators.  Therefore, a
>>> misconfiguration could negatively impact the injection/configuration
>>> of internal management server components.  Essentially handing them
>>> a loaded shotgun pointed at our right foot.
>> 
>> This has been my pet-peeve too and I was told you can write properties
>>files
>> above the spring contexts to make it simpler for operators to look at.
>> 
>> Overall a great proposal and look forward to see more concrete steps
>> that follow on the implementation details.
>> 
>> -- 
>> Prasanna.,
>> 
>> ------------------------
>> Powered by BigRock.com
>> 


Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by Darren Shepherd <da...@gmail.com>.
I also agree with this.  Spring XML should always be treated as code, not really configuration.  It's not good to have a sysadmin touch Spring config, and frankly it's just mean to force them to.

I would ideally like to see that registering a module is as simple as putting a jar in a directory.  If it's in the directory, it gets loaded.  Then, additionally, you should have a way to explicitly tell it not to load modules based on some configuration.  That way, if for some reason moving the jar is not possible, you can still disallow it.

So, for example, the directory-based approach works well with rpm/debs, so "yum install mycoolplugin" will just place the jar somewhere.  But say you're troubleshooting or whatever: you don't really want to have to do "yum remove..." just to troubleshoot.  It would be nice to just edit some file and say "plugin.mycoolplugin.load=false" (or an env variable or whatever).
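
A rough sketch of that opt-out check; the file name and property naming scheme here are just assumptions:

    import java.io.File;
    import java.io.FileInputStream;
    import java.util.Properties;

    // Sketch of the "drop a jar in, but allow an explicit disable flag" idea above.
    public final class PluginFilter {

        private final Properties props = new Properties();

        public PluginFilter(File configFile) throws Exception {
            if (configFile.isFile()) {
                try (FileInputStream in = new FileInputStream(configFile)) {
                    props.load(in);
                }
            }
        }

        // A module loads unless explicitly disabled, e.g. plugin.mycoolplugin.load=false,
        // set either in the properties file or as a system property / env override.
        public boolean shouldLoad(String moduleName) {
            String key = "plugin." + moduleName + ".load";
            String value = System.getProperty(key, props.getProperty(key, "true"));
            return Boolean.parseBoolean(value);
        }
    }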

Darren

On Aug 21, 2013, at 6:51 AM, Prasanna Santhanam <ts...@apache.org> wrote:

> On Tue, Aug 20, 2013 at 05:43:17PM -0400, John Burwell wrote:
>> Leaky Abstraction:  Plugins are registered through a Spring
>> configuration file.  In addition to being operator unfriendly (most
>> sysadmins are not Spring experts nor do they want to be), we expose
>> the core bootstrapping mechanism to operators.  Therefore, a
>> misconfiguration could negatively impact the injection/configuration
>> of internal management server components.  Essentially handing them
>> a loaded shotgun pointed at our right foot.
> 
> This has been my pet-peeve too and I was told you can write properties files
> above the spring contexts to make it simpler for operators to look at.
> 
> Overall a great proposal and look forward to see more concrete steps
> that follow on the implementation details.
> 
> -- 
> Prasanna.,
> 
> ------------------------
> Powered by BigRock.com
> 

Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by Prasanna Santhanam <ts...@apache.org>.
On Tue, Aug 20, 2013 at 05:43:17PM -0400, John Burwell wrote:
> Leaky Abstraction:  Plugins are registered through a Spring
> configuration file.  In addition to being operator unfriendly (most
> sysadmins are not Spring experts nor do they want to be), we expose
> the core bootstrapping mechanism to operators.  Therefore, a
> misconfiguration could negatively impact the injection/configuration
> of internal management server components.  Essentially handing them
> a loaded shotgun pointed at our right foot.

This has been my pet peeve too, and I was told you can write properties files
on top of the Spring contexts to make it simpler for operators to look at.

Overall a great proposal, and I look forward to seeing more concrete steps
that follow on the implementation details.

-- 
Prasanna.,

------------------------
Powered by BigRock.com


Fwd: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by Mike Tutkowski <mi...@solidfire.com>.
Hey Chris,

This e-mail chain might be of interest to you.

Talk to you later,
Mike

---------- Forwarded message ----------
From: John Burwell <jb...@basho.com>
Date: Tue, Aug 20, 2013 at 3:43 PM
Subject: [DISCUSS/PROPOSAL] Upgrading Driver Model
To: "dev@cloudstack.apache.org" <de...@cloudstack.apache.org>
Cc: Daan Hoogland <da...@gmail.com>, Hugo Trippaers <
htrippaers@schubergphilis.com>, "La Motta, David" <Da...@netapp.com>


All,

In capturing my thoughts on storage, my thinking backed into the driver
model.  While we have the beginnings of such a model today, I see the
following deficiencies:


   1. *Multiple Models*: The Storage, Hypervisor, and Security layers each
   have a slightly different model for allowing system functionality to be
   extended/substituted.  These differences increase the barrier of entry for
   vendors seeking to extend CloudStack and accrete code paths to be
   maintained and verified.
   2. *Leaky Abstraction*:  Plugins are registered through a Spring
   configuration file.  In addition to being operator unfriendly (most
   sysadmins are not Spring experts nor do they want to be), we expose the
   core bootstrapping mechanism to operators.  Therefore, a misconfiguration
   could negatively impact the injection/configuration of internal management
   server components.  Essentially handing them a loaded shotgun pointed at
   our right foot.
   3. *Nondeterministic Load/Unload Model*:  Because the core loading
   mechanism is Spring, the management has little control over the timing and
   order of component loading/unloading.  Changes to the Management Server's
   component dependency graph could break a driver by causing it to be started
   at an unexpected time.
   4. *Lack of Execution Isolation*: As a Spring component, plugins are
   loaded into the same execution context as core management server
   components.  Therefore, an errant plugin can corrupt the entire management
   server.


For next revision of the plugin/driver mechanism, I would like see us
migrate towards a standard pluggable driver model that supports all of the
management server's extension points (e.g. network devices, storage
devices, hypervisors, etc) with the following capabilities:


   - *Consolidated Lifecycle and Startup Procedure*:  Drivers share a
   common state machine and categorization (e.g. network, storage, hypervisor,
   etc) that permits the deterministic calculation of initialization and
   destruction order (i.e. network layer drivers -> storage layer drivers ->
   hypervisor drivers).  Plugin inter-dependencies would be supported between
   plugins sharing the same category.
   - *In-process Installation and Upgrade*: Adding or upgrading a driver
   does not require the management server to be restarted.  This capability
   implies a system that supports the simultaneous execution of multiple
   driver versions and the ability to suspend continued execution work on a
   resource while the underlying driver instance is replaced.
   - *Execution Isolation*: The deployment packaging and execution
   environment supports different (and potentially conflicting) versions of
   dependencies to be simultaneously used.  Additionally, plugins would be
   sufficiently sandboxed to protect the management server against driver
   instability.
   - *Extension Data Model*: Drivers provide a property bag with a metadata
   descriptor to validate and render vendor specific data.  The contents of
   this property bag will provided to every driver operation invocation at
   runtime.  The metadata descriptor would be a lightweight description that
   provides a label resource key, a description resource key, data type
   (string, date, number, boolean), required flag, and optional length limit.
   - *Introspection: Administrative APIs/UIs allow operators to understand
   the configuration of the drivers in the system, their configuration, and
   their current state.*
   - *Discoverability*: Optionally, drivers can be discovered via a project
   repository definition (similar to Yum) allowing drivers to be remotely
   acquired and operators to be notified regarding update availability.  The
   project would also provide, free of charge, certificates to sign plugins.
    This mechanism would support local mirroring to support air gapped
   management networks.


Fundamentally, I do not want to turn CloudStack into an erector set with
more screws than nuts which is a risk with highly pluggable architectures.
 As such, I think we would need to tightly bound the scope of drivers and
their behaviors to prevent the loss system usability and stability.  My
thinking is that drivers would be packaged into a custom JAR, CAR
(CloudStack ARchive), that would be structured as followed:


   - META-INF
      - MANIFEST.MF
      - driver.yaml (driver metadata(e.g. version, name, description, etc)
      serialized in YAML format)
      - LICENSE (a text file containing the driver's license)
   - lib (driver dependencies)
   - classes (driver implementation)
   - resources (driver message files and potentially JS resources)


The management server would acquire drivers through a simple scan of a URL
(e.g. file directory, S3 bucket, etc).  For every CAR object found, the
management server would create an execution environment (likely a dedicated
ExecutorService and Classloader), and transition the state of the driver to
Running (the exact state model would need to be worked out).  To be really
nice, we could develop a custom Ant task/Maven plugin/Gradle plugin to
create CARs.   I can also imagine an opportunities to add hooks to this
model to register instrumentation information with JMX and authorization.

To keep the scope of this email confined, we would introduce the general
notion of a Resource, and (hand wave hand wave) eventually compartmentalize
the execution of work around a resource [1].  This (hand waved)
compartmentalization would allow us the controls necessary to safely and
reliably perform in-place driver upgrades.  For an initial release, I would
recommend implementing the abstractions, loading mechanism, extension data
model, and discovery features.  With these capabilities in place, we could
attack the in-place upgrade model.

If we were to adopt such a pluggable capability, we would have the
opportunity to decouple the vendor and CloudStack release schedules.  For
example, if a vendor were introducing a new product that required a new or
updated driver, they would no longer need to wait for a CloudStack release
to support it.  They would also gain the ability to fix high priority
defects in the same manner.

I have hand waved a number of issues that would need to be resolved before
such an approach could be implemented.  However, I think we need to decide,
as a community, that it worth devoting energy and effort to enhancing the
plugin/driver model and the goals of that effort before driving head first
into the deep rabbit hole of design/implementation.

Thoughts? (/me ducks)
-John

[1]: My opinions on the matter from CloudStack Collab 2013 ->
http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cloud-stack-distributed-process-management



-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkowski@solidfire.com
o: 303.746.7302
Advancing the way the world uses the
cloud<http://solidfire.com/solution/overview/?video=play>
*™*

Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by Daan Hoogland <da...@gmail.com>.
I could give inline answers, but let's not waste too much more time.
One point I would like to make is that the life-cycle functions that
driver writers implement take care of how (in what state) instances
are stopped.

Your point on restricting dependencies is valid and a real concern.

And, not to end this discussion, I would like to refer to my previous post:
I would love to help on this, notwithstanding any objections I have on
the way to go. It seems like fun to implement :)

regards,
Daan


On Mon, Aug 26, 2013 at 5:13 AM, John Burwell <jb...@basho.com> wrote:
> Daan,
>
> Please see my responses in-line below.  The TL;DR is that I am extremely skeptical of the complexity and flexibility of OSGi.  My experience with it in practice has not been positive.  However, I want to focus on our requirements for a driver mechanism, and then determine the best implementation.
>
> Thanks,
> -John
>
> On Aug 21, 2013, at 4:14 AM, Daan Hoogland <da...@gmail.com> wrote:
>
>> John,
>>
>> You do want 'In-process Installation and Upgrade', 'Introspection' and
>> 'Discoverability' says that you do want flexibility. You disqualify
>> Spring and OSGi on this quality however.
>
> On the surface, it would appear the OSGi fits into In-process Installation and Upgrade.  However, OSGi assumes a consistency attribute that is too rigid for CloudStack.   As I understand the specification, when a bundle is upgraded, all instances in the container are upgraded simultaneously.  Based on my reading of it, there is no way to customize this behavior.  I think we need the upgrade process will be eventually consistent where by the underlying driver instance for a resource will be upgraded when it is both a consistent and upgradeable state. For example, we have 10,000 KVM hosts, and the KVM driver is upgraded. 9,000 of them are idle, and can take the upgrade immediately.  The other 1,000 are in some state of operation (creating and destroying VMs, taking snapshots, etc).  For these 1,000, we want to the upgrade to happen when they complete their current work.  Most importantly, we don't want any work bound for these 10,000 resources during the upgrade to be lost only delayed.
>
> When I say discoverability, I mean end-users finding drivers to install.  The more I think about it, the more I explicitly do not want drivers to depend on each other.  Drivers should be self-contained, stateless mechanisms that interact with some piece of infrastructure.  I think the path to madness lies in having a messy web of cross-vendor driver dependencies.
>
>>
>> If we can restrict the use of bundles to those that adhere to some
>> interfaces we prescribe I don't think either complexity nor dependency
>> are an issue.
>
> The only restriction I see is the ability of a bundle to control what is publicly exported.  However, I see no way to restrict how bundles depend on each other -- opening the door to cross vendor driver dependencies.
>
>>
>> Most every bit of complexity of writing a bundle can be hidden from
>> the bundle-developer nowadays. If we can not hide enough it is not an
>> option indeed. The main focus of OSGI is life cycle management which
>> is exactly what we need. the use that eclipse makes of it is a good
>> example not to follow but doesn't disqualify the entire thing.
>
> Personally, I am dubious that a build process can mask complexity.  More importantly, I don't like creating designs that require tooling and code generation with a veneer of simplicity but actually create a spooky action at a distance.  I prefer creating truly simple systems that can be easily comprehended.
>
>>
>> The dependency hell is not different from what we have as regular
>> monolithical development group. We control what we package and how. A
>> valid point is that some libraries might have issues that prevent them
>> from being bundled and that needs investigation. So we need to package
>> those libraries as bundles ourselves so 3rd parties don't need to. We
>> package them now anyway.
>
> In my experience, the dependency management problem is magnified by the added hurdle that every dependency be an OSGi bundle.  Many projects do not natively ship OSGi bundles, leaving third-parties or the project itself to repackage them.  Often OSGi bundled versions are behind the most current project releases.
>
>>
>> The erector set fear you have is just as valid with as without osgi or
>> any existing framework.
>
> Agreed.  I would prefer inaction on this topic to creating said erector set.
>
>>
>> I don't insist on OSGi and I do agree with your initial set of
>> requirements. When I read it I think, "let's use OSGi". And I don't
>> see anything but fear of the beast in your arguments against it. Maybe
>> your fear is just in my perception or maybe it is very valid. I'm not
>> perceptible to it after your reply, yet.
>
> To my mind, OSGi is a wonderful idea.  We need it, or something like it, standard in the JVM.  However, in practice, it is a difficult beast because it is working around limitations in the JVM.  When it works, it is awesome until it breaks or you hit the dependency hell I described.  If we adopt it, we need to ensure that it fits our needs and that the functional gain merits taking on the burden of its risks.
>
>>
>> regards,
>> Daan
>>
>> On Wed, Aug 21, 2013 at 9:00 AM, John Burwell <jb...@basho.com> wrote:
>>> Daan,
>>>
>>> I have the following issues with OSGi:
>>>
>>> Complexity:  Building OSGi components adds a tremendous amount of complexity
>>> to both the building drivers and debugging runtime issues.  Additionally,
>>> OSGi has a much broader feature set than I think CloudStack needs to
>>> support.  Therefore, driver authors may use the feature set in unanticipated
>>> way that create system instability.
>>> Dependency Hell: OSGi requires 3rd party dependencies to be packaged as OSGi
>>> bundles.  In practice, many third party libraries either have issues that
>>> prevent them from being bundles or their OSGi bundled versions are behind
>>> mainline release.
>>>
>>>
>>> As an additionally personal experience, I do not want to re-create the mess
>>> that is Eclipse (i.e. an erector set with more screws than nuts).  In
>>> addition to its lack of reliability, it is incredibly difficult to
>>> comprehend how the component configurations and relationships are composed
>>> at runtime.
>>>
>>> To be clear, I am not interested in creating a general purpose
>>> component/plugin model.  Fundamentally, we need a simple, purpose-built
>>> component model focused on providing stability and reliability through
>>> deterministic behavior rather than feature flexibility.  Unfortunately, both
>>> OSGi and Spring's focus on flexibility the later make them ill-suited for
>>> our purposes.
>>>
>>> Thanks,
>>> -John
>>>
>>> On Aug 21, 2013, at 2:31 AM, Daan Hoogland <da...@gmail.com> wrote:
>>>
>>> John,
>>>
>>> Nice work.
>>> Given the maturity of OSGi, I'd say lets see how it fits. One criteria
>>> would be can we limit the bundles that may be loaded based on what
>>> Cloudstack supports (and not allow loading pydev) if not we need to
>>> bake our own.
>>>
>>> But though I think your work is valuable I disagree on designing our
>>> CARs from the get go without having explored usable options in the
>>> field first. A new type of YARs is not what the world or cloudstack
>>> needs. And given what you have written the main problem wll be finding
>>> a framework we can restrict to what we want, not one that can do all
>>> of it.
>>>
>>> done shooting,
>>> Daan
>>>
>>> On Wed, Aug 21, 2013 at 2:52 AM, Darren Shepherd
>>> <da...@gmail.com> wrote:
>>>
>>> Sure, I fully understand how it theoretically works, but I'm saying from a
>>> practical perspective it always seems to fall apart.  What your describing
>>> is done excellently in OSGI 4.2 Blueprint.  It's a beautiful framework that
>>> allows you to expose services that can be dynamically updated at runtime.
>>>
>>> The issues always happens with unloading.  I'll give you a real world
>>> example.  As part of the servlet spec your supposed to be able to stop and
>>> unload wars.  But in practice if you do it enough times you typically run
>>> out of memory.  So one such issue was with commons logging (since fixed).
>>> When you do getLogger(myclass.class) it would cache a reference of the Class
>>> object to the actual log impl.  The commons logging jar is typically loaded
>>> with a system classloader and but MyClass.class would be loaded in the
>>> webapp classloader.  So when you stop the war there is a reference chain
>>> system classloader -> logfactory -> Myclass -> webapp classloader.  So the
>>> web app never gets GC'd.
>>>
>>> So just pointing out the practical issues, that's it.
>>>
>>> Darren
>>>
>>> On Aug 20, 2013, at 5:31 PM, John Burwell <jb...@basho.com> wrote:
>>>
>>> Darren,
>>>
>>> Actually, loading and unloading aren't difficult if resource management and
>>> drivers work within the following constraints/assumptions:
>>>
>>> Drivers are transient and stateless
>>> A driver instance is assigned per resource managed (i.e. no singletons)
>>> A lightweight thread and mailbox (i.e. actor model) are assigned per
>>> resource managed (outlined in the presentation referenced below)
>>>
>>>
>>> Based on these constraints and assumptions, the following upgrade process
>>> could be implemented:
>>>
>>> Load and verify new driver version to make it available
>>> Notify the supervisor processes of each affected resource that a new driver
>>> is available
>>> Upon completion of the current message being processed by its associated
>>> actor, the supervisor kills and respawns the actor managing its associated
>>> resource
>>> As part of startup, the supervisor injects an instance of the new driver
>>> version and the actor resumes processing messages in its mailbox
>>>
>>>
>>> This process mirrors the process that would occur on management server
>>> startup for each resource minus killing an existing actor instance.
>>> Eventually, the system will upgrade the driver without loss of operation.
>>> More sophisticated policies could be added, but I think this approach would
>>> be a solid default upgrade behavior.  As a bonus, this same approach could
>>> also be applied to global configuration settings -- allowing the system to
>>> apply changes to these values without restarting the system.
>>>
>>> In summary, CloudStack and Eclipse are very different types of systems.
>>> Eclipse is a desktop application implementing complex workflows, user
>>> interactions, and management of shared state (e.g. project structure, AST,
>>> compiler status, etc).  In contrast, CloudStack is an eventually consistent
>>> distributed system performing automation control.  As such, its requirements
>>> plugin requirements are not only very different, but IMHO, much simpler.
>>>
>>> Thanks,
>>> -John
>>>
>>> On Aug 20, 2013, at 7:44 PM, Darren Shepherd <da...@gmail.com>
>>> wrote:
>>>
>>> I know this isn't terribly useful, but I've been drawing a lot of squares
>>> and circles and lines that connect those squares and circles lately and I
>>> have a lot of architectural ideas for CloudStack.  At the rate I'm going it
>>> will take me about two weeks to put together a discussion/proposal for the
>>> community.  What I'm thinking is a superset of what you've listed out and
>>> should align with your idea of a CAR.  The focus has a a lot to do with
>>> modularity and extensibility.
>>>
>>> So more to come soon....  I will say one thing though, is with java you end
>>> up having a hard time doing dynamic load and unloading of modules.  There's
>>> plenty of frameworks that try really hard to do this right, like OSGI, but
>>> its darn near impossible to do it right because of class loading and GC
>>> issues (and that's why Eclipse has you restart after installing plugs even
>>> though it is OSGi).
>>>
>>> I do believe that CloudStack should be possible of zero downtime maintenance
>>> and have ideas around that, but at the end of the day, for plenty of
>>> practical reasons, you still need a JVM restart if modules change.
>>>
>>> Darren
>>>
>>> On Aug 20, 2013, at 3:39 PM, Mike Tutkowski <mi...@solidfire.com>
>>> wrote:
>>>
>>> I agree, John - let's get consensus first, then talk time tables.
>>>
>>>
>>> On Tue, Aug 20, 2013 at 4:31 PM, John Burwell <jb...@basho.com> wrote:
>>>
>>> Mike,
>>>
>>> Before we can dig into timelines or implementations, I think we need to
>>> get consensus on the problem to solved and the goals.  Once we have a
>>> proper understanding of the scope, I believe we can chunk the across a set
>>> of development lifecycle.  The subject is vast, but it also has a far
>>> reaching impact to both the storage and network layer evolution efforts.
>>> As such, I believe we need to start addressing it as part of the next
>>> release.
>>>
>>> As a separate thread, we need to discuss the timeline for the next
>>> release.  I think we need to avoid the time compression caused by the
>>> overlap of the 4.1 stabilization effort and 4.2 development.  Therefore, I
>>> don't think we should consider development of the next release started
>>> until the first 4.2 RC is released.  I will try to open a separate discuss
>>> thread for this topic, as well as, tying of the discussion of release code
>>> names.
>>>
>>> Thanks,
>>> -John
>>>
>>> On Aug 20, 2013, at 6:22 PM, Mike Tutkowski <mi...@solidfire.com>
>>> wrote:
>>>
>>> Hey John,
>>>
>>> I think this is some great stuff. Thanks for the write up.
>>>
>>> It looks like you have ideas around what might go into a first release of
>>> this plug-in framework. Were you thinking we'd have enough time to
>>>
>>> squeeze
>>>
>>> that first rev into 4.3. I'm just wondering (it's not a huge deal to hit
>>> that release for this) because we would only have about five weeks.
>>>
>>> Thanks
>>>
>>>
>>> On Tue, Aug 20, 2013 at 3:43 PM, John Burwell <jb...@basho.com>
>>>
>>> wrote:
>>>
>>>
>>> All,
>>>
>>> In capturing my thoughts on storage, my thinking backed into the driver
>>> model.  While we have the beginnings of such a model today, I see the
>>> following deficiencies:
>>>
>>>
>>> 1. *Multiple Models*: The Storage, Hypervisor, and Security layers
>>> each have a slightly different model for allowing system
>>>
>>> functionality to
>>>
>>> be extended/substituted.  These differences increase the barrier of
>>>
>>> entry
>>>
>>> for vendors seeking to extend CloudStack and accrete code paths to be
>>> maintained and verified.
>>> 2. *Leaky Abstraction*:  Plugins are registered through a Spring
>>> configuration file.  In addition to being operator unfriendly (most
>>> sysadmins are not Spring experts nor do they want to be), we expose
>>>
>>> the
>>>
>>> core bootstrapping mechanism to operators.  Therefore, a
>>>
>>> misconfiguration
>>>
>>> could negatively impact the injection/configuration of internal
>>>
>>> management
>>>
>>> server components.  Essentially handing them a loaded shotgun pointed
>>>
>>> at
>>>
>>> our right foot.
>>> 3. *Nondeterministic Load/Unload Model*:  Because the core loading
>>> mechanism is Spring, the management has little control over the
>>>
>>> timing and
>>>
>>> order of component loading/unloading.  Changes to the Management
>>>
>>> Server's
>>>
>>> component dependency graph could break a driver by causing it to be
>>>
>>> started
>>>
>>> at an unexpected time.
>>> 4. *Lack of Execution Isolation*: As a Spring component, plugins are
>>> loaded into the same execution context as core management server
>>> components.  Therefore, an errant plugin can corrupt the entire
>>>
>>> management
>>>
>>> server.
>>>
>>>
>>> For next revision of the plugin/driver mechanism, I would like see us
>>> migrate towards a standard pluggable driver model that supports all of
>>>
>>> the
>>>
>>> management server's extension points (e.g. network devices, storage
>>> devices, hypervisors, etc) with the following capabilities:
>>>
>>>
>>> - *Consolidated Lifecycle and Startup Procedure*:  Drivers share a
>>> common state machine and categorization (e.g. network, storage,
>>>
>>> hypervisor,
>>>
>>> etc) that permits the deterministic calculation of initialization and
>>> destruction order (i.e. network layer drivers -> storage layer
>>>
>>> drivers ->
>>>
>>> hypervisor drivers).  Plugin inter-dependencies would be supported
>>>
>>> between
>>>
>>> plugins sharing the same category.
>>> - *In-process Installation and Upgrade*: Adding or upgrading a driver
>>> does not require the management server to be restarted.  This
>>>
>>> capability
>>>
>>> implies a system that supports the simultaneous execution of multiple
>>> driver versions and the ability to suspend continued execution work
>>>
>>> on a
>>>
>>> resource while the underlying driver instance is replaced.
>>> - *Execution Isolation*: The deployment packaging and execution
>>> environment supports different (and potentially conflicting) versions
>>>
>>> of
>>>
>>> dependencies to be simultaneously used.  Additionally, plugins would
>>>
>>> be
>>>
>>> sufficiently sandboxed to protect the management server against driver
>>> instability.
>>> - *Extension Data Model*: Drivers provide a property bag with a
>>> metadata descriptor to validate and render vendor specific data.  The
>>> contents of this property bag will provided to every driver operation
>>> invocation at runtime.  The metadata descriptor would be a lightweight
>>> description that provides a label resource key, a description
>>>
>>> resource key,
>>>
>>> data type (string, date, number, boolean), required flag, and optional
>>> length limit.
>>> - *Introspection: Administrative APIs/UIs allow operators to
>>> understand the configuration of the drivers in the system, their
>>> configuration, and their current state.*
>>> - *Discoverability*: Optionally, drivers can be discovered via a
>>> project repository definition (similar to Yum) allowing drivers to be
>>> remotely acquired and operators to be notified regarding update
>>> availability.  The project would also provide, free of charge,
>>>
>>> certificates
>>>
>>> to sign plugins.  This mechanism would support local mirroring to
>>>
>>> support
>>>
>>> air gapped management networks.
>>>
>>>
>>> Fundamentally, I do not want to turn CloudStack into an erector set with
>>> more screws than nuts which is a risk with highly pluggable
>>>
>>> architectures.
>>>
>>> As such, I think we would need to tightly bound the scope of drivers and
>>> their behaviors to prevent the loss system usability and stability.  My
>>> thinking is that drivers would be packaged into a custom JAR, CAR
>>> (CloudStack ARchive), that would be structured as followed:
>>>
>>>
>>> - META-INF
>>>  - MANIFEST.MF
>>>  - driver.yaml (driver metadata(e.g. version, name, description,
>>>  etc) serialized in YAML format)
>>>  - LICENSE (a text file containing the driver's license)
>>> - lib (driver dependencies)
>>> - classes (driver implementation)
>>> - resources (driver message files and potentially JS resources)
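
For illustration only, a driver.yaml along the lines described above might
look roughly as follows; the keys shown are assumptions, since no schema is
defined in this thread:

    # Illustrative driver.yaml sketch; keys are assumptions, not a defined schema.
    name: acme-block-storage
    version: 1.2.0
    description: Example block storage driver descriptor
    category: storage
    vendor: Acme, Inc.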
>>>
>>>
>>> The management server would acquire drivers through a simple scan of a
>>>
>>> URL
>>>
>>> (e.g. file directory, S3 bucket, etc).  For every CAR object found, the
>>> management server would create an execution environment (likely a
>>>
>>> dedicated
>>>
>>> ExecutorService and Classloader), and transition the state of the
>>>
>>> driver to
>>>
>>> Running (the exact state model would need to be worked out).  To be
>>>
>>> really
>>>
>>> nice, we could develop a custom Ant task/Maven plugin/Gradle plugin to
>>> create CARs.   I can also imagine an opportunities to add hooks to this
>>> model to register instrumentation information with JMX and
>>>
>>> authorization.
>>>
>>>
>>> To keep the scope of this email confined, we would introduce the general
>>> notion of a Resource, and (hand wave hand wave) eventually
>>>
>>> compartmentalize
>>>
>>> the execution of work around a resource [1].  This (hand waved)
>>> compartmentalization would allow us the controls necessary to safely and
>>> reliably perform in-place driver upgrades.  For an initial release, I
>>>
>>> would
>>>
>>> recommend implementing the abstractions, loading mechanism, extension
>>>
>>> data
>>>
>>> model, and discovery features.  With these capabilities in place, we
>>>
>>> could
>>>
>>> attack the in-place upgrade model.
>>>
>>> If we were to adopt such a pluggable capability, we would have the
>>> opportunity to decouple the vendor and CloudStack release schedules.
>>>
>>> For
>>>
>>> example, if a vendor were introducing a new product that required a new
>>>
>>> or
>>>
>>> updated driver, they would no longer need to wait for a CloudStack
>>>
>>> release
>>>
>>> to support it.  They would also gain the ability to fix high priority
>>> defects in the same manner.
>>>
>>> I have hand waved a number of issues that would need to be resolved
>>>
>>> before
>>>
>>> such an approach could be implemented.  However, I think we need to
>>>
>>> decide,
>>>
>>> as a community, that it worth devoting energy and effort to enhancing
>>>
>>> the
>>>
>>> plugin/driver model and the goals of that effort before driving head
>>>
>>> first
>>>
>>> into the deep rabbit hole of design/implementation.
>>>
>>> Thoughts? (/me ducks)
>>> -John
>>>
>>> [1]: My opinions on the matter from CloudStack Collab 2013 ->
>>>
>>> http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cloud-stack-distributed-process-management
>>>
>>>
>>>
>>>
>>> --
>>> *Mike Tutkowski*
>>> *Senior CloudStack Developer, SolidFire Inc.*
>>> e: mike.tutkowski@solidfire.com
>>> o: 303.746.7302
>>> Advancing the way the world uses the
>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>> *™*
>>>
>>>
>>>
>>> --
>>> *Mike Tutkowski*
>>> *Senior CloudStack Developer, SolidFire Inc.*
>>> e: mike.tutkowski@solidfire.com
>>> o: 303.746.7302
>>> Advancing the way the world uses the
>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>> *™*
>>>
>>>
>>>
>

Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by John Burwell <jb...@basho.com>.
Daan,

Please see my responses in-line below.  The TL;DR is that I am extremely skeptical of the complexity and flexibility of OSGi.  My experience with it in practice has not been positive.  However, I want to focus on our requirements for a driver mechanism, and then determine the best implementation.

Thanks,
-John

On Aug 21, 2013, at 4:14 AM, Daan Hoogland <da...@gmail.com> wrote:

> John,
> 
> You do want 'In-process Installation and Upgrade', 'Introspection' and
> 'Discoverability' says that you do want flexibility. You disqualify
> Spring and OSGi on this quality however.

On the surface, it would appear that OSGi fits into In-process Installation and Upgrade.  However, OSGi assumes a consistency attribute that is too rigid for CloudStack.  As I understand the specification, when a bundle is upgraded, all instances in the container are upgraded simultaneously.  Based on my reading of it, there is no way to customize this behavior.  I think the upgrade process needs to be eventually consistent, whereby the underlying driver instance for a resource is upgraded when it is in both a consistent and an upgradeable state.  For example, suppose we have 10,000 KVM hosts, and the KVM driver is upgraded.  9,000 of them are idle and can take the upgrade immediately.  The other 1,000 are in some state of operation (creating and destroying VMs, taking snapshots, etc.).  For these 1,000, we want the upgrade to happen when they complete their current work.  Most importantly, we don't want any work bound for these 10,000 resources during the upgrade to be lost, only delayed.
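
A minimal sketch of that eventually consistent behavior (all class and method
names below are hypothetical, not existing CloudStack code) is a per-resource
supervisor that defers the swap until in-flight work finishes:

    // Minimal sketch of the eventually consistent upgrade described above.
    // All names are hypothetical; this is not existing CloudStack code.
    interface Driver { }                          // placeholder for a versioned driver instance

    final class ResourceSupervisor {
        private Driver current;                   // driver bound to this one resource
        private Driver pending;                   // newer version waiting to be applied
        private boolean busy;                     // true while an operation is in flight

        synchronized void offerUpgrade(Driver newVersion) {
            if (busy) {
                pending = newVersion;             // busy resource: upgrade when work completes
            } else {
                current = newVersion;             // idle resource: upgrade immediately
            }
        }

        synchronized void operationStarted()   { busy = true; }

        synchronized void operationCompleted() {
            busy = false;
            if (pending != null) {                // deferred upgrade is applied here
                current = pending;
                pending = null;
            }
        }
    }

Work queued for the resource is never dropped; it simply runs against the new
driver version once the swap has taken place.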

When I say discoverability, I mean end-users finding drivers to install.  The more I think about it, the more I explicitly do not want drivers to depend on each other.  Drivers should be self-contained, stateless mechanisms that interact with some piece of infrastructure.  I think the path to madness lies in having a messy web of cross-vendor driver dependencies.  

> 
> If we can restrict the use of bundles to those that adhere to some
> interfaces we prescribe I don't think either complexity nor dependency
> are an issue.

The only restriction I see is the ability of a bundle to control what is publicly exported.  However, I see no way to restrict how bundles depend on each other -- opening the door to cross vendor driver dependencies.
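
For reference, the export/import control in question is expressed through
standard OSGi manifest headers such as the following (the package names here
are purely illustrative, not real CloudStack or vendor packages):

    Bundle-SymbolicName: com.example.acme.driver
    Bundle-Version: 1.2.0
    Export-Package: com.example.acme.driver.api
    Import-Package: com.example.cloud.driver.spi

Export-Package controls what other bundles can see of this bundle, while
Import-Package declares what it depends on; as noted above, though, nothing
prevents one vendor's bundle from importing packages exported by another's.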

> 
> Most every bit of complexity of writing a bundle can be hidden from
> the bundle-developer nowadays. If we can not hide enough it is not an
> option indeed. The main focus of OSGI is life cycle management which
> is exactly what we need. the use that eclipse makes of it is a good
> example not to follow but doesn't disqualify the entire thing.

Personally, I am dubious that a build process can mask complexity.  More importantly, I don't like creating designs that require tooling and code generation, presenting a veneer of simplicity while actually creating spooky action at a distance.  I prefer creating truly simple systems that can be easily comprehended.

> 
> The dependency hell is not different from what we have as regular
> monolithical development group. We control what we package and how. A
> valid point is that some libraries might have issues that prevent them
> from being bundled and that needs investigation. So we need to package
> those libraries as bundles ourselves so 3rd parties don't need to. We
> package them now anyway.

In my experience, the dependency management problem is magnified by the added hurdle that every dependency be an OSGi bundle.  Many projects do not natively ship OSGi bundles, leaving third-parties or the project itself to repackage them.  Often OSGi bundled versions are behind the most current project releases.  

> 
> The erector set fear you have is just as valid with as without osgi or
> any existing framework.

Agreed.  I would prefer inaction on this topic to creating said erector set.

> 
> I don't insist on OSGi and I do agree with your initial set of
> requirements. When I read it I think, "let's use OSGi". And I don't
> see anything but fear of the beast in your arguments against it. Maybe
> your fear is just in my perception or maybe it is very valid. I'm not
> perceptible to it after your reply, yet.

To my mind, OSGi is a wonderful idea.  We need it, or something like it, standard in the JVM.  However, in practice, it is a difficult beast because it is working around limitations in the JVM.  When it works, it is awesome until it breaks or you hit the dependency hell I described.  If we adopt it, we need to ensure that it fits our needs and that the functional gain merits taking on the burden of its risks.

> 
> regards,
> Daan
> 
> On Wed, Aug 21, 2013 at 9:00 AM, John Burwell <jb...@basho.com> wrote:
>> Daan,
>> 
>> I have the following issues with OSGi:
>> 
>> Complexity:  Building OSGi components adds a tremendous amount of complexity
>> to both the building drivers and debugging runtime issues.  Additionally,
>> OSGi has a much broader feature set than I think CloudStack needs to
>> support.  Therefore, driver authors may use the feature set in unanticipated
>> way that create system instability.
>> Dependency Hell: OSGi requires 3rd party dependencies to be packaged as OSGi
>> bundles.  In practice, many third party libraries either have issues that
>> prevent them from being bundles or their OSGi bundled versions are behind
>> mainline release.
>> 
>> 
>> As an additionally personal experience, I do not want to re-create the mess
>> that is Eclipse (i.e. an erector set with more screws than nuts).  In
>> addition to its lack of reliability, it is incredibly difficult to
>> comprehend how the component configurations and relationships are composed
>> at runtime.
>> 
>> To be clear, I am not interested in creating a general purpose
>> component/plugin model.  Fundamentally, we need a simple, purpose-built
>> component model focused on providing stability and reliability through
>> deterministic behavior rather than feature flexibility.  Unfortunately, both
>> OSGi and Spring's focus on flexibility the later make them ill-suited for
>> our purposes.
>> 
>> Thanks,
>> -John
>> 
>> On Aug 21, 2013, at 2:31 AM, Daan Hoogland <da...@gmail.com> wrote:
>> 
>> John,
>> 
>> Nice work.
>> Given the maturity of OSGi, I'd say lets see how it fits. One criteria
>> would be can we limit the bundles that may be loaded based on what
>> Cloudstack supports (and not allow loading pydev) if not we need to
>> bake our own.
>> 
>> But though I think your work is valuable I disagree on designing our
>> CARs from the get go without having explored usable options in the
>> field first. A new type of YARs is not what the world or cloudstack
>> needs. And given what you have written the main problem wll be finding
>> a framework we can restrict to what we want, not one that can do all
>> of it.
>> 
>> done shooting,
>> Daan
>> 
>> On Wed, Aug 21, 2013 at 2:52 AM, Darren Shepherd
>> <da...@gmail.com> wrote:
>> 
>> Sure, I fully understand how it theoretically works, but I'm saying from a
>> practical perspective it always seems to fall apart.  What your describing
>> is done excellently in OSGI 4.2 Blueprint.  It's a beautiful framework that
>> allows you to expose services that can be dynamically updated at runtime.
>> 
>> The issues always happens with unloading.  I'll give you a real world
>> example.  As part of the servlet spec your supposed to be able to stop and
>> unload wars.  But in practice if you do it enough times you typically run
>> out of memory.  So one such issue was with commons logging (since fixed).
>> When you do getLogger(myclass.class) it would cache a reference of the Class
>> object to the actual log impl.  The commons logging jar is typically loaded
>> with a system classloader and but MyClass.class would be loaded in the
>> webapp classloader.  So when you stop the war there is a reference chain
>> system classloader -> logfactory -> Myclass -> webapp classloader.  So the
>> web app never gets GC'd.
>> 
>> So just pointing out the practical issues, that's it.
>> 
>> Darren
>> 
>> On Aug 20, 2013, at 5:31 PM, John Burwell <jb...@basho.com> wrote:
>> 
>> Darren,
>> 
>> Actually, loading and unloading aren't difficult if resource management and
>> drivers work within the following constraints/assumptions:
>> 
>> Drivers are transient and stateless
>> A driver instance is assigned per resource managed (i.e. no singletons)
>> A lightweight thread and mailbox (i.e. actor model) are assigned per
>> resource managed (outlined in the presentation referenced below)
>> 
>> 
>> Based on these constraints and assumptions, the following upgrade process
>> could be implemented:
>> 
>> Load and verify new driver version to make it available
>> Notify the supervisor processes of each affected resource that a new driver
>> is available
>> Upon completion of the current message being processed by its associated
>> actor, the supervisor kills and respawns the actor managing its associated
>> resource
>> As part of startup, the supervisor injects an instance of the new driver
>> version and the actor resumes processing messages in its mailbox
>> 
>> 
>> This process mirrors the process that would occur on management server
>> startup for each resource minus killing an existing actor instance.
>> Eventually, the system will upgrade the driver without loss of operation.
>> More sophisticated policies could be added, but I think this approach would
>> be a solid default upgrade behavior.  As a bonus, this same approach could
>> also be applied to global configuration settings -- allowing the system to
>> apply changes to these values without restarting the system.
>> 
>> In summary, CloudStack and Eclipse are very different types of systems.
>> Eclipse is a desktop application implementing complex workflows, user
>> interactions, and management of shared state (e.g. project structure, AST,
>> compiler status, etc).  In contrast, CloudStack is an eventually consistent
>> distributed system performing automation control.  As such, its requirements
>> plugin requirements are not only very different, but IMHO, much simpler.
>> 
>> Thanks,
>> -John
>> 
>> On Aug 20, 2013, at 7:44 PM, Darren Shepherd <da...@gmail.com>
>> wrote:
>> 
>> I know this isn't terribly useful, but I've been drawing a lot of squares
>> and circles and lines that connect those squares and circles lately and I
>> have a lot of architectural ideas for CloudStack.  At the rate I'm going it
>> will take me about two weeks to put together a discussion/proposal for the
>> community.  What I'm thinking is a superset of what you've listed out and
>> should align with your idea of a CAR.  The focus has a a lot to do with
>> modularity and extensibility.
>> 
>> So more to come soon....  I will say one thing though, is with java you end
>> up having a hard time doing dynamic load and unloading of modules.  There's
>> plenty of frameworks that try really hard to do this right, like OSGI, but
>> its darn near impossible to do it right because of class loading and GC
>> issues (and that's why Eclipse has you restart after installing plugs even
>> though it is OSGi).
>> 
>> I do believe that CloudStack should be possible of zero downtime maintenance
>> and have ideas around that, but at the end of the day, for plenty of
>> practical reasons, you still need a JVM restart if modules change.
>> 
>> Darren
>> 
>> On Aug 20, 2013, at 3:39 PM, Mike Tutkowski <mi...@solidfire.com>
>> wrote:
>> 
>> I agree, John - let's get consensus first, then talk time tables.
>> 
>> 
>> On Tue, Aug 20, 2013 at 4:31 PM, John Burwell <jb...@basho.com> wrote:
>> 
>> Mike,
>> 
>> Before we can dig into timelines or implementations, I think we need to
>> get consensus on the problem to solved and the goals.  Once we have a
>> proper understanding of the scope, I believe we can chunk the across a set
>> of development lifecycle.  The subject is vast, but it also has a far
>> reaching impact to both the storage and network layer evolution efforts.
>> As such, I believe we need to start addressing it as part of the next
>> release.
>> 
>> As a separate thread, we need to discuss the timeline for the next
>> release.  I think we need to avoid the time compression caused by the
>> overlap of the 4.1 stabilization effort and 4.2 development.  Therefore, I
>> don't think we should consider development of the next release started
>> until the first 4.2 RC is released.  I will try to open a separate discuss
>> thread for this topic, as well as, tying of the discussion of release code
>> names.
>> 
>> Thanks,
>> -John
>> 
>> On Aug 20, 2013, at 6:22 PM, Mike Tutkowski <mi...@solidfire.com>
>> wrote:
>> 
>> Hey John,
>> 
>> I think this is some great stuff. Thanks for the write up.
>> 
>> It looks like you have ideas around what might go into a first release of
>> this plug-in framework. Were you thinking we'd have enough time to
>> 
>> squeeze
>> 
>> that first rev into 4.3. I'm just wondering (it's not a huge deal to hit
>> that release for this) because we would only have about five weeks.
>> 
>> Thanks
>> 
>> 
>> On Tue, Aug 20, 2013 at 3:43 PM, John Burwell <jb...@basho.com>
>> 
>> wrote:
>> 
>> 
>> All,
>> 
>> In capturing my thoughts on storage, my thinking backed into the driver
>> model.  While we have the beginnings of such a model today, I see the
>> following deficiencies:
>> 
>> 
>> 1. *Multiple Models*: The Storage, Hypervisor, and Security layers
>> each have a slightly different model for allowing system
>> 
>> functionality to
>> 
>> be extended/substituted.  These differences increase the barrier of
>> 
>> entry
>> 
>> for vendors seeking to extend CloudStack and accrete code paths to be
>> maintained and verified.
>> 2. *Leaky Abstraction*:  Plugins are registered through a Spring
>> configuration file.  In addition to being operator unfriendly (most
>> sysadmins are not Spring experts nor do they want to be), we expose
>> 
>> the
>> 
>> core bootstrapping mechanism to operators.  Therefore, a
>> 
>> misconfiguration
>> 
>> could negatively impact the injection/configuration of internal
>> 
>> management
>> 
>> server components.  Essentially handing them a loaded shotgun pointed
>> 
>> at
>> 
>> our right foot.
>> 3. *Nondeterministic Load/Unload Model*:  Because the core loading
>> mechanism is Spring, the management has little control over the
>> 
>> timing and
>> 
>> order of component loading/unloading.  Changes to the Management
>> 
>> Server's
>> 
>> component dependency graph could break a driver by causing it to be
>> 
>> started
>> 
>> at an unexpected time.
>> 4. *Lack of Execution Isolation*: As a Spring component, plugins are
>> loaded into the same execution context as core management server
>> components.  Therefore, an errant plugin can corrupt the entire
>> 
>> management
>> 
>> server.
>> 
>> 
>> For next revision of the plugin/driver mechanism, I would like see us
>> migrate towards a standard pluggable driver model that supports all of
>> 
>> the
>> 
>> management server's extension points (e.g. network devices, storage
>> devices, hypervisors, etc) with the following capabilities:
>> 
>> 
>> - *Consolidated Lifecycle and Startup Procedure*:  Drivers share a
>> common state machine and categorization (e.g. network, storage,
>> 
>> hypervisor,
>> 
>> etc) that permits the deterministic calculation of initialization and
>> destruction order (i.e. network layer drivers -> storage layer
>> 
>> drivers ->
>> 
>> hypervisor drivers).  Plugin inter-dependencies would be supported
>> 
>> between
>> 
>> plugins sharing the same category.
>> - *In-process Installation and Upgrade*: Adding or upgrading a driver
>> does not require the management server to be restarted.  This
>> 
>> capability
>> 
>> implies a system that supports the simultaneous execution of multiple
>> driver versions and the ability to suspend continued execution work
>> 
>> on a
>> 
>> resource while the underlying driver instance is replaced.
>> - *Execution Isolation*: The deployment packaging and execution
>> environment supports different (and potentially conflicting) versions
>> 
>> of
>> 
>> dependencies to be simultaneously used.  Additionally, plugins would
>> 
>> be
>> 
>> sufficiently sandboxed to protect the management server against driver
>> instability.
>> - *Extension Data Model*: Drivers provide a property bag with a
>> metadata descriptor to validate and render vendor specific data.  The
>> contents of this property bag will provided to every driver operation
>> invocation at runtime.  The metadata descriptor would be a lightweight
>> description that provides a label resource key, a description
>> 
>> resource key,
>> 
>> data type (string, date, number, boolean), required flag, and optional
>> length limit.
>> - *Introspection: Administrative APIs/UIs allow operators to
>> understand the configuration of the drivers in the system, their
>> configuration, and their current state.*
>> - *Discoverability*: Optionally, drivers can be discovered via a
>> project repository definition (similar to Yum) allowing drivers to be
>> remotely acquired and operators to be notified regarding update
>> availability.  The project would also provide, free of charge,
>> 
>> certificates
>> 
>> to sign plugins.  This mechanism would support local mirroring to
>> 
>> support
>> 
>> air gapped management networks.
>> 
>> 
>> Fundamentally, I do not want to turn CloudStack into an erector set with
>> more screws than nuts which is a risk with highly pluggable
>> 
>> architectures.
>> 
>> As such, I think we would need to tightly bound the scope of drivers and
>> their behaviors to prevent the loss system usability and stability.  My
>> thinking is that drivers would be packaged into a custom JAR, CAR
>> (CloudStack ARchive), that would be structured as followed:
>> 
>> 
>> - META-INF
>>  - MANIFEST.MF
>>  - driver.yaml (driver metadata(e.g. version, name, description,
>>  etc) serialized in YAML format)
>>  - LICENSE (a text file containing the driver's license)
>> - lib (driver dependencies)
>> - classes (driver implementation)
>> - resources (driver message files and potentially JS resources)
>> 
>> 
>> The management server would acquire drivers through a simple scan of a
>> 
>> URL
>> 
>> (e.g. file directory, S3 bucket, etc).  For every CAR object found, the
>> management server would create an execution environment (likely a
>> 
>> dedicated
>> 
>> ExecutorService and Classloader), and transition the state of the
>> 
>> driver to
>> 
>> Running (the exact state model would need to be worked out).  To be
>> 
>> really
>> 
>> nice, we could develop a custom Ant task/Maven plugin/Gradle plugin to
>> create CARs.   I can also imagine an opportunities to add hooks to this
>> model to register instrumentation information with JMX and
>> 
>> authorization.
>> 
>> 
>> To keep the scope of this email confined, we would introduce the general
>> notion of a Resource, and (hand wave hand wave) eventually
>> 
>> compartmentalize
>> 
>> the execution of work around a resource [1].  This (hand waved)
>> compartmentalization would allow us the controls necessary to safely and
>> reliably perform in-place driver upgrades.  For an initial release, I
>> 
>> would
>> 
>> recommend implementing the abstractions, loading mechanism, extension
>> 
>> data
>> 
>> model, and discovery features.  With these capabilities in place, we
>> 
>> could
>> 
>> attack the in-place upgrade model.
>> 
>> If we were to adopt such a pluggable capability, we would have the
>> opportunity to decouple the vendor and CloudStack release schedules.
>> 
>> For
>> 
>> example, if a vendor were introducing a new product that required a new
>> 
>> or
>> 
>> updated driver, they would no longer need to wait for a CloudStack
>> 
>> release
>> 
>> to support it.  They would also gain the ability to fix high priority
>> defects in the same manner.
>> 
>> I have hand waved a number of issues that would need to be resolved
>> 
>> before
>> 
>> such an approach could be implemented.  However, I think we need to
>> 
>> decide,
>> 
>> as a community, that it worth devoting energy and effort to enhancing
>> 
>> the
>> 
>> plugin/driver model and the goals of that effort before driving head
>> 
>> first
>> 
>> into the deep rabbit hole of design/implementation.
>> 
>> Thoughts? (/me ducks)
>> -John
>> 
>> [1]: My opinions on the matter from CloudStack Collab 2013 ->
>> 
>> http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cloud-stack-distributed-process-management
>> 
>> 
>> 
>> 
>> --
>> *Mike Tutkowski*
>> *Senior CloudStack Developer, SolidFire Inc.*
>> e: mike.tutkowski@solidfire.com
>> o: 303.746.7302
>> Advancing the way the world uses the
>> cloud<http://solidfire.com/solution/overview/?video=play>
>> *™*
>> 
>> 
>> 
>> --
>> *Mike Tutkowski*
>> *Senior CloudStack Developer, SolidFire Inc.*
>> e: mike.tutkowski@solidfire.com
>> o: 303.746.7302
>> Advancing the way the world uses the
>> cloud<http://solidfire.com/solution/overview/?video=play>
>> *™*
>> 
>> 
>> 


Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by Daan Hoogland <da...@gmail.com>.
John,

That you do want 'In-process Installation and Upgrade', 'Introspection' and
'Discoverability' says that you do want flexibility. You disqualify
Spring and OSGi on this very quality, however.

If we can restrict the use of bundles to those that adhere to some
interfaces we prescribe, I don't think either complexity or dependencies
are an issue.

Almost every bit of the complexity of writing a bundle can be hidden from
the bundle developer nowadays. If we cannot hide enough, it is indeed not
an option. The main focus of OSGi is life-cycle management, which is
exactly what we need. The use that Eclipse makes of it is a good example
not to follow, but it doesn't disqualify the entire thing.

The dependency hell is no different from what we have as a regular
monolithic development group. We control what we package and how. A
valid point is that some libraries might have issues that prevent them
from being bundled, and that needs investigation. So we need to package
those libraries as bundles ourselves so 3rd parties don't need to. We
package them now anyway.

The erector-set fear you have is just as valid with or without OSGi or
any existing framework.

I don't insist on OSGi, and I do agree with your initial set of
requirements. When I read it I think, "let's use OSGi". And I don't
see anything but fear of the beast in your arguments against it. Maybe
your fear is just in my perception or maybe it is very valid. I'm not
susceptible to it after your reply, yet.

regards,
Daan

On Wed, Aug 21, 2013 at 9:00 AM, John Burwell <jb...@basho.com> wrote:
> Daan,
>
> I have the following issues with OSGi:
>
> Complexity:  Building OSGi components adds a tremendous amount of complexity
> to both the building drivers and debugging runtime issues.  Additionally,
> OSGi has a much broader feature set than I think CloudStack needs to
> support.  Therefore, driver authors may use the feature set in unanticipated
> way that create system instability.
> Dependency Hell: OSGi requires 3rd party dependencies to be packaged as OSGi
> bundles.  In practice, many third party libraries either have issues that
> prevent them from being bundles or their OSGi bundled versions are behind
> mainline release.
>
>
> As an additionally personal experience, I do not want to re-create the mess
> that is Eclipse (i.e. an erector set with more screws than nuts).  In
> addition to its lack of reliability, it is incredibly difficult to
> comprehend how the component configurations and relationships are composed
> at runtime.
>
> To be clear, I am not interested in creating a general purpose
> component/plugin model.  Fundamentally, we need a simple, purpose-built
> component model focused on providing stability and reliability through
> deterministic behavior rather than feature flexibility.  Unfortunately, both
> OSGi and Spring's focus on flexibility the later make them ill-suited for
> our purposes.
>
> Thanks,
> -John
>
> On Aug 21, 2013, at 2:31 AM, Daan Hoogland <da...@gmail.com> wrote:
>
> John,
>
> Nice work.
> Given the maturity of OSGi, I'd say lets see how it fits. One criteria
> would be can we limit the bundles that may be loaded based on what
> Cloudstack supports (and not allow loading pydev) if not we need to
> bake our own.
>
> But though I think your work is valuable I disagree on designing our
> CARs from the get go without having explored usable options in the
> field first. A new type of YARs is not what the world or cloudstack
> needs. And given what you have written the main problem wll be finding
> a framework we can restrict to what we want, not one that can do all
> of it.
>
> done shooting,
> Daan
>
> On Wed, Aug 21, 2013 at 2:52 AM, Darren Shepherd
> <da...@gmail.com> wrote:
>
> Sure, I fully understand how it theoretically works, but I'm saying from a
> practical perspective it always seems to fall apart.  What your describing
> is done excellently in OSGI 4.2 Blueprint.  It's a beautiful framework that
> allows you to expose services that can be dynamically updated at runtime.
>
> The issues always happens with unloading.  I'll give you a real world
> example.  As part of the servlet spec your supposed to be able to stop and
> unload wars.  But in practice if you do it enough times you typically run
> out of memory.  So one such issue was with commons logging (since fixed).
> When you do getLogger(myclass.class) it would cache a reference of the Class
> object to the actual log impl.  The commons logging jar is typically loaded
> with a system classloader and but MyClass.class would be loaded in the
> webapp classloader.  So when you stop the war there is a reference chain
> system classloader -> logfactory -> Myclass -> webapp classloader.  So the
> web app never gets GC'd.
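
To illustrate the retention chain Darren describes, a minimal, hypothetical
sketch of the pattern (not the actual commons-logging code) looks like this:

    // Hypothetical sketch of the leak pattern described above; not real commons-logging code.
    // A static cache in a class loaded by the parent (system) classloader holds a strong
    // reference to a Class object loaded by the webapp classloader, so the webapp
    // classloader can never be garbage collected after the war is stopped.
    import java.util.HashMap;
    import java.util.Map;

    final class LeakyLogFactory {
        private static final Map<Class<?>, Object> CACHE = new HashMap<>();

        static Object getLogger(Class<?> clazz) {
            return CACHE.computeIfAbsent(clazz, c -> new Object()); // Class -> logger impl
        }
    }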
>
> So just pointing out the practical issues, that's it.
>
> Darren
>
> On Aug 20, 2013, at 5:31 PM, John Burwell <jb...@basho.com> wrote:
>
> Darren,
>
> Actually, loading and unloading aren't difficult if resource management and
> drivers work within the following constraints/assumptions:
>
> Drivers are transient and stateless
> A driver instance is assigned per resource managed (i.e. no singletons)
> A lightweight thread and mailbox (i.e. actor model) are assigned per
> resource managed (outlined in the presentation referenced below)
>
>
> Based on these constraints and assumptions, the following upgrade process
> could be implemented:
>
> Load and verify new driver version to make it available
> Notify the supervisor processes of each affected resource that a new driver
> is available
> Upon completion of the current message being processed by its associated
> actor, the supervisor kills and respawns the actor managing its associated
> resource
> As part of startup, the supervisor injects an instance of the new driver
> version and the actor resumes processing messages in its mailbox
>
>
> This process mirrors the process that would occur on management server
> startup for each resource minus killing an existing actor instance.
> Eventually, the system will upgrade the driver without loss of operation.
> More sophisticated policies could be added, but I think this approach would
> be a solid default upgrade behavior.  As a bonus, this same approach could
> also be applied to global configuration settings -- allowing the system to
> apply changes to these values without restarting the system.
>
> In summary, CloudStack and Eclipse are very different types of systems.
> Eclipse is a desktop application implementing complex workflows, user
> interactions, and management of shared state (e.g. project structure, AST,
> compiler status, etc).  In contrast, CloudStack is an eventually consistent
> distributed system performing automation control.  As such, its requirements
> plugin requirements are not only very different, but IMHO, much simpler.
>
> Thanks,
> -John
>
> On Aug 20, 2013, at 7:44 PM, Darren Shepherd <da...@gmail.com>
> wrote:
>
> I know this isn't terribly useful, but I've been drawing a lot of squares
> and circles and lines that connect those squares and circles lately and I
> have a lot of architectural ideas for CloudStack.  At the rate I'm going it
> will take me about two weeks to put together a discussion/proposal for the
> community.  What I'm thinking is a superset of what you've listed out and
> should align with your idea of a CAR.  The focus has a a lot to do with
> modularity and extensibility.
>
> So more to come soon....  I will say one thing though, is with java you end
> up having a hard time doing dynamic load and unloading of modules.  There's
> plenty of frameworks that try really hard to do this right, like OSGI, but
> its darn near impossible to do it right because of class loading and GC
> issues (and that's why Eclipse has you restart after installing plugs even
> though it is OSGi).
>
> I do believe that CloudStack should be possible of zero downtime maintenance
> and have ideas around that, but at the end of the day, for plenty of
> practical reasons, you still need a JVM restart if modules change.
>
> Darren
>
> On Aug 20, 2013, at 3:39 PM, Mike Tutkowski <mi...@solidfire.com>
> wrote:
>
> I agree, John - let's get consensus first, then talk time tables.
>
>
> On Tue, Aug 20, 2013 at 4:31 PM, John Burwell <jb...@basho.com> wrote:
>
> Mike,
>
> Before we can dig into timelines or implementations, I think we need to
> get consensus on the problem to solved and the goals.  Once we have a
> proper understanding of the scope, I believe we can chunk the across a set
> of development lifecycle.  The subject is vast, but it also has a far
> reaching impact to both the storage and network layer evolution efforts.
> As such, I believe we need to start addressing it as part of the next
> release.
>
> As a separate thread, we need to discuss the timeline for the next
> release.  I think we need to avoid the time compression caused by the
> overlap of the 4.1 stabilization effort and 4.2 development.  Therefore, I
> don't think we should consider development of the next release started
> until the first 4.2 RC is released.  I will try to open a separate discuss
> thread for this topic, as well as, tying of the discussion of release code
> names.
>
> Thanks,
> -John
>
> On Aug 20, 2013, at 6:22 PM, Mike Tutkowski <mi...@solidfire.com>
> wrote:
>
> Hey John,
>
> I think this is some great stuff. Thanks for the write up.
>
> It looks like you have ideas around what might go into a first release of
> this plug-in framework. Were you thinking we'd have enough time to
>
> squeeze
>
> that first rev into 4.3. I'm just wondering (it's not a huge deal to hit
> that release for this) because we would only have about five weeks.
>
> Thanks
>
>
> On Tue, Aug 20, 2013 at 3:43 PM, John Burwell <jb...@basho.com>
>
> wrote:
>
>
> All,
>
> In capturing my thoughts on storage, my thinking backed into the driver
> model.  While we have the beginnings of such a model today, I see the
> following deficiencies:
>
>
> 1. *Multiple Models*: The Storage, Hypervisor, and Security layers
> each have a slightly different model for allowing system
>
> functionality to
>
> be extended/substituted.  These differences increase the barrier of
>
> entry
>
> for vendors seeking to extend CloudStack and accrete code paths to be
> maintained and verified.
> 2. *Leaky Abstraction*:  Plugins are registered through a Spring
> configuration file.  In addition to being operator unfriendly (most
> sysadmins are not Spring experts nor do they want to be), we expose
>
> the
>
> core bootstrapping mechanism to operators.  Therefore, a
>
> misconfiguration
>
> could negatively impact the injection/configuration of internal
>
> management
>
> server components.  Essentially handing them a loaded shotgun pointed
>
> at
>
> our right foot.
> 3. *Nondeterministic Load/Unload Model*:  Because the core loading
> mechanism is Spring, the management has little control over the
>
> timing and
>
> order of component loading/unloading.  Changes to the Management
>
> Server's
>
> component dependency graph could break a driver by causing it to be
>
> started
>
> at an unexpected time.
> 4. *Lack of Execution Isolation*: As a Spring component, plugins are
> loaded into the same execution context as core management server
> components.  Therefore, an errant plugin can corrupt the entire
>
> management
>
> server.
>
>
> For next revision of the plugin/driver mechanism, I would like see us
> migrate towards a standard pluggable driver model that supports all of
>
> the
>
> management server's extension points (e.g. network devices, storage
> devices, hypervisors, etc) with the following capabilities:
>
>
> - *Consolidated Lifecycle and Startup Procedure*:  Drivers share a
> common state machine and categorization (e.g. network, storage,
>
> hypervisor,
>
> etc) that permits the deterministic calculation of initialization and
> destruction order (i.e. network layer drivers -> storage layer
>
> drivers ->
>
> hypervisor drivers).  Plugin inter-dependencies would be supported
>
> between
>
> plugins sharing the same category.
> - *In-process Installation and Upgrade*: Adding or upgrading a driver
> does not require the management server to be restarted.  This
>
> capability
>
> implies a system that supports the simultaneous execution of multiple
> driver versions and the ability to suspend continued execution work
>
> on a
>
> resource while the underlying driver instance is replaced.
> - *Execution Isolation*: The deployment packaging and execution
> environment supports different (and potentially conflicting) versions
>
> of
>
> dependencies to be simultaneously used.  Additionally, plugins would
>
> be
>
> sufficiently sandboxed to protect the management server against driver
> instability.
> - *Extension Data Model*: Drivers provide a property bag with a
> metadata descriptor to validate and render vendor specific data.  The
> contents of this property bag will provided to every driver operation
> invocation at runtime.  The metadata descriptor would be a lightweight
> description that provides a label resource key, a description
>
> resource key,
>
> data type (string, date, number, boolean), required flag, and optional
> length limit.
> - *Introspection: Administrative APIs/UIs allow operators to
> understand the configuration of the drivers in the system, their
> configuration, and their current state.*
> - *Discoverability*: Optionally, drivers can be discovered via a
> project repository definition (similar to Yum) allowing drivers to be
> remotely acquired and operators to be notified regarding update
> availability.  The project would also provide, free of charge,
>
> certificates
>
> to sign plugins.  This mechanism would support local mirroring to
>
> support
>
> air gapped management networks.
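
As a purely illustrative sketch of the metadata descriptor mentioned in the
"Extension Data Model" bullet above (type and field names are assumptions, not
existing CloudStack code):

    // Hypothetical sketch of a per-property metadata descriptor; names are illustrative.
    enum FieldType { STRING, DATE, NUMBER, BOOLEAN }

    final class FieldDescriptor {
        final String labelKey;        // resource key used to render the field label
        final String descriptionKey;  // resource key used to render the field description
        final FieldType type;         // string, date, number, or boolean
        final boolean required;       // must the operator supply a value?
        final Integer maxLength;      // optional length limit (null = unlimited)

        FieldDescriptor(String labelKey, String descriptionKey, FieldType type,
                        boolean required, Integer maxLength) {
            this.labelKey = labelKey;
            this.descriptionKey = descriptionKey;
            this.type = type;
            this.required = required;
            this.maxLength = maxLength;
        }
    }

A driver would publish one such descriptor per vendor-specific property, and the
administrative UI could use it to validate and render the driver's property bag.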
>
>
> Fundamentally, I do not want to turn CloudStack into an erector set with
> more screws than nuts which is a risk with highly pluggable
>
> architectures.
>
> As such, I think we would need to tightly bound the scope of drivers and
> their behaviors to prevent the loss system usability and stability.  My
> thinking is that drivers would be packaged into a custom JAR, CAR
> (CloudStack ARchive), that would be structured as followed:
>
>
> - META-INF
>   - MANIFEST.MF
>   - driver.yaml (driver metadata(e.g. version, name, description,
>   etc) serialized in YAML format)
>   - LICENSE (a text file containing the driver's license)
> - lib (driver dependencies)
> - classes (driver implementation)
> - resources (driver message files and potentially JS resources)
>
>
> The management server would acquire drivers through a simple scan of a
>
> URL
>
> (e.g. file directory, S3 bucket, etc).  For every CAR object found, the
> management server would create an execution environment (likely a
>
> dedicated
>
> ExecutorService and Classloader), and transition the state of the
>
> driver to
>
> Running (the exact state model would need to be worked out).  To be
>
> really
>
> nice, we could develop a custom Ant task/Maven plugin/Gradle plugin to
> create CARs.   I can also imagine an opportunities to add hooks to this
> model to register instrumentation information with JMX and
>
> authorization.
>
>
> To keep the scope of this email confined, we would introduce the general
> notion of a Resource, and (hand wave hand wave) eventually
>
> compartmentalize
>
> the execution of work around a resource [1].  This (hand waved)
> compartmentalization would allow us the controls necessary to safely and
> reliably perform in-place driver upgrades.  For an initial release, I
>
> would
>
> recommend implementing the abstractions, loading mechanism, extension
>
> data
>
> model, and discovery features.  With these capabilities in place, we
>
> could
>
> attack the in-place upgrade model.
>
> If we were to adopt such a pluggable capability, we would have the
> opportunity to decouple the vendor and CloudStack release schedules.
>
> For
>
> example, if a vendor were introducing a new product that required a new
>
> or
>
> updated driver, they would no longer need to wait for a CloudStack
>
> release
>
> to support it.  They would also gain the ability to fix high priority
> defects in the same manner.
>
> I have hand waved a number of issues that would need to be resolved
>
> before
>
> such an approach could be implemented.  However, I think we need to
>
> decide,
>
> as a community, that it worth devoting energy and effort to enhancing
>
> the
>
> plugin/driver model and the goals of that effort before driving head
>
> first
>
> into the deep rabbit hole of design/implementation.
>
> Thoughts? (/me ducks)
> -John
>
> [1]: My opinions on the matter from CloudStack Collab 2013 ->
>
> http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cloud-stack-distributed-process-management
>
>
>
>
> --
> *Mike Tutkowski*
> *Senior CloudStack Developer, SolidFire Inc.*
> e: mike.tutkowski@solidfire.com
> o: 303.746.7302
> Advancing the way the world uses the
> cloud<http://solidfire.com/solution/overview/?video=play>
> *™*
>
>
>
>

Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by John Burwell <jb...@basho.com>.
Daan,

I have the following issues with OSGi: 

Complexity:  Building OSGi components adds a tremendous amount of complexity to both building drivers and debugging runtime issues.  Additionally, OSGi has a much broader feature set than I think CloudStack needs to support.  Therefore, driver authors may use the feature set in unanticipated ways that create system instability.
Dependency Hell: OSGi requires 3rd party dependencies to be packaged as OSGi bundles.  In practice, many third party libraries either have issues that prevent them from being bundled or their OSGi bundled versions are behind the mainline release.

Additionally, from personal experience, I do not want to re-create the mess that is Eclipse (i.e. an erector set with more screws than nuts).  In addition to its lack of reliability, it is incredibly difficult to comprehend how the component configurations and relationships are composed at runtime.

To be clear, I am not interested in creating a general purpose component/plugin model.  Fundamentally, we need a simple, purpose-built component model focused on providing stability and reliability through deterministic behavior rather than feature flexibility.  Unfortunately, the focus of both OSGi and Spring on flexibility makes them ill-suited for our purposes.

Thanks,
-John

On Aug 21, 2013, at 2:31 AM, Daan Hoogland <da...@gmail.com> wrote:

> John,
> 
> Nice work.
> Given the maturity of OSGi, I'd say let's see how it fits. One criterion
> would be: can we limit the bundles that may be loaded based on what
> CloudStack supports (and not allow loading pydev)?  If not, we need to
> bake our own.
> 
> But though I think your work is valuable, I disagree on designing our
> CARs from the get-go without having explored usable options in the
> field first. A new type of YARs is not what the world or CloudStack
> needs. And given what you have written, the main problem will be finding
> a framework we can restrict to what we want, not one that can do all
> of it.
> 
> done shooting,
> Daan
> 
> On Wed, Aug 21, 2013 at 2:52 AM, Darren Shepherd
> <da...@gmail.com> wrote:
>> Sure, I fully understand how it theoretically works, but I'm saying from a
>> practical perspective it always seems to fall apart.  What you're describing
>> is done excellently in OSGI 4.2 Blueprint.  It's a beautiful framework that
>> allows you to expose services that can be dynamically updated at runtime.
>> 
>> The issues always happen with unloading.  I'll give you a real world
>> example.  As part of the servlet spec you're supposed to be able to stop and
>> unload wars.  But in practice if you do it enough times you typically run
>> out of memory.  So one such issue was with commons logging (since fixed).
>> When you do getLogger(MyClass.class) it would cache a reference of the Class
>> object to the actual log impl.  The commons logging jar is typically loaded
>> by a system classloader, but MyClass.class would be loaded in the
>> webapp classloader.  So when you stop the war there is a reference chain
>> system classloader -> logfactory -> MyClass -> webapp classloader.  So the
>> web app never gets GC'd.
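In code, the reference chain Darren describes boils down to something like this toy illustration (not the actual commons-logging source; the class and cache here are made up to show the shape of the leak):

    import java.util.HashMap;
    import java.util.Map;

    // Imagine this class lives in the system classloader.
    final class LogFactory {
        private static final Map<Class<?>, Object> CACHE = new HashMap<>();

        static Object getLogger(Class<?> clazz) {
            // A Class object holds a strong reference to its defining classloader,
            // so caching it here keeps the whole webapp classloader reachable:
            // system classloader -> LogFactory.CACHE -> MyClass -> webapp classloader.
            return CACHE.computeIfAbsent(clazz, c -> new Object() /* the "log impl" */);
        }
    }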
>> 
>> So just pointing out the practical issues, that's it.
>> 
>> Darren
>> 
>> On Aug 20, 2013, at 5:31 PM, John Burwell <jb...@basho.com> wrote:
>> 
>> Darren,
>> 
>> Actually, loading and unloading aren't difficult if resource management and
>> drivers work within the following constraints/assumptions:
>> 
>> Drivers are transient and stateless
>> A driver instance is assigned per resource managed (i.e. no singletons)
>> A lightweight thread and mailbox (i.e. actor model) are assigned per
>> resource managed (outlined in the presentation referenced below)
>> 
>> 
>> Based on these constraints and assumptions, the following upgrade process
>> could be implemented:
>> 
>> Load and verify new driver version to make it available
>> Notify the supervisor processes of each affected resource that a new driver
>> is available
>> Upon completion of the current message being processed by its associated
>> actor, the supervisor kills and respawns the actor managing its associated
>> resource
>> As part of startup, the supervisor injects an instance of the new driver
>> version and the actor resumes processing messages in its mailbox
>> 
>> 
>> This process mirrors the process that would occur on management server
>> startup for each resource minus killing an existing actor instance.
>> Eventually, the system will upgrade the driver without loss of operation.
>> More sophisticated policies could be added, but I think this approach would
>> be a solid default upgrade behavior.  As a bonus, this same approach could
>> also be applied to global configuration settings -- allowing the system to
>> apply changes to these values without restarting the system.
>> 
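A loose sketch of that per-resource swap, with all names invented for illustration: the driver instance is only replaced between messages, so an in-flight operation always completes against the version it started with.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.atomic.AtomicReference;

    final class ResourceSupervisor implements Runnable {

        interface Driver { void handle(Object message); }

        private final BlockingQueue<Object> mailbox = new LinkedBlockingQueue<>();
        private final AtomicReference<Driver> pendingDriver = new AtomicReference<>();
        private volatile Driver current;

        ResourceSupervisor(Driver initial) {
            this.current = initial;
        }

        void send(Object message)          { mailbox.add(message); }
        void upgradeTo(Driver newVersion)  { pendingDriver.set(newVersion); }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Object message = mailbox.take();
                    Driver replacement = pendingDriver.getAndSet(null);
                    if (replacement != null) {
                        current = replacement;   // swap only at a message boundary
                    }
                    current.handle(message);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }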
>> In summary, CloudStack and Eclipse are very different types of systems.
>> Eclipse is a desktop application implementing complex workflows, user
>> interactions, and management of shared state (e.g. project structure, AST,
>> compiler status, etc).  In contrast, CloudStack is an eventually consistent
>> distributed system performing automation control.  As such, its plugin
>> requirements are not only very different, but IMHO, much simpler.
>> 
>> Thanks,
>> -John
>> 
>> On Aug 20, 2013, at 7:44 PM, Darren Shepherd <da...@gmail.com>
>> wrote:
>> 
>> I know this isn't terribly useful, but I've been drawing a lot of squares
>> and circles and lines that connect those squares and circles lately and I
>> have a lot of architectural ideas for CloudStack.  At the rate I'm going it
>> will take me about two weeks to put together a discussion/proposal for the
>> community.  What I'm thinking is a superset of what you've listed out and
>> should align with your idea of a CAR.  The focus has a lot to do with
>> modularity and extensibility.
>> 
>> So more to come soon....  I will say one thing though: with Java you end
>> up having a hard time doing dynamic load and unloading of modules.  There's
>> plenty of frameworks that try really hard to do this right, like OSGI, but
>> it's darn near impossible to do it right because of class loading and GC
>> issues (and that's why Eclipse has you restart after installing plugins even
>> though it is OSGi).
>> 
>> I do believe that CloudStack should be capable of zero downtime maintenance
>> and have ideas around that, but at the end of the day, for plenty of
>> practical reasons, you still need a JVM restart if modules change.
>> 
>> Darren
>> 
>> On Aug 20, 2013, at 3:39 PM, Mike Tutkowski <mi...@solidfire.com>
>> wrote:
>> 
>> I agree, John - let's get consensus first, then talk time tables.
>> 
>> 
>> On Tue, Aug 20, 2013 at 4:31 PM, John Burwell <jb...@basho.com> wrote:
>> 
>> Mike,
>> 
>> Before we can dig into timelines or implementations, I think we need to
>> get consensus on the problem to be solved and the goals.  Once we have a
>> proper understanding of the scope, I believe we can chunk the work across
>> a set of development lifecycles.  The subject is vast, but it also has a far
>> reaching impact to both the storage and network layer evolution efforts.
>> As such, I believe we need to start addressing it as part of the next
>> release.
>> 
>> As a separate thread, we need to discuss the timeline for the next
>> release.  I think we need to avoid the time compression caused by the
>> overlap of the 4.1 stabilization effort and 4.2 development.  Therefore, I
>> don't think we should consider development of the next release started
>> until the first 4.2 RC is released.  I will try to open a separate discuss
>> thread for this topic, as well as tying in the discussion of release code
>> names.
>> 
>> Thanks,
>> -John
>> 
>> On Aug 20, 2013, at 6:22 PM, Mike Tutkowski <mi...@solidfire.com>
>> wrote:
>> 
>> Hey John,
>> 
>> I think this is some great stuff. Thanks for the write up.
>> 
>> It looks like you have ideas around what might go into a first release of
>> this plug-in framework. Were you thinking we'd have enough time to squeeze
>> that first rev into 4.3? I'm just wondering (it's not a huge deal to hit
>> that release for this) because we would only have about five weeks.
>> 
>> Thanks
>> 
>> 
>> 
>> --
>> *Mike Tutkowski*
>> *Senior CloudStack Developer, SolidFire Inc.*
>> e: mike.tutkowski@solidfire.com
>> o: 303.746.7302
>> Advancing the way the world uses the
>> cloud<http://solidfire.com/solution/overview/?video=play>
>> *™*
>> 
>> 
>> 


Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by Daan Hoogland <da...@gmail.com>.
John,

Nice work.
Given the maturity of OSGi, I'd say let's see how it fits. One criterion
would be: can we limit the bundles that may be loaded based on what
CloudStack supports (and not allow loading pydev)?  If not, we need to
bake our own.

But though I think your work is valuable, I disagree on designing our
CARs from the get-go without having explored usable options in the
field first. A new type of YARs is not what the world or CloudStack
needs. And given what you have written, the main problem will be finding
a framework we can restrict to what we want, not one that can do all
of it.

done shooting,
Daan


Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by Amit Das <am...@cloudbyte.com>.
Hi John,

This is really good thinking on the future of the CloudStack driver/plugin
architecture.

I would be very happy if we can decouple the ACS releases from vendor
specific releases.

I also agree on the research & experiments (POCs, tools) that need to be
undertaken to conclude whether decoupled drivers, driver upgrades, coexistence
of multiple drivers, hot driver deployment, etc. can actually work in the JVM.

Note - This approach reminds me of the Vert.x modules & containers that
try to manage this problem along with other concerns (read:
instrumentation, etc.).

Regards,
Amit
*CloudByte Inc.* <http://www.cloudbyte.com/>



Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by Darren Shepherd <da...@gmail.com>.
Sure, I fully understand how it theoretically works, but I'm saying from a practical perspective it always seems to fall apart.  What you're describing is done excellently in OSGI 4.2 Blueprint.  It's a beautiful framework that allows you to expose services that can be dynamically updated at runtime.

The issues always happen with unloading.  I'll give you a real world example.  As part of the servlet spec you're supposed to be able to stop and unload wars.  But in practice if you do it enough times you typically run out of memory.  So one such issue was with commons logging (since fixed).  When you do getLogger(MyClass.class) it would cache a reference of the Class object to the actual log impl.  The commons logging jar is typically loaded by a system classloader, but MyClass.class would be loaded in the webapp classloader.  So when you stop the war there is a reference chain system classloader -> logfactory -> MyClass -> webapp classloader.  So the web app never gets GC'd.

So just pointing out the practical issues, that's it.

Darren

>>>>> On Tue, Aug 20, 2013 at 3:43 PM, John Burwell <jb...@basho.com>
>>>> wrote:
>>>>> 
>>>>>> All,
>>>>>> 
>>>>>> In capturing my thoughts on storage, my thinking backed into the driver
>>>>>> model.  While we have the beginnings of such a model today, I see the
>>>>>> following deficiencies:
>>>>>> 
>>>>>> 
>>>>>> 1. *Multiple Models*: The Storage, Hypervisor, and Security layers
>>>>>> each have a slightly different model for allowing system
>>>> functionality to
>>>>>> be extended/substituted.  These differences increase the barrier of
>>>> entry
>>>>>> for vendors seeking to extend CloudStack and accrete code paths to be
>>>>>> maintained and verified.
>>>>>> 2. *Leaky Abstraction*:  Plugins are registered through a Spring
>>>>>> configuration file.  In addition to being operator unfriendly (most
>>>>>> sysadmins are not Spring experts nor do they want to be), we expose
>>>> the
>>>>>> core bootstrapping mechanism to operators.  Therefore, a
>>>> misconfiguration
>>>>>> could negatively impact the injection/configuration of internal
>>>> management
>>>>>> server components.  Essentially handing them a loaded shotgun pointed
>>>> at
>>>>>> our right foot.
>>>>>> 3. *Nondeterministic Load/Unload Model*:  Because the core loading
>>>>>> mechanism is Spring, the management has little control over the
>>>> timing and
>>>>>> order of component loading/unloading.  Changes to the Management
>>>> Server's
>>>>>> component dependency graph could break a driver by causing it to be
>>>> started
>>>>>> at an unexpected time.
>>>>>> 4. *Lack of Execution Isolation*: As a Spring component, plugins are
>>>>>> loaded into the same execution context as core management server
>>>>>> components.  Therefore, an errant plugin can corrupt the entire
>>>> management
>>>>>> server.
>>>>>> 
>>>>>> 
>>>>>> For next revision of the plugin/driver mechanism, I would like see us
>>>>>> migrate towards a standard pluggable driver model that supports all of
>>>> the
>>>>>> management server's extension points (e.g. network devices, storage
>>>>>> devices, hypervisors, etc) with the following capabilities:
>>>>>> 
>>>>>> 
>>>>>> - *Consolidated Lifecycle and Startup Procedure*:  Drivers share a
>>>>>> common state machine and categorization (e.g. network, storage,
>>>> hypervisor,
>>>>>> etc) that permits the deterministic calculation of initialization and
>>>>>> destruction order (i.e. network layer drivers -> storage layer
>>>> drivers ->
>>>>>> hypervisor drivers).  Plugin inter-dependencies would be supported
>>>> between
>>>>>> plugins sharing the same category.
>>>>>> - *In-process Installation and Upgrade*: Adding or upgrading a driver
>>>>>> does not require the management server to be restarted.  This
>>>> capability
>>>>>> implies a system that supports the simultaneous execution of multiple
>>>>>> driver versions and the ability to suspend continued execution work
>>>> on a
>>>>>> resource while the underlying driver instance is replaced.
>>>>>> - *Execution Isolation*: The deployment packaging and execution
>>>>>> environment supports different (and potentially conflicting) versions
>>>> of
>>>>>> dependencies to be simultaneously used.  Additionally, plugins would
>>>> be
>>>>>> sufficiently sandboxed to protect the management server against driver
>>>>>> instability.
>>>>>> - *Extension Data Model*: Drivers provide a property bag with a
>>>>>> metadata descriptor to validate and render vendor specific data.  The
>>>>>> contents of this property bag will be provided to every driver operation
>>>>>> invocation at runtime.  The metadata descriptor would be a lightweight
>>>>>> description that provides a label resource key, a description
>>>> resource key,
>>>>>> data type (string, date, number, boolean), required flag, and optional
>>>>>> length limit.
>>>>>> - *Introspection: Administrative APIs/UIs allow operators to
>>>>>> understand the configuration of the drivers in the system, their
>>>>>> configuration, and their current state.*
>>>>>> - *Discoverability*: Optionally, drivers can be discovered via a
>>>>>> project repository definition (similar to Yum) allowing drivers to be
>>>>>> remotely acquired and operators to be notified regarding update
>>>>>> availability.  The project would also provide, free of charge,
>>>> certificates
>>>>>> to sign plugins.  This mechanism would support local mirroring to
>>>> support
>>>>>> air gapped management networks.
>>>>>> 
>>>>>> 
>>>>>> Fundamentally, I do not want to turn CloudStack into an erector set with
>>>>>> more screws than nuts which is a risk with highly pluggable
>>>> architectures.
>>>>>> As such, I think we would need to tightly bound the scope of drivers and
>>>>>> their behaviors to prevent the loss system usability and stability.  My
>>>>>> thinking is that drivers would be packaged into a custom JAR, CAR
>>>>>> (CloudStack ARchive), that would be structured as followed:
>>>>>> 
>>>>>> 
>>>>>> - META-INF
>>>>>>    - MANIFEST.MF
>>>>>>    - driver.yaml (driver metadata(e.g. version, name, description,
>>>>>>    etc) serialized in YAML format)
>>>>>>    - LICENSE (a text file containing the driver's license)
>>>>>> - lib (driver dependencies)
>>>>>> - classes (driver implementation)
>>>>>> - resources (driver message files and potentially JS resources)
>>>>>> 
>>>>>> 
>>>>>> The management server would acquire drivers through a simple scan of a
>>>> URL
>>>>>> (e.g. file directory, S3 bucket, etc).  For every CAR object found, the
>>>>>> management server would create an execution environment (likely a
>>>> dedicated
>>>>>> ExecutorService and Classloader), and transition the state of the
>>>> driver to
>>>>>> Running (the exact state model would need to be worked out).  To be
>>>> really
>>>>>> nice, we could develop a custom Ant task/Maven plugin/Gradle plugin to
>>>>>> create CARs.   I can also imagine an opportunities to add hooks to this
>>>>>> model to register instrumentation information with JMX and
>>>> authorization.
>>>>>> 
>>>>>> To keep the scope of this email confined, we would introduce the general
>>>>>> notion of a Resource, and (hand wave hand wave) eventually
>>>> compartmentalize
>>>>>> the execution of work around a resource [1].  This (hand waved)
>>>>>> compartmentalization would allow us the controls necessary to safely and
>>>>>> reliably perform in-place driver upgrades.  For an initial release, I
>>>> would
>>>>>> recommend implementing the abstractions, loading mechanism, extension
>>>> data
>>>>>> model, and discovery features.  With these capabilities in place, we
>>>> could
>>>>>> attack the in-place upgrade model.
>>>>>> 
>>>>>> If we were to adopt such a pluggable capability, we would have the
>>>>>> opportunity to decouple the vendor and CloudStack release schedules.
>>>> For
>>>>>> example, if a vendor were introducing a new product that required a new
>>>> or
>>>>>> updated driver, they would no longer need to wait for a CloudStack
>>>> release
>>>>>> to support it.  They would also gain the ability to fix high priority
>>>>>> defects in the same manner.
>>>>>> 
>>>>>> I have hand waved a number of issues that would need to be resolved
>>>> before
>>>>>> such an approach could be implemented.  However, I think we need to
>>>> decide,
>>>>>> as a community, that it worth devoting energy and effort to enhancing
>>>> the
>>>>>> plugin/driver model and the goals of that effort before driving head
>>>> first
>>>>>> into the deep rabbit hole of design/implementation.
>>>>>> 
>>>>>> Thoughts? (/me ducks)
>>>>>> -John
>>>>>> 
>>>>>> [1]: My opinions on the matter from CloudStack Collab 2013 ->
>>>> http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cloud-stack-distributed-process-management
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> *Mike Tutkowski*
>>>>> *Senior CloudStack Developer, SolidFire Inc.*
>>>>> e: mike.tutkowski@solidfire.com
>>>>> o: 303.746.7302
>>>>> Advancing the way the world uses the
>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>>>> *™*
>>> 
>>> 
>>> -- 
>>> *Mike Tutkowski*
>>> *Senior CloudStack Developer, SolidFire Inc.*
>>> e: mike.tutkowski@solidfire.com
>>> o: 303.746.7302
>>> Advancing the way the world uses the
>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>> *™*
> 

Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by John Burwell <jb...@basho.com>.
Darren,

My response does hand wave two important issues -- hot code reloading and PermGen leakage.  These are tricky but well-trodden issues that can be solved in a variety of ways (e.g. instrumentation, class loaders, OSGi).  It would require some research/experimentation to determine the best approach, particularly when using a lightweight threading model.
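
To make the class-loader option a bit more concrete, here is a rough sketch of what per-CAR isolation could look like.  Everything below is illustrative only -- the class and method names are made up, not existing CloudStack code -- and it assumes the CAR has already been exploded to a directory with the classes/ and lib/ layout from the proposal:

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public final class CarClassLoaderFactory {

    /**
     * Builds an isolated class loader over an exploded CAR directory:
     * classes/ holds the driver implementation, lib/ holds its bundled dependencies.
     */
    public static URLClassLoader forExplodedCar(File carDir) throws Exception {
        List<URL> urls = new ArrayList<>();
        urls.add(new File(carDir, "classes").toURI().toURL());
        File[] jars = new File(carDir, "lib").listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars != null) {
            for (File jar : jars) {
                urls.add(jar.toURI().toURL());
            }
        }
        // The parent is the management server's loader, so the shared driver API resolves
        // there, while everything packaged inside the CAR stays private to this child
        // loader -- which is what lets two versions of the same driver coexist.
        return new URLClassLoader(urls.toArray(new URL[0]),
                CarClassLoaderFactory.class.getClassLoader());
    }

    /** Instantiates a driver class by name inside that loader. */
    public static Object newDriverInstance(URLClassLoader loader, String driverClassName)
            throws Exception {
        return Class.forName(driverClassName, true, loader)
                .getDeclaredConstructor()
                .newInstance();
    }
}

The hot-reload story then reduces to building a second loader for the new CAR version and letting every reference to the old loader (and to the objects it created) drop, which is exactly where the PermGen leakage risk lives.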

Thanks,
-John



Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by John Burwell <jb...@basho.com>.
Darren,

Actually, loading and unloading aren't difficult if resource management and drivers work within the following constraints/assumptions:

- Drivers are transient and stateless
- A driver instance is assigned per resource managed (i.e. no singletons)
- A lightweight thread and mailbox (i.e. actor model) are assigned per resource managed (outlined in the presentation referenced below)

Based on these constraints and assumptions, the following upgrade process could be implemented:

1. Load and verify the new driver version to make it available
2. Notify the supervisor processes of each affected resource that a new driver is available
3. Upon completion of the current message being processed by its associated actor, the supervisor kills and respawns the actor managing its associated resource
4. As part of startup, the supervisor injects an instance of the new driver version and the actor resumes processing messages in its mailbox

This process mirrors the one that would occur on management server startup for each resource, minus killing an existing actor instance.  Eventually, the system will upgrade the driver without loss of operation.  More sophisticated policies could be added, but I think this approach would be a solid default upgrade behavior.  As a bonus, the same approach could also be applied to global configuration settings -- allowing the system to apply changes to those values without restarting.
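
To make those mechanics concrete, here is a minimal sketch of the supervisor/actor arrangement.  None of these types exist in CloudStack today -- the names are hypothetical -- and a plain thread plus a BlockingQueue stand in for the lightweight thread and mailbox:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical, stateless driver contract; one instance per managed resource. */
interface ResourceDriver {
    void handle(String message) throws Exception;
}

/** Supervises a single resource: owns its mailbox and respawns its actor on driver upgrades. */
final class ResourceSupervisor {

    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private volatile ResourceDriver driver;
    private volatile Thread actor;

    ResourceSupervisor(ResourceDriver initialDriver) {
        this.driver = initialDriver;
        spawnActor();
    }

    /** Messages survive actor restarts because the mailbox belongs to the supervisor. */
    void send(String message) {
        mailbox.add(message);
    }

    /** Steps 2-4 above: swap the driver, stop the actor after its current message, respawn. */
    synchronized void upgradeDriver(ResourceDriver newDriver) throws InterruptedException {
        driver = newDriver;
        Thread old = actor;
        old.interrupt();   // wakes a blocked take(); an in-flight handle() runs to completion
        old.join();        // wait for the current message, if any, to finish
        spawnActor();      // the new actor sees the new driver and the same mailbox
    }

    private void spawnActor() {
        actor = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    String message = mailbox.take();   // blocks until work arrives
                    driver.handle(message);
                } catch (InterruptedException stop) {
                    return;   // respawn (or shutdown) requested
                } catch (Exception e) {
                    // a real supervisor would apply a policy here: retry, dead-letter, alert, ...
                }
            }
        }, "resource-actor");
        actor.start();
    }
}

Because the mailbox belongs to the supervisor and outlives the actor, work queued against the resource is merely delayed, not lost, while the driver instance underneath it is swapped.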

In summary, CloudStack and Eclipse are very different types of systems.  Eclipse is a desktop application implementing complex workflows, user interactions, and management of shared state (e.g. project structure, AST, compiler status, etc).  In contrast, CloudStack is an eventually consistent distributed system performing automation control.  As such, its plugin requirements are not only very different but, IMHO, much simpler.

Thanks,
-John



Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by Darren Shepherd <da...@gmail.com>.
I know this isn't terribly useful, but I've been drawing a lot of squares and circles and lines that connect those squares and circles lately, and I have a lot of architectural ideas for CloudStack.  At the rate I'm going, it will take me about two weeks to put together a discussion/proposal for the community.  What I'm thinking is a superset of what you've listed out and should align with your idea of a CAR.  The focus has a lot to do with modularity and extensibility.

So more to come soon....  I will say one thing, though: with Java you end up having a hard time doing dynamic loading and unloading of modules.  There are plenty of frameworks that try really hard to do this right, like OSGi, but it's darn near impossible to do it right because of class loading and GC issues (and that's why Eclipse has you restart after installing plugins even though it is OSGi).
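
For anyone who hasn't fought this before, the failure mode is easy to show in a few lines.  This is only an illustration -- plugin.jar and com.example.PluginImpl are made-up stand-ins for a real module:

import java.net.URL;
import java.net.URLClassLoader;

// Illustration of the class-loading/GC problem (names and plugin.jar are hypothetical).
public final class UnloadPitfall {
    public static void main(String[] args) throws Exception {
        // Load a plugin class through its own, disposable loader (null parent for isolation).
        URLClassLoader pluginLoader = new URLClassLoader(
                new URL[] { new URL("file:plugin.jar") }, null);
        Class<?> pluginClass = Class.forName("com.example.PluginImpl", true, pluginLoader);
        Object instance = pluginClass.getDeclaredConstructor().newInstance();

        pluginLoader.close();   // releases the jar file handle, but unloads nothing

        // As long as 'instance' (or pluginClass, or anything the plugin registered with
        // the core: listeners, caches, thread locals) remains reachable, the loader and
        // every class it defined stay pinned in memory.  That pinning is what makes
        // "just swap the module at runtime" so hard to get right on the JVM.
        System.out.println(instance);
    }
}

Unloading only happens once the loader, its classes, and all of their instances become unreachable together, which in a long-running server full of caches and listeners is exactly what you can't guarantee.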

I do believe that CloudStack should be capable of zero-downtime maintenance, and I have ideas around that, but at the end of the day, for plenty of practical reasons, you still need a JVM restart if modules change.

Darren


Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by Mike Tutkowski <mi...@solidfire.com>.
I agree, John - let's get consensus first, then talk timetables.




-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkowski@solidfire.com
o: 303.746.7302
Advancing the way the world uses the
cloud<http://solidfire.com/solution/overview/?video=play>
*™*

Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by John Burwell <jb...@basho.com>.
Mike,

Before we can dig into timelines or implementations, I think we need to get consensus on the problem to be solved and the goals.  Once we have a proper understanding of the scope, I believe we can chunk the work across a set of development lifecycles.  The subject is vast, but it also has a far-reaching impact on both the storage and network layer evolution efforts.  As such, I believe we need to start addressing it as part of the next release.

As a separate thread, we need to discuss the timeline for the next release.  I think we need to avoid the time compression caused by the overlap of the 4.1 stabilization effort and 4.2 development.  Therefore, I don't think we should consider development of the next release started until the first 4.2 RC is released.  I will try to open a separate discuss thread for this topic, as well as tie in the discussion of release code names.

Thanks,
-John

On Aug 20, 2013, at 6:22 PM, Mike Tutkowski <mi...@solidfire.com> wrote:

> Hey John,
> 
> I think this is some great stuff. Thanks for the write up.
> 
> It looks like you have ideas around what might go into a first release of
> this plug-in framework. Were you thinking we'd have enough time to squeeze
> that first rev into 4.3? I'm just wondering (it's not a huge deal to hit
> that release for this) because we would only have about five weeks.
> 
> Thanks
> 
> 
> On Tue, Aug 20, 2013 at 3:43 PM, John Burwell <jb...@basho.com> wrote:
> 
>> All,
>> 
>> In capturing my thoughts on storage, my thinking backed into the driver
>> model.  While we have the beginnings of such a model today, I see the
>> following deficiencies:
>> 
>> 
>>   1. *Multiple Models*: The Storage, Hypervisor, and Security layers
>>   each have a slightly different model for allowing system functionality to
>>   be extended/substituted.  These differences increase the barrier of entry
>>   for vendors seeking to extend CloudStack and accrete code paths to be
>>   maintained and verified.
>>   2. *Leaky Abstraction*:  Plugins are registered through a Spring
>>   configuration file.  In addition to being operator unfriendly (most
>>   sysadmins are not Spring experts nor do they want to be), we expose the
>>   core bootstrapping mechanism to operators.  Therefore, a misconfiguration
>>   could negatively impact the injection/configuration of internal management
>>   server components.  Essentially handing them a loaded shotgun pointed at
>>   our right foot.
>>   3. *Nondeterministic Load/Unload Model*:  Because the core loading
>>   mechanism is Spring, the management has little control over the timing and
>>   order of component loading/unloading.  Changes to the Management Server's
>>   component dependency graph could break a driver by causing it to be started
>>   at an unexpected time.
>>   4. *Lack of Execution Isolation*: As a Spring component, plugins are
>>   loaded into the same execution context as core management server
>>   components.  Therefore, an errant plugin can corrupt the entire management
>>   server.
>> 
>> 
>> For next revision of the plugin/driver mechanism, I would like see us
>> migrate towards a standard pluggable driver model that supports all of the
>> management server's extension points (e.g. network devices, storage
>> devices, hypervisors, etc) with the following capabilities:
>> 
>> 
>>   - *Consolidated Lifecycle and Startup Procedure*:  Drivers share a
>>   common state machine and categorization (e.g. network, storage, hypervisor,
>>   etc) that permits the deterministic calculation of initialization and
>>   destruction order (i.e. network layer drivers -> storage layer drivers ->
>>   hypervisor drivers).  Plugin inter-dependencies would be supported between
>>   plugins sharing the same category.
>>   - *In-process Installation and Upgrade*: Adding or upgrading a driver
>>   does not require the management server to be restarted.  This capability
>>   implies a system that supports the simultaneous execution of multiple
>>   driver versions and the ability to suspend continued execution work on a
>>   resource while the underlying driver instance is replaced.
>>   - *Execution Isolation*: The deployment packaging and execution
>>   environment supports different (and potentially conflicting) versions of
>>   dependencies to be simultaneously used.  Additionally, plugins would be
>>   sufficiently sandboxed to protect the management server against driver
>>   instability.
>>   - *Extension Data Model*: Drivers provide a property bag with a
>>   metadata descriptor to validate and render vendor specific data.  The
>>   contents of this property bag will provided to every driver operation
>>   invocation at runtime.  The metadata descriptor would be a lightweight
>>   description that provides a label resource key, a description resource key,
>>   data type (string, date, number, boolean), required flag, and optional
>>   length limit.
>>   - *Introspection: Administrative APIs/UIs allow operators to
>>   understand the configuration of the drivers in the system, their
>>   configuration, and their current state.*
>>   - *Discoverability*: Optionally, drivers can be discovered via a
>>   project repository definition (similar to Yum), allowing drivers to be
>>   remotely acquired and operators to be notified regarding update
>>   availability.  The project would also provide, free of charge,
>>   certificates to sign plugins.  This mechanism would support local
>>   mirroring for air-gapped management networks.
>> 
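>> To make that concrete, a single field of the metadata descriptor might
>> carry something like the following (a rough sketch only; these type and
>> field names are hypothetical, not an existing CloudStack API):
>> 
>>     // Illustrative only -- hypothetical types, not existing CloudStack code.
>>     public enum FieldType { STRING, DATE, NUMBER, BOOLEAN }
>> 
>>     public final class FieldDescriptor {
>>         private final String labelKey;        // resource key for the UI label
>>         private final String descriptionKey;  // resource key for the help text
>>         private final FieldType type;         // string, date, number, or boolean
>>         private final boolean required;       // must the operator supply a value?
>>         private final Integer maxLength;      // optional length limit (null = none)
>> 
>>         public FieldDescriptor(String labelKey, String descriptionKey,
>>                 FieldType type, boolean required, Integer maxLength) {
>>             this.labelKey = labelKey;
>>             this.descriptionKey = descriptionKey;
>>             this.type = type;
>>             this.required = required;
>>             this.maxLength = maxLength;
>>         }
>>         // getters omitted; the validated values would reach each driver
>>         // operation as a simple property bag, e.g. a Map<String, Object>
>>     }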
>> 
>> Fundamentally, I do not want to turn CloudStack into an erector set with
>> more screws than nuts, which is a risk with highly pluggable architectures.
>> As such, I think we would need to tightly bound the scope of drivers and
>> their behaviors to prevent the loss of system usability and stability.  My
>> thinking is that drivers would be packaged into a custom JAR, the CAR
>> (CloudStack ARchive), which would be structured as follows:
>> 
>> 
>>   - META-INF
>>      - MANIFEST.MF
>>      - driver.yaml (driver metadata (e.g. version, name, description,
>>      etc.) serialized in YAML format; a sample is sketched below this list)
>>      - LICENSE (a text file containing the driver's license)
>>   - lib (driver dependencies)
>>   - classes (driver implementation)
>>   - resources (driver message files and potentially JS resources)
>> 
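>> For illustration, a driver.yaml might look something like this (the field
>> names are made up; the actual schema would need to be agreed on):
>> 
>>     # Illustrative only -- the real schema is still to be defined.
>>     name: acme-block-storage
>>     version: 1.2.0
>>     description: Acme block storage primary storage driver
>>     category: storage              # drives deterministic start order
>>     requires:
>>       - acme-common >= 1.0.0       # inter-dependency within the same category
>>     properties:                    # extension data model field descriptors
>>       - key: endpoint
>>         label-key: acme.endpoint.label
>>         description-key: acme.endpoint.description
>>         type: string
>>         required: true
>>         max-length: 255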
>> 
>> The management server would acquire drivers through a simple scan of a URL
>> (e.g. file directory, S3 bucket, etc.).  For every CAR object found, the
>> management server would create an execution environment (likely a dedicated
>> ExecutorService and ClassLoader) and transition the state of the driver to
>> Running (the exact state model would need to be worked out).  To be really
>> nice, we could develop a custom Ant task/Maven plugin/Gradle plugin to
>> create CARs.  I can also imagine opportunities to add hooks to this model
>> for registering instrumentation information with JMX and for authorization.
>> 
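>> As a very rough sketch of that loading loop (hypothetical names throughout;
>> none of these types exist today, and error handling is ignored):
>> 
>>     import java.net.URL;
>>     import java.net.URLClassLoader;
>>     import java.util.ArrayList;
>>     import java.util.Comparator;
>>     import java.util.List;
>>     import java.util.concurrent.ExecutorService;
>>     import java.util.concurrent.Executors;
>> 
>>     // Hypothetical sketch only: CarRepository, DriverDescriptor, DriverHandle,
>>     // DriverRegistry, and DriverState are assumed types, not CloudStack code.
>>     public class CarLoader {
>> 
>>         // Enum order doubles as start order: network -> storage -> hypervisor.
>>         enum DriverCategory { NETWORK, STORAGE, HYPERVISOR }
>> 
>>         private final CarRepository carRepository;   // scans a directory, S3 bucket, etc.
>>         private final DriverRegistry driverRegistry;
>> 
>>         public CarLoader(CarRepository carRepository, DriverRegistry driverRegistry) {
>>             this.carRepository = carRepository;
>>             this.driverRegistry = driverRegistry;
>>         }
>> 
>>         public void loadAll() {
>>             List<DriverDescriptor> descriptors = new ArrayList<>();
>>             for (URL carUrl : carRepository.scan()) {
>>                 descriptors.add(DriverDescriptor.parse(carUrl));  // reads META-INF/driver.yaml
>>             }
>>             // Deterministic start order by driver category.
>>             descriptors.sort(Comparator.comparing(DriverDescriptor::getCategory));
>>             for (DriverDescriptor descriptor : descriptors) {
>>                 start(descriptor);
>>             }
>>         }
>> 
>>         private void start(DriverDescriptor descriptor) {
>>             // A dedicated ClassLoader and ExecutorService give each driver its
>>             // own dependency set and work queue, isolated from the core server.
>>             ClassLoader loader = new URLClassLoader(
>>                     new URL[] { descriptor.getCarUrl() }, getClass().getClassLoader());
>>             ExecutorService executor = Executors.newSingleThreadExecutor();
>> 
>>             DriverHandle handle = new DriverHandle(descriptor, loader, executor);
>>             handle.transitionTo(DriverState.RUNNING);   // exact state model TBD
>>             driverRegistry.register(handle);
>>         }
>>     }
>> 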
>> To keep the scope of this email confined, we would introduce the general
>> notion of a Resource, and (hand wave hand wave) eventually compartmentalize
>> the execution of work around a resource [1].  This (hand waved)
>> compartmentalization would allow us the controls necessary to safely and
>> reliably perform in-place driver upgrades.  For an initial release, I would
>> recommend implementing the abstractions, loading mechanism, extension data
>> model, and discovery features.  With these capabilities in place, we could
>> attack the in-place upgrade model.
>> 
>> If we were to adopt such a pluggable capability, we would have the
>> opportunity to decouple the vendor and CloudStack release schedules.  For
>> example, if a vendor were introducing a new product that required a new or
>> updated driver, they would no longer need to wait for a CloudStack release
>> to support it.  They would also gain the ability to fix high priority
>> defects in the same manner.
>> 
>> I have hand waved a number of issues that would need to be resolved before
>> such an approach could be implemented.  However, I think we need to decide,
>> as a community, that it is worth devoting energy and effort to enhancing
>> the plugin/driver model, and agree on the goals of that effort, before
>> diving head first into the deep rabbit hole of design/implementation.
>> 
>> Thoughts? (/me ducks)
>> -John
>> 
>> [1]: My opinions on the matter from CloudStack Collab 2013 ->
>> http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cloud-stack-distributed-process-management
>> 
> 
> 
> 
> -- 
> *Mike Tutkowski*
> *Senior CloudStack Developer, SolidFire Inc.*
> e: mike.tutkowski@solidfire.com
> o: 303.746.7302
> Advancing the way the world uses the
> cloud<http://solidfire.com/solution/overview/?video=play>
> *™*


Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Posted by Mike Tutkowski <mi...@solidfire.com>.
Hey John,

I think this is some great stuff. Thanks for the write up.

It looks like you have ideas around what might go into a first release of
this plug-in framework. Were you thinking we'd have enough time to squeeze
that first rev into 4.3? I'm just wondering because we would only have
about five weeks (it's not a huge deal if this doesn't hit that release).

Thanks


On Tue, Aug 20, 2013 at 3:43 PM, John Burwell <jb...@basho.com> wrote:

> All,
>
> In capturing my thoughts on storage, my thinking backed into the driver
> model.  While we have the beginnings of such a model today, I see the
> following deficiencies:
>
>
>    1. *Multiple Models*: The Storage, Hypervisor, and Security layers
>    each have a slightly different model for allowing system functionality to
>    be extended/substituted.  These differences increase the barrier to entry
>    for vendors seeking to extend CloudStack and accrete code paths that must
>    be maintained and verified.
>    2. *Leaky Abstraction*:  Plugins are registered through a Spring
>    configuration file.  In addition to being operator unfriendly (most
>    sysadmins are not Spring experts nor do they want to be), we expose the
>    core bootstrapping mechanism to operators.  Therefore, a misconfiguration
>    could negatively impact the injection/configuration of internal management
>    server components.  Essentially handing them a loaded shotgun pointed at
>    our right foot.
>    3. *Nondeterministic Load/Unload Model*:  Because the core loading
>    mechanism is Spring, the management server has little control over the
>    timing and order of component loading/unloading.  Changes to the
>    Management Server's component dependency graph could break a driver by
>    causing it to be started at an unexpected time.
>    4. *Lack of Execution Isolation*: As a Spring component, plugins are
>    loaded into the same execution context as core management server
>    components.  Therefore, an errant plugin can corrupt the entire management
>    server.
>
>
> For the next revision of the plugin/driver mechanism, I would like to see
> us migrate towards a standard pluggable driver model that supports all of
> the management server's extension points (e.g. network devices, storage
> devices, hypervisors, etc.) with the following capabilities:
>
>
>    - *Consolidated Lifecycle and Startup Procedure*:  Drivers share a
>    common state machine and categorization (e.g. network, storage, hypervisor,
>    etc) that permits the deterministic calculation of initialization and
>    destruction order (i.e. network layer drivers -> storage layer drivers ->
>    hypervisor drivers).  Plugin inter-dependencies would be supported between
>    plugins sharing the same category.
>    - *In-process Installation and Upgrade*: Adding or upgrading a driver
>    does not require the management server to be restarted.  This capability
>    implies a system that supports the simultaneous execution of multiple
>    driver versions and the ability to suspend execution of work on a
>    resource while the underlying driver instance is replaced.
>    - *Execution Isolation*: The deployment packaging and execution
>    environment supports different (and potentially conflicting) versions of
>    dependencies to be simultaneously used.  Additionally, plugins would be
>    sufficiently sandboxed to protect the management server against driver
>    instability.
>    - *Extension Data Model*: Drivers provide a property bag with a
>    metadata descriptor to validate and render vendor-specific data.  The
>    contents of this property bag will be provided to every driver operation
>    invocation at runtime.  The metadata descriptor would be a lightweight
>    description that provides a label resource key, a description resource
>    key, a data type (string, date, number, boolean), a required flag, and an
>    optional length limit.
>    - *Introspection*: Administrative APIs/UIs allow operators to
>    understand which drivers are installed in the system, their
>    configuration, and their current state.
>    - *Discoverability*: Optionally, drivers can be discovered via a
>    project repository definition (similar to Yum), allowing drivers to be
>    remotely acquired and operators to be notified regarding update
>    availability.  The project would also provide, free of charge,
>    certificates to sign plugins.  This mechanism would support local
>    mirroring for air-gapped management networks.
>
>
> Fundamentally, I do not want to turn CloudStack into an erector set with
> more screws than nuts, which is a risk with highly pluggable architectures.
>  As such, I think we would need to tightly bound the scope of drivers and
> their behaviors to prevent the loss of system usability and stability.  My
> thinking is that drivers would be packaged into a custom JAR, the CAR
> (CloudStack ARchive), which would be structured as follows:
>
>
>    - META-INF
>       - MANIFEST.MF
>       - driver.yaml (driver metadata (e.g. version, name, description,
>       etc.) serialized in YAML format)
>       - LICENSE (a text file containing the driver's license)
>    - lib (driver dependencies)
>    - classes (driver implementation)
>    - resources (driver message files and potentially JS resources)
>
>
> The management server would acquire drivers through a simple scan of a URL
> (e.g. file directory, S3 bucket, etc.).  For every CAR object found, the
> management server would create an execution environment (likely a dedicated
> ExecutorService and ClassLoader) and transition the state of the driver to
> Running (the exact state model would need to be worked out).  To be really
> nice, we could develop a custom Ant task/Maven plugin/Gradle plugin to
> create CARs.  I can also imagine opportunities to add hooks to this model
> for registering instrumentation information with JMX and for authorization.
>
> To keep the scope of this email confined, we would introduce the general
> notion of a Resource, and (hand wave hand wave) eventually compartmentalize
> the execution of work around a resource [1].  This (hand waved)
> compartmentalization would allow us the controls necessary to safely and
> reliably perform in-place driver upgrades.  For an initial release, I would
> recommend implementing the abstractions, loading mechanism, extension data
> model, and discovery features.  With these capabilities in place, we could
> attack the in-place upgrade model.
>
> If we were to adopt such a pluggable capability, we would have the
> opportunity to decouple the vendor and CloudStack release schedules.  For
> example, if a vendor were introducing a new product that required a new or
> updated driver, they would no longer need to wait for a CloudStack release
> to support it.  They would also gain the ability to fix high priority
> defects in the same manner.
>
> I have hand waved a number of issues that would need to be resolved before
> such an approach could be implemented.  However, I think we need to decide,
> as a community, that it is worth devoting energy and effort to enhancing
> the plugin/driver model, and agree on the goals of that effort, before
> diving head first into the deep rabbit hole of design/implementation.
>
> Thoughts? (/me ducks)
> -John
>
> [1]: My opinions on the matter from CloudStack Collab 2013 ->
> http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cloud-stack-distributed-process-management
>



-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkowski@solidfire.com
o: 303.746.7302
Advancing the way the world uses the
cloud<http://solidfire.com/solution/overview/?video=play>
*™*