Posted to dev@cloudstack.apache.org by Mike Tutkowski <mi...@solidfire.com> on 2013/10/09 22:24:45 UTC

Fwd: [DISCUSS/PROPOSAL] Upgrading Driver Model

Hey Chris,

This e-mail chain might be of interest to you.

Talk to you later,
Mike

---------- Forwarded message ----------
From: John Burwell <jb...@basho.com>
Date: Tue, Aug 20, 2013 at 3:43 PM
Subject: [DISCUSS/PROPOSAL] Upgrading Driver Model
To: "dev@cloudstack.apache.org" <de...@cloudstack.apache.org>
Cc: Daan Hoogland <da...@gmail.com>, Hugo Trippaers <
htrippaers@schubergphilis.com>, "La Motta, David" <Da...@netapp.com>


All,

In capturing my thoughts on storage, I found myself backing into the driver
model.  While we have the beginnings of such a model today, I see the
following deficiencies:


   1. *Multiple Models*: The Storage, Hypervisor, and Security layers each
   have a slightly different model for allowing system functionality to be
   extended/substituted.  These differences raise the barrier to entry for
   vendors seeking to extend CloudStack and add code paths that must be
   maintained and verified.
   2. *Leaky Abstraction*: Plugins are registered through a Spring
   configuration file.  In addition to being operator-unfriendly (most
   sysadmins are not Spring experts, nor do they want to be), this exposes
   the core bootstrapping mechanism to operators.  A misconfiguration could
   therefore negatively impact the injection/configuration of internal
   management server components.  Essentially, we are handing them a loaded
   shotgun pointed at our right foot.
   3. *Nondeterministic Load/Unload Model*: Because the core loading
   mechanism is Spring, the management server has little control over the
   timing and order of component loading/unloading.  Changes to the
   Management Server's component dependency graph could break a driver by
   causing it to be started at an unexpected time.
   4. *Lack of Execution Isolation*: As Spring components, plugins are
   loaded into the same execution context as core management server
   components.  Therefore, an errant plugin can corrupt the entire
   management server.


For the next revision of the plugin/driver mechanism, I would like to see us
migrate towards a standard pluggable driver model that supports all of the
management server's extension points (e.g. network devices, storage
devices, hypervisors, etc.) with the following capabilities:


   - *Consolidated Lifecycle and Startup Procedure*: Drivers share a
   common state machine and categorization (e.g. network, storage,
   hypervisor, etc.) that permits the deterministic calculation of
   initialization and destruction order (i.e. network layer drivers ->
   storage layer drivers -> hypervisor drivers); a rough sketch of this
   ordering follows the list.  Inter-dependencies would be supported
   between plugins sharing the same category.
   - *In-process Installation and Upgrade*: Adding or upgrading a driver
   does not require the management server to be restarted.  This capability
   implies a system that supports the simultaneous execution of multiple
   driver versions and the ability to suspend work on a resource while the
   underlying driver instance is replaced.
   - *Execution Isolation*: The deployment packaging and execution
   environment allows different (and potentially conflicting) versions of
   dependencies to be used simultaneously.  Additionally, plugins would be
   sufficiently sandboxed to protect the management server against driver
   instability.
   - *Extension Data Model*: Drivers provide a property bag with a metadata
   descriptor to validate and render vendor-specific data.  The contents of
   this property bag will be provided to every driver operation invocation
   at runtime.  The metadata descriptor would be a lightweight description
   that provides a label resource key, a description resource key, a data
   type (string, date, number, boolean), a required flag, and an optional
   length limit (a minimal sketch of such a descriptor also follows the
   list).
   - *Introspection*: Administrative APIs/UIs allow operators to understand
   which drivers are present in the system, their configuration, and their
   current state.
   - *Discoverability*: Optionally, drivers can be discovered via a project
   repository definition (similar to Yum), allowing drivers to be remotely
   acquired and operators to be notified of update availability.  The
   project would also provide, free of charge, certificates to sign
   plugins.  This mechanism would support local mirroring for air-gapped
   management networks.
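
To make the lifecycle point a bit more concrete, here is a minimal Java
sketch of how categories and a shared state machine might drive deterministic
startup/shutdown ordering.  All of the names (DriverCategory, DriverState,
Driver, DriverLifecycle) are hypothetical and for illustration only; as noted
above, the exact state model still needs to be worked out.

    // Hypothetical sketch -- none of these types exist in CloudStack today.
    import java.util.Comparator;
    import java.util.List;

    /** Driver categories, declared in the proposed startup order. */
    enum DriverCategory { NETWORK, STORAGE, HYPERVISOR }

    /** A possible lifecycle; the exact state model would need to be worked out. */
    enum DriverState { DISCOVERED, INITIALIZED, RUNNING, SUSPENDED, STOPPED }

    interface Driver {
        String name();
        DriverCategory category();
        void start();
        void stop();
    }

    final class DriverLifecycle {
        /** Start drivers deterministically: network -> storage -> hypervisor. */
        static void startAll(List<Driver> drivers) {
            drivers.sort(Comparator.comparing(Driver::category)); // enum order == startup order
            drivers.forEach(Driver::start);
        }

        /** Destroy drivers in the reverse order. */
        static void stopAll(List<Driver> drivers) {
            drivers.sort(Comparator.comparing(Driver::category).reversed());
            drivers.forEach(Driver::stop);
        }
    }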
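
Similarly, here is a rough Java sketch of the extension data model: a
per-field metadata descriptor the management server could use to validate a
driver's property bag before rendering it or passing it to a driver
operation.  Again, the type and method names are made up for illustration.

    // Hypothetical sketch of the extension data model described above.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    /** Data types a vendor-specific field may declare. */
    enum FieldType { STRING, DATE, NUMBER, BOOLEAN }

    /** One entry in a driver's metadata descriptor. */
    final class FieldDescriptor {
        final String labelKey;       // resource key for the UI label
        final String descriptionKey; // resource key for the help text
        final FieldType type;
        final boolean required;
        final Integer maxLength;     // optional; null means no limit

        FieldDescriptor(String labelKey, String descriptionKey, FieldType type,
                        boolean required, Integer maxLength) {
            this.labelKey = labelKey;
            this.descriptionKey = descriptionKey;
            this.type = type;
            this.required = required;
            this.maxLength = maxLength;
        }
    }

    final class PropertyBagValidator {
        /** Validate a driver's property bag against its descriptor; returns any errors. */
        static List<String> validate(Map<String, FieldDescriptor> descriptor,
                                     Map<String, String> bag) {
            List<String> errors = new ArrayList<>();
            descriptor.forEach((name, field) -> {
                String value = bag.get(name);
                if (value == null || value.isEmpty()) {
                    if (field.required) {
                        errors.add(name + " is required");
                    }
                } else if (field.maxLength != null && value.length() > field.maxLength) {
                    errors.add(name + " exceeds maximum length " + field.maxLength);
                }
                // Type checking (date/number/boolean parsing) omitted for brevity.
            });
            return errors;
        }
    }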


Fundamentally, I do not want to turn CloudStack into an erector set with
more screws than nuts, which is a risk with highly pluggable architectures.
As such, I think we would need to tightly bound the scope of drivers and
their behaviors to prevent the loss of system usability and stability.  My
thinking is that drivers would be packaged into a custom JAR, a CAR
(CloudStack ARchive), that would be structured as follows:


   - META-INF
      - MANIFEST.MF
      - driver.yaml (driver metadata (e.g. version, name, description,
      etc.) serialized in YAML format; see the reading sketch after this
      list)
      - LICENSE (a text file containing the driver's license)
   - lib (driver dependencies)
   - classes (driver implementation)
   - resources (driver message files and potentially JS resources)
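
As an illustration of how the management server might crack open a CAR and
read its descriptor, here is a small Java sketch that pulls
META-INF/driver.yaml out of the archive and parses it into a map.  It
assumes SnakeYAML for parsing, which is just one possible library choice;
the class and method names are mine, not an existing API.

    // Hypothetical sketch; assumes SnakeYAML is on the classpath for YAML parsing.
    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Map;
    import java.util.jar.JarEntry;
    import java.util.jar.JarFile;
    import org.yaml.snakeyaml.Yaml;

    final class CarMetadataReader {
        /** Reads META-INF/driver.yaml (version, name, description, ...) from a CAR. */
        @SuppressWarnings("unchecked")
        static Map<String, Object> readDescriptor(File carFile) throws IOException {
            try (JarFile car = new JarFile(carFile)) { // a CAR is just a JAR layout
                JarEntry entry = car.getJarEntry("META-INF/driver.yaml");
                if (entry == null) {
                    throw new IOException("Missing META-INF/driver.yaml in " + carFile);
                }
                try (InputStream in = car.getInputStream(entry)) {
                    return (Map<String, Object>) new Yaml().load(in);
                }
            }
        }
    }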


The management server would acquire drivers through a simple scan of a URL
(e.g. file directory, S3 bucket, etc.).  For every CAR object found, the
management server would create an execution environment (likely a dedicated
ExecutorService and Classloader), and transition the state of the driver to
Running (the exact state model would need to be worked out).  To be really
nice, we could develop a custom Ant task/Maven plugin/Gradle plugin to
create CARs.  I can also imagine opportunities to add hooks to this model
for registering instrumentation information with JMX and for authorization.
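
To sketch what that acquisition step might look like in the simplest case (a
local directory), here is a rough Java outline that gives each discovered
driver its own classloader and executor and marks it Running.  It assumes
each CAR has already been unpacked into its own directory (unpacking the
archive and scanning remote stores such as S3 are left out), and DriverState
refers to the hypothetical lifecycle enum sketched earlier; none of this is
existing CloudStack code.

    // Hypothetical sketch of driver acquisition from a local deploy directory.
    import java.io.File;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.net.URLClassLoader;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    final class DriverScanner {
        /** Scan a directory of unpacked CARs and build an execution environment per driver. */
        static List<DriverRuntime> scan(File deployDir) throws MalformedURLException {
            List<DriverRuntime> runtimes = new ArrayList<>();
            File[] cars = deployDir.listFiles(File::isDirectory); // one directory per unpacked CAR
            if (cars == null) {
                return runtimes;
            }
            for (File car : cars) {
                List<URL> urls = new ArrayList<>();
                urls.add(new File(car, "classes").toURI().toURL());          // driver implementation
                File[] jars = new File(car, "lib").listFiles((d, n) -> n.endsWith(".jar"));
                if (jars != null) {
                    for (File jar : jars) {
                        urls.add(jar.toURI().toURL());                       // driver dependencies
                    }
                }
                // A dedicated classloader and executor give the driver its own
                // (partially isolated) execution environment.
                URLClassLoader loader = new URLClassLoader(
                        urls.toArray(new URL[0]), DriverScanner.class.getClassLoader());
                ExecutorService executor = Executors.newSingleThreadExecutor();
                runtimes.add(new DriverRuntime(car.getName(), loader, executor,
                        DriverState.RUNNING)); // DriverState from the lifecycle sketch above
            }
            return runtimes;
        }
    }

    /** Minimal holder for a loaded driver's execution environment and state. */
    final class DriverRuntime {
        final String name;
        final URLClassLoader loader;
        final ExecutorService executor;
        DriverState state;

        DriverRuntime(String name, URLClassLoader loader, ExecutorService executor,
                      DriverState state) {
            this.name = name;
            this.loader = loader;
            this.executor = executor;
            this.state = state;
        }
    }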

To keep the scope of this email confined, we would introduce the general
notion of a Resource, and (hand wave hand wave) eventually compartmentalize
the execution of work around a resource [1].  This (hand waved)
compartmentalization would give us the controls necessary to safely and
reliably perform in-place driver upgrades.  For an initial release, I would
recommend implementing the abstractions, loading mechanism, extension data
model, and discovery features.  With these capabilities in place, we could
then attack the in-place upgrade model.

If we were to adopt such a pluggable capability, we would have the
opportunity to decouple the vendor and CloudStack release schedules.  For
example, if a vendor were introducing a new product that required a new or
updated driver, they would no longer need to wait for a CloudStack release
to support it.  They would also gain the ability to fix high priority
defects in the same manner.

I have hand waved a number of issues that would need to be resolved before
such an approach could be implemented.  However, I think we need to decide,
as a community, that it is worth devoting energy and effort to enhancing the
plugin/driver model, and agree on the goals of that effort, before diving
head first into the deep rabbit hole of design/implementation.

Thoughts? (/me ducks)
-John

[1]: My opinions on the matter from CloudStack Collab 2013 ->
http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cloud-stack-distributed-process-management



-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkowski@solidfire.com
o: 303.746.7302
Advancing the way the world uses the cloud™
<http://solidfire.com/solution/overview/?video=play>