
Posted to user@whirr.apache.org by John Conwell <jo...@iamjohn.me> on 2011/10/05 23:58:15 UTC

Whirr's role in an entire cloud-based architecture deployment

Hey guys,
Here are some thoughts I've been kicking around lately about Whirr.

I've been using Whirr fairly extensively since 0.4.0. At first my needs
were fairly simple: a single Hadoop cluster. Then things got a bit more
complex and I needed three different clusters (Hadoop, Solr, Cassandra), so
I started using Whirr's API and built a bit of automation around it. Now my
requirements have become fairly complex: I have 7 different kinds of
clusters being created, and three times that many post-launch steps to
authorize ingress from one cluster to another, run custom configuration
scripts, copy required files to the clusters, etc.

This has brought me to the question: what do you think Whirr's role should
be when it comes to deploying complex, interdependent cloud-based
architectures? Whirr is really good at creating a single cluster of
non-dependent resources, meaning it's good at creating a cluster of VMs that
don't require any upstream dependencies in order to be used. That works fine
as long as there are no external dependencies. But what about deployment
scenarios where there are N different types of clusters, and where the
configuration of one cluster depends on the makeup of a previous cluster?
And what about other kinds of deployment steps, like configuring custom
firewall rules or executing custom setup scripts?
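To make the dependency concrete: a cluster-to-cluster ingress rule can only
be turned into concrete firewall rules once the source cluster's nodes
exist. Below is a minimal sketch under stated assumptions, not Whirr's API:
IngressRule and IngressExpander are hypothetical names, and it assumes each
source node is admitted by its IP address as a single-host /32 CIDR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical value type: allow TCP traffic on `port` from `sourceCidr` into `targetCluster`.
record IngressRule(String targetCluster, String sourceCidr, int port) {}

// Hypothetical helper, not part of Whirr's API.
final class IngressExpander {
    private IngressExpander() {}

    /**
     * Expands "sourceCluster may reach targetCluster on port" into one rule per
     * source node. This only works after the source cluster has launched,
     * because its node addresses are unknown before then.
     */
    static List<IngressRule> expand(Map<String, List<String>> nodeIpsByCluster,
                                    String sourceCluster, String targetCluster, int port) {
        List<String> ips = nodeIpsByCluster.get(sourceCluster);
        if (ips == null) {
            throw new IllegalStateException(
                    "Cluster '" + sourceCluster + "' has not been launched yet");
        }
        List<IngressRule> rules = new ArrayList<>();
        for (String ip : ips) {
            rules.add(new IngressRule(targetCluster, ip + "/32", port)); // single-host CIDR
        }
        return rules;
    }
}
```

The IllegalStateException is the point of the sketch: the rule cannot be
evaluated until the upstream cluster exists, which is why launch order
matters.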

For example, the scenario I'm in the process of automating creates the
following clusters: Hadoop, Cassandra, Solr, ZooKeeper, ActiveMQ, HAProxy,
and two different Tomcat clusters. Then there are cluster-to-cluster ingress
rules I need to set, as well as a few IP-address-to-cluster rules. But
that's not the worst of it. To fully configure our Tomcat servers, for
example, I need to know things like the IP addresses of the Cassandra,
Hadoop, Solr, and ActiveMQ nodes. So I've got custom steps that gather this
info and call runScriptOnNodesMatching on the Tomcat cluster. Then there are
external files that need to be put on certain clusters, like custom Solr
config and schema files. I download these from a blobstore, again triggered
from a script executed by runScriptOnNodesMatching.
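A rough sketch of the "gather this info" step: given the node IPs of the
upstream clusters, render the script text that would then be run on the
Tomcat nodes. TomcatConfigScript and the <CLUSTER>_HOSTS variable naming are
hypothetical; how the script is handed to runScriptOnNodesMatching, and where
Tomcat picks the variables up (for example a bin/setenv.sh file), is not
shown and would depend on your setup.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: turns upstream cluster addresses into a shell snippet.
final class TomcatConfigScript {
    private TomcatConfigScript() {}

    /**
     * Renders one "export CLUSTER_HOSTS=ip1,ip2" line per upstream cluster.
     * Clusters are sorted by name so the generated script is deterministic.
     * Assumes the values are plain IP addresses (no shell quoting is applied).
     */
    static String render(Map<String, List<String>> nodeIpsByCluster) {
        StringBuilder script = new StringBuilder("#!/bin/bash\n");
        for (Map.Entry<String, List<String>> e : new TreeMap<>(nodeIpsByCluster).entrySet()) {
            // e.g. "hadoop-namenode" -> "HADOOP_NAMENODE_HOSTS"
            String var = e.getKey().toUpperCase(Locale.ROOT)
                    .replaceAll("[^A-Z0-9]", "_") + "_HOSTS";
            script.append("export ").append(var).append('=')
                  .append(String.join(",", e.getValue())).append('\n');
        }
        return script.toString();
    }
}
```

Sorting the clusters keeps the generated script identical across runs, which
makes it easier to tell whether a node's configuration actually changed.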

So to fully support complex cloud-based deployments, a set of actions needs
to be stitched together and executed in a specified order, so that
downstream steps can get information from upstream deployment actions:
launch cluster, remote script, cluster ingress, IP ingress, file upload,
blob file upload, etc., all ideally driven by one configuration file that
defines the entire set of interdependent deployment actions.
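The ordering requirement described above amounts to a dependency graph of
actions. A minimal sketch, assuming a hypothetical DeploymentAction
interface (not part of Whirr) where each action names the actions it depends
on and publishes results such as node IPs into a shared context map. Kahn's
topological sort runs every action after its dependencies and rejects cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical step: launch a cluster, authorize ingress, run a script, upload a file... */
interface DeploymentAction {
    String name();
    Set<String> dependsOn();
    /** Reads upstream results from the context and may publish its own (e.g. node IPs). */
    void execute(Map<String, Object> context);
}

/** Convenience implementation backed by a lambda. */
record SimpleAction(String name, Set<String> dependsOn, Consumer<Map<String, Object>> body)
        implements DeploymentAction {
    public void execute(Map<String, Object> context) { body.accept(context); }
}

final class DeploymentPlan {
    private final Map<String, DeploymentAction> actions = new LinkedHashMap<>();

    DeploymentPlan add(DeploymentAction action) {
        if (actions.putIfAbsent(action.name(), action) != null) {
            throw new IllegalArgumentException("Duplicate action: " + action.name());
        }
        return this;
    }

    /** Kahn's algorithm: every action is ordered after all of its dependencies. */
    List<DeploymentAction> order() {
        Map<String, Integer> pending = new HashMap<>();        // unmet dependency count
        Map<String, List<String>> dependents = new HashMap<>(); // dependency -> actions waiting on it
        for (DeploymentAction a : actions.values()) {
            pending.put(a.name(), a.dependsOn().size());
            for (String dep : a.dependsOn()) {
                if (!actions.containsKey(dep)) {
                    throw new IllegalStateException(a.name() + " depends on unknown action " + dep);
                }
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(a.name());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (DeploymentAction a : actions.values()) {          // insertion order keeps runs deterministic
            if (a.dependsOn().isEmpty()) ready.add(a.name());
        }
        List<DeploymentAction> ordered = new ArrayList<>();
        while (!ready.isEmpty()) {
            String name = ready.poll();
            ordered.add(actions.get(name));
            for (String child : dependents.getOrDefault(name, List.of())) {
                if (pending.merge(child, -1, Integer::sum) == 0) ready.add(child);
            }
        }
        if (ordered.size() != actions.size()) {
            throw new IllegalStateException("Dependency cycle among deployment actions");
        }
        return ordered;
    }

    /** Runs every action in dependency order, sharing one context map. */
    Map<String, Object> run() {
        Map<String, Object> context = new HashMap<>();
        for (DeploymentAction a : order()) a.execute(context);
        return context;
    }
}
```

In this design a launch action for Cassandra would put its node IPs into the
context, and a Tomcat configuration action declaring a dependency on it
would read them. Parsing the single configuration file into actions is left
out.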

Thoughts?

-- 

Thanks,
John C

Re: Whirr's role in an entire cloud-based architecture deployment

Posted by Andrei Savu <sa...@gmail.com>.

John -

I'm really happy to hear that you have been using Whirr for a while to
deploy complex workflows and that it has matched your needs.

I think that before supporting more complex deployments with multiple
clusters, we need to do a really good job of deploying and managing a single
cluster, and there is still a nontrivial amount of work to do there (e.g.
deterministic cluster configuration behavior, good error reporting, adding /
removing nodes from running clusters, improved support for setting firewall
rules, overall user experience improvements, etc.).

I also think that this matches the vision of Whirr as a library that you can
use to deploy more complex scenarios as part of your application.

Thanks,

-- Andrei Savu / andreisavu.ro
