Posted to dev@whirr.apache.org by Andrei Savu <sa...@gmail.com> on 2012/12/14 16:34:09 UTC

Provisioning as a Dedicated Service

Hi guys,

It is no secret that at Axemblr we are using Apache Whirr for
provisioning and initial basic cluster configuration for Hadoop. As soon as
the machines are running we configure Hadoop by leveraging APIs from
existing tools like Cloudera Manager or Ambari.

All the orchestration needed to make this happen is not trivial if you want
the final system to be predictable, robust, restartable and easy to inspect
while running.

A few months ago we realised that we needed to rework the machine
provisioning layer from Whirr and build a system that has the following
features:

* it should be able to provision tens or hundreds of virtual machines by
doing a good job at handling API throttling and by using batch operations
as much as possible

* all the internal workflows should be persistent and as granular as
possible and each step should be idempotent

* it should be possible to restart the application server while starting
virtual machines with no impact

* it should have a modular architecture and provide enough flexibility to
be able to work with a large number of public and private clouds just by
replacing modules

* it should hide all this complexity behind a simple REST API and a simple
interactive shell

* it should be able to automatically build gold base images and use them
to spawn large clusters
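
A minimal sketch of what "persistent, granular, idempotent steps that
survive a restart" can look like. This is an illustration only, not
Provisionr code: the project delegates this to Activiti. The names
`Workflow`, `run` and the `workflow_steps` table are hypothetical, and
SQLite stands in for whatever database the engine would use.

```python
import sqlite3


class Workflow:
    """Runs named steps in order and records each finished step in a database."""

    def __init__(self, conn, workflow_id, steps):
        self.conn = conn
        self.workflow_id = workflow_id
        self.steps = steps  # list of (name, callable) pairs
        conn.execute(
            "CREATE TABLE IF NOT EXISTS workflow_steps ("
            "workflow_id TEXT, step TEXT, PRIMARY KEY (workflow_id, step))"
        )
        conn.commit()

    def _is_done(self, name):
        row = self.conn.execute(
            "SELECT 1 FROM workflow_steps WHERE workflow_id = ? AND step = ?",
            (self.workflow_id, name),
        ).fetchone()
        return row is not None

    def run(self):
        """Execute every step not yet recorded as done.

        Safe to call again after a crash: finished steps are skipped, and a
        step interrupted before being recorded is simply run once more, which
        is why each step has to be idempotent.
        """
        executed = []
        for name, action in self.steps:
            if self._is_done(name):
                continue
            action()
            # Record the step only after it succeeded.
            self.conn.execute(
                "INSERT INTO workflow_steps (workflow_id, step) VALUES (?, ?)",
                (self.workflow_id, name),
            )
            self.conn.commit()
            executed.append(name)
        return executed
```

If the process dies half way through, building a new `Workflow` with the
same `workflow_id` on the same database and calling `run()` again picks up
at the first step that was not recorded.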

We spent some time looking for existing products that do all this and in
the end we decided that it's better to start from scratch and build this
system as a new project based on Activiti, Apache Karaf, jclouds and native
SDKs.

The source code is now publicly available at:

https://github.com/axemblr/axemblr-provisionr

I would really like to know what you think about the work we've done so
far. The project will improve a lot over the next couple of weeks / months
so I encourage you to stay tuned.

We want to bring this project to the Apache Foundation later on. I will
give a talk on this at ApacheCon NA in February.

Cheers,

-- Andrei Savu / axemblr.com

Re: Provisioning as a Dedicated Service

Posted by Andrei Savu <sa...@gmail.com>.
On Fri, Dec 14, 2012 at 7:27 PM, Ioannis Canellos <io...@gmail.com> wrote:

> It looks really sexy!
> I really like the selection and combination of activiti + karaf.
>

Thanks Ioannis!

Re: Provisioning as a Dedicated Service

Posted by Ioannis Canellos <io...@gmail.com>.
It looks really sexy!
I really like the selection and combination of activiti + karaf.

-- 
Ioannis Canellos
Blog: http://iocanel.blogspot.com
Twitter: iocanel

Re: Provisioning as a Dedicated Service

Posted by Andrei Savu <sa...@gmail.com>.
On Tue, Dec 18, 2012 at 8:46 PM, Steve Loughran <st...@hortonworks.com> wrote:

> Fancy giving a demo on a google+ hangout? I'm sure I'm not the only person
> who'd love to see it in action
>

Let's do this in January as soon as 0.0.1 is out and the basic
functionality is in place.

Re: Provisioning as a Dedicated Service

Posted by Steve Loughran <st...@hortonworks.com>.
sure, I've signed up!

On 25 January 2013 06:42, Andrei Savu <sa...@gmail.com> wrote:

> On Tue, Dec 18, 2012 at 8:46 PM, Steve Loughran <stevel@hortonworks.com
> >wrote:
>
> > Fancy giving a demo on a google+ hangout? I'm sure I'm not the only
> person
> > who'd love to see it in action
> >
>
> How about doing this on Monday?
>
> https://plus.google.com/events/cfbq948jr1nqrqrdb41mg02in88
>
> We are now working on 0.3.0. You can find more informations here on
> previous releases:
> https://github.com/axemblr/axemblr-provisionr/wiki
>
> Cheers,
>
> -- Andrei Savu
>

Re: Provisioning as a Dedicated Service

Posted by Andrei Savu <sa...@gmail.com>.
On Tue, Dec 18, 2012 at 8:46 PM, Steve Loughran <st...@hortonworks.com> wrote:

> Fancy giving a demo on a google+ hangout? I'm sure I'm not the only person
> who'd love to see it in action
>

How about doing this on Monday?

https://plus.google.com/events/cfbq948jr1nqrqrdb41mg02in88

We are now working on 0.3.0. You can find more information on previous
releases here:
https://github.com/axemblr/axemblr-provisionr/wiki

Cheers,

-- Andrei Savu

Re: Provisioning as a Dedicated Service

Posted by Steve Loughran <st...@hortonworks.com>.
On 14 December 2012 15:34, Andrei Savu <sa...@gmail.com> wrote:

>
>
> We've spent some time looking for existing products that do all this and in
> the end we've decided that it's better to start from scratch and build this
> system as a new project based on Activiti, Apache Karaf, jclouds and native
> sdks.
>
> The source code is now publicly available at:
>
> https://github.com/axemblr/axemblr-provisionr
>
> I would really like to know what you think about the work we've done so
> far. The project will improve a lot over the next couple of weeks / months
> so I encourage you to stay tunned.
>
> We want to bring this project to the Apache Foundation later on. I will
> give a talk in february at ApacheCon NA on this.
>
>
Fancy giving a demo on a google+ hangout? I'm sure I'm not the only person
who'd love to see it in action

Re: Provisioning as a Dedicated Service

Posted by Andrei Savu <sa...@gmail.com>.
On Mon, Dec 17, 2012 at 9:52 PM, Steve Loughran <st...@hortonworks.com> wrote:

> I'm impressed!


Thanks Steve! We are planning to use this service to dynamically build a
golden image with HDP binaries and use the Ambari API for configuration to
create on-demand clusters with 50+ nodes. I hope to have all this ready for
Hadoop Summit Amsterdam (hopefully my proposal [1] will make it onto the
conference agenda).

[1]
http://hadoopsummit2013.uservoice.com/forums/185446-operating-hadoop/suggestions/3401646-hadoop-on-demand-on-cloudstack


-- Andrei Savu

Re: Provisioning as a Dedicated Service

Posted by Steve Loughran <st...@hortonworks.com>.
I'm impressed!


Re: Provisioning as a Dedicated Service

Posted by Andrei Savu <sa...@gmail.com>.
On Sat, Dec 15, 2012 at 3:24 PM, Karel Vervaeke <ka...@ngdata.com> wrote:

> Sound like a very interesting approach! Definitely going to give this a
> spin next week.
>

Thanks Karel! I am working on finalising the process for creating pools on
Amazon.

Re: Provisioning as a Dedicated Service

Posted by Karel Vervaeke <ka...@ngdata.com>.
Sounds like a very interesting approach! Definitely going to give this a
spin next week.
Karel



Re: Provisioning as a Dedicated Service

Posted by Andrei Savu <sa...@gmail.com>.
On Tue, Dec 18, 2012 at 9:44 PM, Adrian Cole <ad...@jclouds.org> wrote:

> Thanks for bringing this project out into the open.  Looks like a
> significant amount of effort, and worthwhile having a hard look.  Moreover,
> it is good to see more projects using or extending Whirr, either under or
> above (heh or maybe both in this case!).
>

Thanks Adrian!


> Something like this sounds like it can add persistence and recovering to
> provisioning workflows.  That's via activiti, right?  I imagine how
> provisionr does resiliency (as well ha/clustering of provisioning tasks)
> would make an exciting slideshare read.  Do keep us up-to-date.
>

Yep, all the persistence is handled by Activiti - there are some rough
edges but it works as advertised.

For HA we are planning to have a multi-master database deployment with
multiple job executors on different machines. All the synchronisation is
done through the DB.
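
To illustrate the general idea of several job executors synchronising
only through the database, here is a hedged sketch of an atomic job claim.
It is not Activiti's actual locking code; the `jobs` table, its
`lock_owner` column and the function names are assumptions made for the
example, and a real deployment would also need lock expiry so that jobs
held by a dead executor get released.

```python
import sqlite3


def create_jobs_table(conn, job_ids):
    """Create a hypothetical jobs table with every job initially unlocked."""
    conn.execute("CREATE TABLE jobs (id TEXT PRIMARY KEY, lock_owner TEXT)")
    conn.executemany(
        "INSERT INTO jobs (id, lock_owner) VALUES (?, NULL)",
        [(job_id,) for job_id in job_ids],
    )
    conn.commit()


def try_claim(conn, job_id, executor):
    """Atomically claim a job; returns True only for the executor that wins.

    The WHERE clause makes the update a compare-and-set: if another executor
    already owns the job, no row matches and rowcount is 0.
    """
    cur = conn.execute(
        "UPDATE jobs SET lock_owner = ? WHERE id = ? AND lock_owner IS NULL",
        (executor, job_id),
    )
    conn.commit()
    return cur.rowcount == 1


def claim_next(conn, executor):
    """Claim the first unlocked job, or return None if nothing is left."""
    candidates = conn.execute(
        "SELECT id FROM jobs WHERE lock_owner IS NULL ORDER BY id"
    ).fetchall()
    for (job_id,) in candidates:
        if try_claim(conn, job_id, executor):
            return job_id
    return None
```

Each executor would call `claim_next` in its own loop; because the claim
is a single conditional UPDATE, two executors can never both win the same
job, and no coordination outside the database is needed.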


>
> WRT whirr:
>
> I suppose whirr currently is possible to run embedded as a library, so
> direct dependency on provisionr isn't going to work provided we wish to
> continue this mode.  That said, could be an interesting experiment to look
> at another, more mesos-like, "whirr as a service": provisionr as a tlp or a
> "service" subproject within whirr. interesting discussion regardless.
>

We will find a way later on :)


>
> Congrats and thanks for opening up this project!
>
> -A
>
>
> jclouds-related footnotes for the curious:
>
> Not that this is the forum for it, but you might recall that jclouds 1.6
> alpha is blocking on throttling/request efficiency/resilience
> improvements.  Seems by your description that you've simultaneously been
> attacking this, among your other features.
>

Yep. I am looking forward to testing / helping with resilience and request
efficiency for jclouds 1.6.


>
> This has been slow on jclouds for a number of reasons, particularly looking
> for the way to do this without creating a strict service dependency, and
> without more spaghetti, or a bunch of lib deps.
>
> The RAX next gen throttle-thing denial of serviced many of us, and a lot of
> development time, effectively bumping priority.  Steve helped raise an
> abstraction of http throttle error, which is now in the 1.5.x codeline.
> Next step is to employ it with a system that shares quality information,
> potentially out to an external service like hystrix if users choose.  RAX
> raised the throttle globally, due to collaboration with jclouds, but
> jclouds aren't dropping the ball, are progressing this, and will release a
> library-only solution as a part of 1.6.
>
> WRT resumable workflow, I've looked at several systems that claim the
> ability to perform lightweight workflow or FSM.  There's a ton of tech
> available for use, but very few have a embedded mode that doesn't do
> something like start zookeeper, or have ESB or BPM ambitions which tend to
> bloat the dependency tree.  FWIW, my personal opinion is pipeline has the
> cleanest syntax, though it suffers from no OSS version as yet.  2 weeks
> ago, I started a conversation with google folks about this, but no news.
> Regardless, I'll keep folks posted as jclouds has for a long time aimed to
> have a resumable workflow aptitude without compromising light deps.
>

That would be great!


>
> http://code.google.com/p/appengine-pipeline/
>

Re: Provisioning as a Dedicated Service

Posted by Adrian Cole <ad...@jclouds.org>.
Thanks for bringing this project out into the open.  Looks like a
significant amount of effort, and worthwhile having a hard look.  Moreover,
it is good to see more projects using or extending Whirr, either under or
above (heh or maybe both in this case!).

Something like this sounds like it can add persistence and recovery to
provisioning workflows.  That's via activiti, right?  I imagine how
provisionr does resiliency (as well as HA/clustering of provisioning tasks)
would make an exciting slideshare read.  Do keep us up-to-date.

WRT whirr:

I suppose whirr can currently be run embedded as a library, so a
direct dependency on provisionr isn't going to work provided we wish to
continue this mode.  That said, it could be an interesting experiment to
look at another, more mesos-like, "whirr as a service": provisionr as a TLP
or a "service" subproject within whirr. Interesting discussion regardless.

Congrats and thanks for opening up this project!

-A


jclouds-related footnotes for the curious:

Not that this is the forum for it, but you might recall that jclouds 1.6
alpha is blocking on throttling/request efficiency/resilience
improvements.  Seems by your description that you've simultaneously been
attacking this, among your other features.

This has been slow on jclouds for a number of reasons, particularly looking
for the way to do this without creating a strict service dependency, and
without more spaghetti, or a bunch of lib deps.

The RAX next gen throttle-thing denial of serviced many of us, and a lot of
development time, effectively bumping priority.  Steve helped raise an
abstraction of http throttle error, which is now in the 1.5.x codeline.
Next step is to employ it with a system that shares quality information,
potentially out to an external service like hystrix if users choose.  RAX
raised the throttle globally, due to collaboration with jclouds, but
jclouds aren't dropping the ball, are progressing this, and will release a
library-only solution as a part of 1.6.
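
As a rough illustration of what a client can do once throttling is
surfaced as a distinct error type, here is a sketch of retry with
exponential backoff and jitter. It is not jclouds code: `ThrottledError`
and `call_with_backoff` are hypothetical names, and the delay values are
arbitrary defaults chosen for the example.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a provider's 'rate limit exceeded' response."""


def call_with_backoff(request, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call request(), retrying with exponential backoff when it is throttled.

    sleep and rng are injectable so the behaviour can be tested without
    actually waiting.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the throttle error
            # The ceiling doubles on every attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Jitter: wait a random fraction of the ceiling so that many
            # clients throttled at the same moment do not retry in lockstep.
            sleep(ceiling * rng())
```

Only the throttle error triggers a retry here; any other exception
propagates immediately, which keeps real failures visible.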

WRT resumable workflow, I've looked at several systems that claim the
ability to perform lightweight workflow or FSM.  There's a ton of tech
available for use, but very few have an embedded mode that doesn't do
something like start zookeeper, or have ESB or BPM ambitions which tend to
bloat the dependency tree.  FWIW, my personal opinion is pipeline has the
cleanest syntax, though it suffers from no OSS version as yet.  2 weeks
ago, I started a conversation with google folks about this, but no news.
Regardless, I'll keep folks posted as jclouds has for a long time aimed to
have a resumable workflow aptitude without compromising light deps.

http://code.google.com/p/appengine-pipeline/


Re: Provisioning as a Dedicated Service

Posted by Andrei Savu <sa...@gmail.com>.
On Sat, Dec 15, 2012 at 8:59 AM, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:

> It looks really cool (+1 non binding ;)). FYI, if it can help you, in
> ServiceMix, we already ship Activiti with Karaf.


Thanks JB! It's good to know that Activiti is part of ServiceMix - we faced
some difficulties while working on making this setup behave as expected. I
will have a look.

Re: Provisioning as a Dedicated Service

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
It looks really cool (+1 non binding ;)). FYI, if it can help you, in 
ServiceMix, we already ship Activiti with Karaf.

Regards
JB


-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Provisioning as a Dedicated Service

Posted by Andrei Savu <sa...@gmail.com>.
Thanks Ashish!

I am looking forward to reviewing / committing your patches.

-- Andrei Savu

On Fri, Mar 1, 2013 at 12:17 AM, Ashish <pa...@gmail.com> wrote:

> Andrei,
>
> My involvement would be more from interest. My current day job doen't
> involve use of any of these tools :(
>
> So, it would be mostly the weekend hacking, blogging and probably
> using/testing stuff and play around with it.
>
> As of now I doubt I will be able to be the core dev for this. Contributor
> or not, I shall anyways do it :)
>
> HTH!
> ashish
>
>

Re: Provisioning as a Dedicated Service

Posted by Ashish <pa...@gmail.com>.
Andrei,

My involvement would be more out of interest. My current day job doesn't
involve use of any of these tools :(

So it would mostly be weekend hacking, blogging and probably
using/testing stuff and playing around with it.

As of now I doubt I will be able to be a core dev for this. Contributor
or not, I shall do it anyway :)

HTH!
ashish


On Fri, Mar 1, 2013 at 11:51 AM, Andrei Savu <sa...@gmail.com> wrote:

> Thanks Ashish!
>
> How would you use or extend Provisionr?
>
> -- Andrei Savu
>



-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal

Re: Provisioning as a Dedicated Service

Posted by Andrei Savu <sa...@gmail.com>.
Thanks Ashish!

How would you use or extend Provisionr?

-- Andrei Savu

On Thu, Feb 28, 2013 at 5:14 PM, Ashish <pa...@gmail.com> wrote:

> Andrei,
>
> Can you add me as a contributor, if it works for you :)
>

Re: Provisioning as a Dedicated Service

Posted by Ashish <pa...@gmail.com>.
Andrei,

Can you add me as a contributor, if it works for you :)


On Fri, Mar 1, 2013 at 12:32 AM, Andrei Savu <sa...@gmail.com> wrote:

> Hi guys -
>
> I have submitted a proposal to bring Axemblr Provisionr to the Apache
> Incubator (see general@incubator.apache.org):
>
> http://wiki.apache.org/incubator/ProvisionrProposal
>
> And this is a slide deck that explains medium term plans & challenges:
>
>
> http://www.slideshare.net/savu.andrei/creating-pools-of-virtual-machines-apachecon-na-2013
>
> If you want to join as a mentor / initial contributor you are welcome!
>
> Thanks,
>
> -- Andrei Savu



-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal

Re: Provisioning as a Dedicated Service

Posted by Andrei Savu <sa...@gmail.com>.
Hi guys -

I have submitted a proposal to bring Axemblr Provisionr to the Apache
Incubator (see general@incubator.apache.org):

http://wiki.apache.org/incubator/ProvisionrProposal

And this is a slide deck that explains medium term plans & challenges:

http://www.slideshare.net/savu.andrei/creating-pools-of-virtual-machines-apachecon-na-2013

If you want to join as a mentor / initial contributor you are welcome!

Thanks,

-- Andrei Savu

Re: Provisioning as a Dedicated Service

Posted by Paul Baclace <pa...@gmail.com>.
On 20130209 4:37 , Andrei Savu wrote:
> On Sat, Feb 9, 2013 at 4:44 AM, Paul Baclace <pa...@gmail.com> wrote:
>
>> Do you have any rough idea of state transition latency and throughput you
>> get when using Activiti and how this compares to using Whirr/jclouds in a
>> single process?
>>
> Is this important? During pool creation most of the time is spent in loops
> waiting for external services. We try to keep each activity as short as
> possible to avoid long running transactions.
>
>> The reason I ask is that although Activiti has good support for designing
>> processes and programmatic control of the engine, it is necessarily DB
>> transaction limited. An obvious alternative design is to use something that
>> is actor based which can run entirely in RAM. I admit that an actor control
>> system would make it harder to trace what happened, compared to business
>> process control which is very much oriented toward human-in-the-loop.
>>
> I think it's going to take a while for us to hit that limitation. I see good
> performance even if we are using an embedded H2 database - it should work a
> lot better with a PostgreSQL server. It's true that Activiti is oriented
> towards human-in-the-loop processes but it also works well for unsupervised
> ones.
>
>
As long as the orchestration is at the appropriate granularity (not
micro-managing), then using Activiti should be fine. Another thing it
can do that is more challenging for a single-machine actor system is
preserving state across controller restarts.
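
To illustrate the restart property, here is a minimal sketch and not how
Activiti or Provisionr implement it. It assumes a workflow is an ordered list
of idempotent steps and that the names of completed steps are written to a
small JSON file after each step. The function name run_workflow, the step
names and the file layout are all hypothetical.

```python
import json
import os
import tempfile


def run_workflow(steps, state_path):
    """Run (name, action) steps in order, skipping those already recorded as done.

    Progress is persisted after every step, so a controller that is restarted
    mid-workflow resumes at the first step that has not completed.
    """
    done = []
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)

    for name, action in steps:
        if name in done:
            continue  # finished before the restart, do not repeat it
        action()  # must be idempotent: it may be re-run after a crash
        done.append(name)
        # Write to a temporary file and rename it so the state file is
        # never left half-written.
        tmp_path = state_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(done, f)
        os.replace(tmp_path, state_path)
    return done


# Usage: simulate a controller restart in the middle of the second step.
log = []
crash = {"pending": True}


def launch_instances():
    if crash["pending"]:
        crash["pending"] = False
        raise RuntimeError("controller restarted")
    log.append("launch_instances")


steps = [
    ("create_keypair", lambda: log.append("create_keypair")),
    ("launch_instances", launch_instances),
    ("tag_instances", lambda: log.append("tag_instances")),
]
state_path = os.path.join(tempfile.mkdtemp(), "workflow-state.json")

try:
    run_workflow(steps, state_path)
except RuntimeError:
    pass  # first run dies after create_keypair has been recorded

resumed = run_workflow(steps, state_path)
```

An in-memory actor system would need an equivalent persistence layer added
on top to get the same behaviour.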

Paul


Re: Provisioning as a Dedicated Service

Posted by Andrei Savu <sa...@gmail.com>.
On Sat, Feb 9, 2013 at 4:44 AM, Paul Baclace <pa...@gmail.com> wrote:

> I keep watching the update stream of github.com/axemblr/axemblr-provisionr
> and still see plenty of activity.
>

We are preparing a new release with support for spot instances, an easy way
to add pre-configured pool templates [1] and integration with Rundeck
(http://rundeck.org/).

[1]
https://github.com/axemblr/axemblr-provisionr/blob/master/core/src/main/resources/com/axemblr/provisionr/core/templates/cdh3.xml


> Do you have any rough idea of state transition latency and throughput you
> get when using Activiti and how this compares to using Whirr/jclouds in a
> single process?
>

Is this important? During pool creation most of the time is spent in loops
waiting for external services. We try to keep each activity as short as
possible to avoid long running transactions.
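
A rough sketch of the "short activity inside a waiting loop" idea, not the
actual Provisionr code: each check is one quick call to the provider, and the
waiting happens between checks rather than inside a long transaction. The
names check_once, wait_until_running and the fake describe function are
hypothetical and stand in for a real cloud API.

```python
import time


def check_once(describe, instance_ids):
    """One short activity: query the provider once, return the ids not yet running."""
    states = describe(instance_ids)
    return [i for i in instance_ids if states.get(i) != "running"]


def wait_until_running(describe, instance_ids, max_attempts=10, delay=5,
                       sleep=time.sleep):
    """Repeat the short check until every instance is running.

    Returns the number of attempts used, or raises TimeoutError.
    """
    pending = list(instance_ids)
    for attempt in range(1, max_attempts + 1):
        pending = check_once(describe, pending)
        if not pending:
            return attempt
        sleep(delay)  # time is spent here, outside any transaction
    raise TimeoutError("still pending: %s" % ", ".join(pending))


def make_fake_describe(ready_after):
    """Stand-in for a provider API: instances become 'running' on the Nth call."""
    calls = {"n": 0}

    def describe(ids):
        calls["n"] += 1
        state = "running" if calls["n"] >= ready_after else "pending"
        return {i: state for i in ids}

    return describe


# Usage with a no-op sleep so the example finishes immediately.
attempts = wait_until_running(make_fake_describe(3), ["i-1", "i-2"],
                              sleep=lambda seconds: None)
```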


> The reason I ask is that although Activiti has good support for designing
> processes and programmatic control of the engine, it is necessarily DB
> transaction limited. An obvious alternative design is to use something that
> is actor based which can run entirely in RAM. I admit that an actor control
> system would make it harder to trace what happened, compared to business
> process control which is very much oriented toward human-in-the-loop.
>

I think it's going to take a while for us to hit that limitation. I see good
performance even if we are using an embedded H2 database - it should work a
lot better with a PostgreSQL server. It's true that Activiti is oriented
towards human-in-the-loop processes but it also works well for unsupervised
ones.

-- Andrei Savu / axemblr.com

Re: Provisioning as a Dedicated Service

Posted by Paul Baclace <pa...@gmail.com>.
Andrei,

I keep watching the update stream of 
github.com/axemblr/axemblr-provisionr and still see plenty of activity.

Do you have any rough idea of state transition latency and throughput 
you get when using Activiti and how this compares to using Whirr/jclouds 
in a single process?

The reason I ask is that although Activiti has good support for 
designing processes and programmatic control of the engine, it is 
necessarily DB transaction limited. An obvious alternative design is to 
use something that is actor based which can run entirely in RAM. I admit 
that an actor control system would make it harder to trace what 
happened, compared to business process control which is very much 
oriented toward human-in-the-loop.

Regards,

Paul

For example, the

On 20121214 7:34 , Andrei Savu wrote:
> Hi guys,
>
> It is no secret that at Axemblr we are using Apache Whirr for
> provisioning and initial basic cluster configuration for Hadoop. As soon as
> the machines are running we configure Hadoop by leveraging APIs from
> existing tools like Cloudera Manager or Ambari.
>
> All the orchestration needed to make this happen is not trivial if you want
> the final system to be predictable, robust, restartable and easy to inspect
> while running.
>
> A few months ago we've realised that we need to re-work the machine
> provisioning layer from Whirr and build a system that has the following
> features:
>
> * should be able to provision 10s or 100s of virtual machines by doing a
> good job at handling API throttling and by using batch operations as much
> as possible
>
> * all the internal workflows should be persistent and as granular as
> possible and each step should be idempotent
>
> * it should be possible to restart the application server while starting
> virtual machines with no impact
>
> * it should have a modular architecture and provide enough flexibility to
> be able to work with a large number of public and private clouds just by
> replacing modules
>
> * it should hide all this complexity behind a simple REST API and a simple
> interactive shell
>
> * it should be able to automatically build gold base images and use them to
> spawn large clusters
>
> We've spent some time looking for existing products that do all this and in
> the end we've decided that it's better to start from scratch and build this
> system as a new project based on Activiti, Apache Karaf, jclouds and native
> sdks.
>
> The source code is now publicly available at:
>
> https://github.com/axemblr/axemblr-provisionr
>
> I would really like to know what you think about the work we've done so
> far. The project will improve a lot over the next couple of weeks / months
> so I encourage you to stay tuned.
>
> We want to bring this project to the Apache Foundation later on. I will
> give a talk in February at ApacheCon NA on this.
>
> Cheers,
>
> -- Andrei Savu / axemblr.com
>