Posted to dev@sling.apache.org by Ian Boston <ie...@tfd.co.uk> on 2013/11/16 10:58:46 UTC

Is Sling DevOps and Cluster friendly?

Hi,
I have been reading the thread on Logback logging, which raises the
question of configuration files on the filesystem, and reading
background information on Marathon [1], Borg, Omega [2] and Mesos [3].
I also have a bit of experience working on much older cluster
management systems on MPP systems in the 1990s, like Condor and IBM
SP1/2s. Mix that with the smaller-scale devops frameworks like Puppet,
Chef, Fabric etc., and it triggered the question in the subject. The
references will also give context.

Sling thinks of itself as a self-contained application running in an
OSGi container, storing everything in its own repository, including all
configuration and the application. It's true that you can build a
launchpad which deploys initial jars on disk and configs under Felix;
however, many deployments of Sling put everything in the repository
(e.g. CQ). To manage what's in the repository there are a number of
tools in Sling and in downstream applications that manage the state of
the application built on Sling. This allows Sling to be self-contained,
which is good. However, it also means Sling has to write and own
everything needed to make it work in a real cluster environment, which
is bad because Sling can't remain an island and survive.

Meanwhile, other at-scale deployments use the systems mentioned in my
first paragraph to deploy and configure everything, reliably and
repeatably. The development chain is built to work with these systems
right from the Git or SVN code repository through to OS configuration.
Without plugins and translation, Sling can't use these systems. With
extra work it can, but there is an uneasy discontinuity.

This leaves me wondering whether the current direction of Sling, to
contain configuration and application, is the right direction for the
long-term health of the project, or if the direction raised by the
Logback thread would be better. That direction, namely configuring
logging on the filesystem with a file that can be managed, would enable
those using Sling at scale to build upon the work of others.

To make that possible the repository under Sling would have to be
reserved for data generated by the application or initial content,
whilst application code and configuration is kept on the filesystem,
so that a running instance could be upgraded entirely on the
filesystem by configuration management and versioned in a single
atomic operation. If that were possible Sling deployments could start
to use the systems mentioned in the first paragraph without
modification.
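
As a rough illustration of what "upgraded entirely on the filesystem ...
in a single atomic operation" could look like, here is a minimal sketch.
It assumes a hypothetical layout in which each version of code and
configuration lives in its own directory under releases/ and a "current"
symlink selects the active one. The directory names and the
activate_release function are made up for this example and are not part
of Sling; the atomic rename relies on POSIX filesystem semantics.

```python
import os


def activate_release(base_dir, release_name, link_name="current"):
    """Atomically repoint base_dir/link_name at releases/release_name.

    Hypothetical layout (an assumption, not a Sling convention):
        base_dir/releases/<release_name>/   code + config for one version
        base_dir/current -> releases/<release_name>
    """
    target = os.path.join("releases", release_name)  # relative link target
    if not os.path.isdir(os.path.join(base_dir, target)):
        raise FileNotFoundError("no such release: " + release_name)

    link_path = os.path.join(base_dir, link_name)
    tmp_link = "{}.tmp.{}".format(link_path, os.getpid())

    # Create the new symlink under a temporary name first ...
    os.symlink(target, tmp_link)
    # ... then rename it over the old link. The rename replaces the link
    # itself (it does not follow it), so readers see either the old or
    # the new release, never a half-updated state.
    os.replace(tmp_link, link_path)
    return os.readlink(link_path)
```

A configuration management tool would unpack the new version into its own
releases/ directory and call something like activate_release(base, "v2")
as the last step. Whether a running Sling instance picks up the change
atomically is a separate question that this sketch does not answer.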

Best Regards
Ian

[1] https://github.com/mesosphere/marathon
[2] http://www.youtube.com/watch?v=0ZFMlO98Jkc
[3] http://mesos.apache.org/

Re: Is Sling DevOps and Cluster friendly?

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi,

On Sat, Nov 16, 2013 at 10:58 AM, Ian Boston <ie...@tfd.co.uk> wrote:
> ...the repository under Sling would have to be
> reserved for data generated by the application or initial content,
> whilst application code and configuration is kept on the filesystem,
> so that a running instance could be upgraded entirely on the
> filesystem by configuration management and versioned in a single
> atomic operation...

FYI, in case anyone's interested we've been doing some experiments
along these lines with my intern Artyom. The experiments themselves
are at [1] and we'll be using the work-in-progress contrib/crankstart
module to continue experimenting.

This experimental crankstart launcher fully defines a Sling instance
in a text file like [2], which is useful in a devops/continuous
deployment context, especially if Sling instances can be considered
throwaway - in our experiments we never change the "configuration" of
a Sling instance (in a wide sense, "configuration" including bundles,
scripts, OSGi configs etc) but spin up and switch to new Sling
instances when the config changes. Careful coordination with the HTTP
front-end (mod_proxy or similar) allows for the switch from one
configuration to the next to be atomic, as seen from the outside.
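
The "start a new instance, then switch the front-end" idea can be
sketched roughly as below. This is only an illustration under stated
assumptions: is_ready and switch_frontend are hypothetical callables
supplied by the caller (for example a health check against the new
instance and a front-end reconfiguration), and nothing here reflects how
the crankstart experiments are actually implemented.

```python
import time


def switch_when_ready(is_ready, switch_frontend,
                      timeout_s=300.0, poll_interval_s=2.0,
                      sleep=time.sleep):
    """Poll the new instance and switch the front-end once it is ready.

    Returns True if the switch happened, False if the timeout expired
    first (in which case the old instance simply keeps serving).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            # Only now is traffic pointed at the new instance, so from
            # the outside the change looks like a single step.
            switch_frontend()
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval_s)
```

Both callables are injected because the thread does not say how
readiness is detected or how the front-end is reconfigured; the old
instance would be shut down only after a successful switch.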

This is all still very experimental...just wanted to share the rough
ideas in case people are interested in contributing or playing with
that stuff.

-Bertrand

[1] https://github.com/bdelacretaz/sling-devops-experiments (and
Artyom's fork might be more current,
https://github.com/ArtyomStetsenko/sling-devops-experiments)

[2] http://svn.apache.org/repos/asf/sling/trunk/contrib/crankstart/launcher/sling.crank.txt

Re: Is Sling DevOps and Cluster friendly?

Posted by Ian Boston <ie...@tfd.co.uk>.

On 20 November 2013 09:08, Bertrand Delacretaz <bd...@apache.org> wrote:
> Hi Ian,
>
> Thanks for launching this discussion!
>
> IMO the "everything in a runnable jar" approach is very useful for
> development, testing and "small" deployments, which can be quite solid
> already. So we shouldn't lose that.
>
> OTOH I'm also all for making Sling more devops and cluster friendly.
>
> As a basic model for that, would you agree that "all my configs and
> code are in a Git repository and all my Sling instances update
> themselves when I push to it" makes sense?

Yes, well put.

Ideally the glue between the two would be a standard framework (e.g.
Puppet or one of the others), and you might classify "code" as
released jars or pointers to released jars, rather than Java code that
needs compiling.
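
To make "pointers to released jars" concrete, here is a small sketch
that assumes the pointers are Maven coordinates kept in a text file in
the Git repository, one per line. The manifest format, the function
names and the repository URL are assumptions for illustration only; the
path layout is the standard Maven repository layout.

```python
def maven_path(coordinates, packaging="jar"):
    """Map 'groupId:artifactId:version' to its Maven repository path."""
    parts = coordinates.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            "expected groupId:artifactId:version, got %r" % coordinates)
    group_id, artifact_id, version = parts
    file_name = "{}-{}.{}".format(artifact_id, version, packaging)
    return "/".join(group_id.split(".") + [artifact_id, version, file_name])


def resolve_manifest(lines, repo_url):
    """Turn manifest lines (with optional '#' comments) into jar URLs."""
    urls = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if line:
            urls.append(repo_url.rstrip("/") + "/" + maven_path(line))
    return urls
```

With a manifest line such as com.example:my-bundle:1.0.0 (a made-up
coordinate), a deployment tool would only need to fetch the resolved
URLs; nothing has to be compiled on the target machine.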

Many go one step further and make the jump from Java code to
deployable jar via a CI build (Jenkins or Travis) on commit. Doing
this at the start of a project cycle, so dev instances are deployed on
commit (and don't work initially), introduces a culture to a project
that works at scale in production. Doing it after the fact is an
uphill struggle.


Best Regards
Ian

> Just to try to capture the
> basic idea in a form that's simple to explain.
>
> On Sat, Nov 16, 2013 at 10:58 AM, Ian Boston <ie...@tfd.co.uk> wrote:
>> ...the repository under Sling would have to be
>> reserved for data generated by the application or initial content,
>> whilst application code and configuration is kept on the filesystem,
>> so that a running instance could be upgraded entirely on the
>> filesystem by configuration management and versioned in a single
>> atomic operation...
>
> How do you see this atomic operation in practice? Prepare new code and
> configs in a new folder on the filesystem, change a setting in Sling
> to point to that new folder and expect things to happen atomically?
>
> Due to the async way in which Sling handles config and code changes,
> making sure things happen atomically might require some work. But
> apart from that as Felix says I think we have everything in Sling to
> allow it to get more things from the filesystem when needed: bundles,
> configs, scripts, and arbitrary resources can all come from the
> filesystem.
>
> -Bertrand

Re: Is Sling DevOps and Cluster friendly?

Posted by Bertrand Delacretaz <bd...@apache.org>.

On Wed, Nov 27, 2013 at 10:55 AM, Bertrand Delacretaz
<bd...@apache.org> wrote:
> ...I've been playing a bit with http://www.docker.io/ and I like what I
> see - initial experiments at [1] FWIW...

I sent a wrong link, sorry, the right one is
https://github.com/bdelacretaz/docker-experiments/blob/master/sling-runnable-jar/Dockerfile

-Bertrand

Re: Is Sling DevOps and Cluster friendly?

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi,

On Wed, Nov 20, 2013 at 11:49 AM, Felix Meschberger <fm...@adobe.com> wrote:
> On 20.11.2013 at 20:08, Bertrand Delacretaz <bd...@apache.org> wrote:
>> ...Due to the async way in which Sling handles config and code changes,
>> making sure things happen atomically might require some work.
>
> Well, if we want to do that we really should investigate Deployment Admin because this has
> transactional promises: Either all or nothing....

Instead of straining too much inside Sling, we might also take a
systems approach to this and use throwaway Sling instances.

To upgrade to a more recent "config" (including rendering scripts etc)
start a new instance, and once it's ready have the front-end switch to
it gracefully, like apachectl -k graceful does.

I've been playing a bit with http://www.docker.io/ and I like what I
see - initial experiments at [1] FWIW. Ironing out the "details" would
need work, of course ;-)

-Bertrand

[1] https://git.corp.adobe.com/bdelacre/docker-experiments/blob/master/sling-runnable-jar/Dockerfile

Re: Is Sling DevOps and Cluster friendly?

Posted by Felix Meschberger <fm...@adobe.com>.

Hi


On 20.11.2013 at 21:18, Bertrand Delacretaz <bd...@apache.org> wrote:

> On Wed, Nov 20, 2013 at 11:49 AM, Felix Meschberger <fm...@adobe.com> wrote:
>> On 20.11.2013 at 20:08, Bertrand Delacretaz <bd...@apache.org> wrote:
>>> ...Due to the async way in which Sling handles config and code changes,
>>> making sure things happen atomically might require some work.
>> 
>> Well, if we want to do that we really should investigate Deployment Admin because
>> this has transactional promises: Either all or nothing....
> 
> Not sure if that would cover Sling rendering scripts, for example.

That would be (Jackrabbit) content packages? The Deployment Admin spec provides for extensibility in the types of artifacts (resource processors) that can be deployed.

Regards
Felix

> 
> You could also take a more radical approach, create a completely new
> Sling instance when a new config (in the broad sense: OSGi configs,
> rendering scripts etc) appears, and atomically switch to the new
> instance when ready. Considering Sling instances as isolated
> disposable units would also help auto-scaling scenarios, where you
> start and stop instances as needed.
> 
> -Bertrand

Re: Is Sling DevOps and Cluster friendly?

Posted by Bertrand Delacretaz <bd...@apache.org>.

On Wed, Nov 20, 2013 at 11:49 AM, Felix Meschberger <fm...@adobe.com> wrote:
> On 20.11.2013 at 20:08, Bertrand Delacretaz <bd...@apache.org> wrote:
>> ...Due to the async way in which Sling handles config and code changes,
>> making sure things happen atomically might require some work.
>
> Well, if we want to do that we really should investigate Deployment Admin because
> this has transactional promises: Either all or nothing....

Not sure if that would cover Sling rendering scripts, for example.

You could also take a more radical approach, create a completely new
Sling instance when a new config (in the broad sense: OSGi configs,
rendering scripts etc) appears, and atomically switch to the new
instance when ready. Considering Sling instances as isolated
disposable units would also help auto-scaling scenarios, where you
start and stop instances as needed.

-Bertrand

Re: Is Sling DevOps and Cluster friendly?

Posted by Felix Meschberger <fm...@adobe.com>.

Hi



On 20.11.2013 at 20:08, Bertrand Delacretaz <bd...@apache.org> wrote:

> Hi Ian,
> 
> Thanks for launching this discussion!
> 
> IMO the "everything in a runnable jar" approach is very useful for
> development, testing and "small" deployments, which can be quite solid
> already. So we shouldn't lose that.
> 
> OTOH I'm also all for making Sling more devops and cluster friendly.
> 
> As a basic model for that, would you agree that "all my configs and
> code are in a Git repository and all my Sling instances update
> themselves when I push to it" makes sense? Just to try to capture the
> basic idea in a form that's simple to explain.
> 
> On Sat, Nov 16, 2013 at 10:58 AM, Ian Boston <ie...@tfd.co.uk> wrote:
>> ...the repository under Sling would have to be
>> reserved for data generated by the application or initial content,
>> whilst application code and configuration is kept on the filesystem,
>> so that a running instance could be upgraded entirely on the
>> filesystem by configuration management and versioned in a single
>> atomic operation...
> 
> How do you see this atomic operation in practice? Prepare new code and
> configs in a new folder on the filesystem, change a setting in Sling
> to point to that new folder and expect things to happen atomically?
> 
> Due to the async way in which Sling handles config and code changes,
> making sure things happen atomically might require some work.

Well, if we want to do that we really should investigate Deployment Admin, because this has transactional promises: either all or nothing.

Another thing that comes to mind in the context of asynchronicity: I think we can at least introduce a feedback channel in the form of events sent from the OSGi Installer. I am not sure, though, whether we do not already have such a thing.

Regards
Felix

> But
> apart from that as Felix says I think we have everything in Sling to
> allow it to get more things from the filesystem when needed: bundles,
> configs, scripts, and arbitrary resources can all come from the
> filesystem.
> 
> -Bertrand

Re: Is Sling DevOps and Cluster friendly?

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi Ian,

Thanks for launching this discussion!

IMO the "everything in a runnable jar" approach is very useful for
development, testing and "small" deployments, which can be quite solid
already. So we shouldn't lose that.

OTOH I'm also all for making Sling more devops and cluster friendly.

As a basic model for that, would you agree that "all my configs and
code are in a Git repository and all my Sling instances update
themselves when I push to it" makes sense? Just to try to capture the
basic idea in a form that's simple to explain.

On Sat, Nov 16, 2013 at 10:58 AM, Ian Boston <ie...@tfd.co.uk> wrote:
> ...the repository under Sling would have to be
> reserved for data generated by the application or initial content,
> whilst application code and configuration is kept on the filesystem,
> so that a running instance could be upgraded entirely on the
> filesystem by configuration management and versioned in a single
> atomic operation...

How do you see this atomic operation in practice? Prepare new code and
configs in a new folder on the filesystem, change a setting in Sling
to point to that new folder and expect things to happen atomically?

Due to the async way in which Sling handles config and code changes,
making sure things happen atomically might require some work. But
apart from that as Felix says I think we have everything in Sling to
allow it to get more things from the filesystem when needed: bundles,
configs, scripts, and arbitrary resources can all come from the
filesystem.

-Bertrand

Re: Is Sling DevOps and Cluster friendly?

Posted by Chetan Mehrotra <ch...@gmail.com>.

Thanks Ian for those links. Makes for an interesting read over the weekend!

So far I do not have much experience on the Ops side of things.
However for my current development work I often have to start Sling
based systems from scratch multiple times in a week. And I prefer to
have a particular setup of Sling which is optimized for development.
For example my Sling system is configured to

1. Turn on and off debug level logging for certain categories.
Basically a complete Logback config tailored to help troubleshooting
as fast as possible.

2. The logging also includes some MDC data, e.g. user session, testCase
name for integration tests etc.

3. Some of the Sling component configurations are changed from their
   default values. For example
   a) The Sling Main Servlet should store 100 recent requests instead of
       the default 20
   b) The Event Web Console Plugin should keep a record of 1000 past events
   c) Discovery service frequency set to 1 hour instead of 15 sec

4. Some extra bundles deployed to help in troubleshooting like Felix
Script Console, Slf4j MDC Support etc

5. Patch the bundles packaged with Quickstart. For example, while working with
   Oak I often need to override the bundles present in Sling and start again
   from scratch. In that case I try to avoid rebuilding the complete quickstart.

To automate such stuff I use the Sling install folder support, whereby
one can put bundles and configurations in the <Sling Home>/install
folder and the Sling installer gives preference to them when installing
similar resources.

So I have a custom script which pre-creates such a folder structure and
populates it with my custom configuration files (which supersede the
packaged configuration) before starting the quickstart. This
arrangement has helped me so far in customizing out-of-the-box
Sling-based applications as per my requirements.
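
A script of the kind described above might look roughly like this. It
is a minimal sketch, not the actual script from this message: the
function name, the example file names and the configuration content are
hypothetical, and the exact file extensions and property syntax accepted
by the installer should be checked against the Sling version in use.

```python
import shutil
from pathlib import Path


def prepare_install_folder(sling_home, bundles=(), configs=None):
    """Create <sling_home>/install and fill it before the first start.

    bundles: paths of bundle jars to copy into the install folder.
    configs: mapping of file name to file content; names and contents
             are placeholders chosen by the caller.
    """
    install_dir = Path(sling_home) / "install"
    install_dir.mkdir(parents=True, exist_ok=True)

    # Extra or patched bundles that should take precedence.
    for bundle in bundles:
        shutil.copy2(str(bundle), str(install_dir / Path(bundle).name))

    # Custom configuration files that override the packaged defaults.
    for file_name, content in (configs or {}).items():
        (install_dir / file_name).write_text(content, encoding="utf-8")

    return sorted(p.name for p in install_dir.iterdir())
```

It would be run before launching the quickstart jar, for example with a
patched bundle and a file such as com.example.MyService.cfg (a made-up
PID) so that the instance starts from scratch with the development
setup already in place.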

Hope that helps

Chetan Mehrotra

Re: Is Sling DevOps and Cluster friendly?

Posted by Felix Meschberger <fm...@adobe.com>.

Hi Ian

Thanks indeed for bringing this up.

Interestingly enough, we have quite some existing functionality in Sling, which does not actually require a repository (even though the use of the repository may be looked at as being the 1st class citizen, it is actually not the only one):

* On startup, as Chetan mentioned, the thing we call "Launchpad Installer" picks up bundles and configurations from the "install" folder for deployment.
* If you want to deploy bundles and configuration you can set up the OSGi Installer to actually listen for filesystem changes much like it does for repository changes (for details: this would be the File Provider of the OSGi Installer; and it is the reason why the original JCR Installer is now called the OSGi Installer)
* If you want to access filesystem resources, there is a Filesystem ResourceProvider you can configure and then have the filesystem resources exposed in Sling

So, I would say, the mechanics are there to be used. No need to require XML file configuration support (having it is nice, though).

Regards
Felix

NB: I talk about bundles and configurations above because these are currently the two things we can actually deploy in Sling. Maybe in the future (and yes, being repository centric) we might have another deployment thing being Jackrabbit content packages. They could and should (well in our commercial system CQ they actually are) be treated in the same way.

