Posted to users@sling.apache.org by Roy Teeuwen <ro...@teeuwen.be> on 2017/10/22 17:39:00 UTC

Daily content sync between environments

Hey guys,

I wasn't sure where to put this question, so I am asking it on both the Jackrabbit and the Sling lists: each might have its own solution / proposal, and I have seen solutions that are pure Jackrabbit as well as some that also require Sling.

I was wondering how some of you achieve something like a daily content sync between different environments (Production to Acceptance/Development, for example).

I have been investigating this for a while now. Our current solution uses the oak-migration tool: at every release we bring the repo down, export all the paths that we need (mainly /content and some stuff in /etc and /conf), and import them on all the lower environments. I am not sure this is the ideal solution, especially if we want to move from a release-bound content sync to a daily one, since it requires the instance to be taken offline before it can run (unless this has changed in a newer version; please correct me if it has, because that would be very nice).

The tools I have found so far that might be of use are the following:
  * One could create a JCR content package of production and copy it over to the different environments daily. The downside is that this gets problematic once the package becomes big (for example when a lot of images/videos get involved).
  * One could use VLT RCP. I have tried this, but it seems to have a bug[1]: when an error occurs during the vlt rcp, it retries indefinitely, which makes the sync impossible from the moment a single node fails. A second problem, and sort of a dealbreaker, is that for VLT RCP to work you have to make the production environment network-accessible to all the other environments or, a bit better, use a proxy that has access to both production and the test environments.
  * Time Warner seems to have run into the same issue and created Grabbit[2] for this, but their tool also has two major drawbacks. The first is the same as with VLT RCP: production and the other environments need network access to each other, and, even worse, you can't put a proxy in between, so they really need direct access. The second drawback, which I really dislike, is that the tool is written with Spring on OSGi. I have no idea why one would build a tool for OSGi and then use Spring for it, but it brings in about 30 bundles, mainly outdated Spring 3.X bundles, just to do a content sync between environments.
  * Lastly, of course, there is the oak-migration tool, which is super fast at exporting and importing and is the one we currently use, but the drawback is that you have to bring down the environment.

The solutions I am currently thinking of are the following two, which are almost the same, but it would be nice to hear some opinions from other people.

We could write a service that returns a list of the node paths that have been altered during the current day, for example based on the jcr:lastModified date.
From this list, one could do one of the following:
  * Create a dynamic JCR content package based on the list provided by this service. The drawback is that if a lot of new images/videos were added that day, building the package might be slow, the package might be too big, or the build might time out.
  * Create a dynamic list of include and merge paths for the oak-migration tool based on this service. The drawback is that you would still have to bring down your repo to run it.
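
As a rough illustration of the content-package route, the sketch below builds a JCR-SQL2 query for nodes modified since a given date and turns a list of changed paths into a FileVault filter.xml. It is only a sketch under assumptions: the function names are made up for this example, and it assumes the modification date really is stored in jcr:lastModified on the nodes being queried (in many content models it sits on a jcr:content child or in a different property).

```python
from datetime import datetime, timezone
from typing import Iterable
from xml.sax.saxutils import quoteattr


def modified_since_query(root: str, since: datetime) -> str:
    """Build a JCR-SQL2 query for nodes under `root` modified at or after `since`."""
    # JCR dates are ISO 8601; normalise to UTC so the literal is unambiguous.
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    safe_root = root.replace("'", "''")  # escape single quotes in the SQL2 literal
    return (
        "SELECT * FROM [nt:base] AS n "
        f"WHERE ISDESCENDANTNODE(n, '{safe_root}') "
        f"AND n.[jcr:lastModified] >= CAST('{stamp}' AS DATE)"
    )


def filter_xml(paths: Iterable[str]) -> str:
    """Render a FileVault workspace filter with one <filter> entry per changed path."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<workspaceFilter version="1.0">',
    ]
    for path in sorted(set(paths)):  # de-duplicate and keep the output stable
        lines.append(f"    <filter root={quoteattr(path)}/>")
    lines.append("</workspaceFilter>")
    return "\n".join(lines)
```

The query result would feed filter_xml, and the resulting filter.xml would go into the package's META-INF/vault directory. This does not solve the size problem on days with many new binaries; it only keeps the package limited to what actually changed.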

Another solution, when going for the second option, would be to extend the oak-migration tool so that it can optionally select which paths to export based on a date property being later/earlier than a value you pass as a parameter.
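
For the oak-migration route, the changed paths would first need to be collapsed so that no path is listed below one of its own ancestors. A minimal sketch, assuming the tool accepts comma-separated --include-paths and --merge-paths options as the oak-upgrade documentation describes; the helper names are hypothetical and the option syntax should be checked against the version in use:

```python
from typing import Iterable, List


def collapse_paths(paths: Iterable[str]) -> List[str]:
    """Drop every path that is already covered by an ancestor in the list."""
    roots: List[str] = []
    # Sorting puts an ancestor before all of its descendants.
    for path in sorted(set(p.rstrip("/") or "/" for p in paths)):
        covered = any(
            r == "/" or path == r or path.startswith(r + "/") for r in roots
        )
        if not covered:
            roots.append(path)
    return roots


def oak_upgrade_args(include: Iterable[str], merge: Iterable[str] = ()) -> List[str]:
    """Format the collapsed paths as command-line options (assumed syntax)."""
    args: List[str] = []
    include_roots = collapse_paths(include)
    if include_roots:
        args.append("--include-paths=" + ",".join(include_roots))
    merge_roots = collapse_paths(merge)
    if merge_roots:
        args.append("--merge-paths=" + ",".join(merge_roots))
    return args
```

The check against r + "/" rather than a plain prefix matters: /content/a-b is not a child of /content/a and has to stay in the list.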

Any opinions / suggestions on this are more than welcome.

Greets,
Roy Teeuwen

[1] http://jackrabbit.510166.n4.nabble.com/Filevault-OakConstraint0021-Exception-td4666959.html
[2] https://github.com/TWCable/grabbit

Re: Daily content sync between environments

Posted by Davide Giannella <da...@apache.org>.
On 22/10/2017 19:39, Roy Teeuwen wrote:
>
>   * One could create a JCR content package of production, and copy it
>     over to the different environments daily. The downside of this is
>     that when the package becomes big, it gets problematic (for
>     example when a lot of images/videos start to get involved). 
>   * One could use VLT RCP, I have tried this but it seems that it has
>     a bug[1] that when there is an error in the vlt rcp, it actually
>     retries infinite amount to fix it, making the actual sync
>     impossible from the moment one error for a specific node pops up.
>     A second problem, and also sort of a dealbreaker is that for VLT
>     RCP to work, you would have to make the production environment
>     network accessible to all the other environments, or a bit better,
>     use some proxy that has access to both production and the test
>     environments.
>   * Time Warner seems to have run into the same issue, and created
>     Grabbit[2] for this, but their tool also has two major drawbacks.
>     The first drawback is related to the VLT RCP, where you would need
>     to have access from production to all the other environments, but
>     even worse is that you can't do it with a proxy in between,
>     meaning they really need to have access to each other. A second
>     drawback that I really dislike is that their tool is written in
>     Spring OSGi. I have no idea why one would make a tool in OSGi and
>     then use Spring for it, but they bring in like 30 bundles just to
>     do a content sync between environments, mainly outdated Spring 3.X
>     bundles.
>   * Lastly of course there is the oak migration tool, which is super
>     fast in doing an export and import, it is the current one we use,
>     but the drawback being that you have to bring down the environment.
>

Hmmm, so off the top of my head I would say the following.

vlt rcp is a very good candidate, but as you said it requires access
rights between production and the other environments. I have used it
more than once to sync content around and it works fairly nicely without
consuming too many resources. I don't know about its current status, as
I last used it many years ago, but you could have some observation
listener that collects the paths that have changed and crafts a list,
and then an agent that pulls that list to move the content around. The
good bit is that it doesn't require the instance to be shut down.
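
The "agent" half of that idea could look roughly like the sketch below: it takes the collected list of changed paths and produces one vlt rcp invocation per path. This is a sketch under assumptions, not a tested setup: the function name is made up, and the base URLs (including the /crx/-/jcr:root suffix in the usage note) depend on where the remoting servlet is mounted on each instance.

```python
from typing import Iterable, List


def rcp_commands(src_base: str, dst_base: str, paths: Iterable[str]) -> List[List[str]]:
    """Return one `vlt rcp` argument list per changed path."""
    commands = []
    for path in sorted(set(paths)):  # de-duplicate the observed paths
        commands.append([
            "vlt", "rcp",
            "-r",  # descend recursively
            "-u",  # update nodes that already exist on the target
            "-n",  # respect lastModified so only newer content is copied
            src_base + path,
            dst_base + path,
        ])
    return commands
```

Each argument list can be handed to subprocess.run on a host that can reach both instances, for example with src_base set to something like http://user:pass@source-host:8080/crx/-/jcr:root (a placeholder URL, not a real endpoint).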

Content packages are a good candidate as well. Same approach as above:
you craft a list of paths you want to include, then you build a package
and download it from somewhere. A con is, for example, that building a
content package will probably be more resource intensive than vlt rcp.

Finally, another approach I have used, although on AEM, was to leverage
Sling replication to distribute content around as it was being made
public.

The big question mark I have here is that I don't know how you are
currently making content live.

Another aspect you may look at, which is orthogonal to the above, is
the deployment topology. You may opt for a mongo clustered deployment,
where mongo uses its own replication mechanism to keep a non-live
cluster in sync. Then you can rely on mongo backups to move content
around. You'll have a production copy to work with; being an additional
node, it can sit outside of live traffic and therefore not impact
performance; and you have a place that can act as recovery in case of
disaster.

HTH
Davide



Re: Daily content sync between environments

Posted by Daniel Klco <dk...@apache.org>.
I've used VLT for this as well. It's nice as you can target the path to
update, but it is clunky. From a speed perspective, you could also use an
S3 bucket to transfer the content if you are on AWS, so that each script
only cares about its local environment and you don't need to allow direct
access between servers.
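
A minimal sketch of that split, assuming the content has already been exported to a zip file and that the boto3 AWS SDK is installed; the bucket layout, key naming and function names are invented for this example:

```python
from datetime import date


def package_key(env: str, day: date, prefix: str = "content-sync") -> str:
    """Name the object after the source environment and the export date."""
    return f"{prefix}/{env}/{day.isoformat()}.zip"


def upload_package(local_zip: str, bucket: str, key: str) -> None:
    """Runs on the source side only; needs write access to the bucket."""
    import boto3  # third-party SDK, imported lazily so package_key works without it

    boto3.client("s3").upload_file(local_zip, bucket, key)


def download_package(bucket: str, key: str, local_zip: str) -> None:
    """Runs on each target side only; needs read access to the bucket."""
    import boto3

    boto3.client("s3").download_file(bucket, key, local_zip)
```

Because both sides derive the same key from the environment name and the date, the production script and the lower-environment scripts never need to talk to each other, only to the bucket.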

On Mon, Oct 23, 2017 at 11:58 AM, Robert Munteanu <ro...@apache.org>
wrote:

> Hi Roy,
>
> On Sun, 2017-10-22 at 19:39 +0200, Roy Teeuwen wrote:
> > Hey guys,
> >
> > I didn't know for sure where to put this question so that's why I am
> > asking it in both Jackrabbit as in Sling, because both might have
> > their own solution / proposal, and I have seen solutions that are
> > pure Jackrabbit but some that also require Sling.
> >
> > I was wondering how some of you might achieve something like a
> > daily content sync between different environments (Production to
> > Acceptance/Development for example).
>
> Never tried this, but maybe you can use vlt ci/co? Initially you run
> vlt co against the 'source' instance and then vlt ci against the
> 'target' instance. First time you will sync everything, but then
> operations will be incremental.
>
> You do have to find a clean way of switching the vlt url between
> invocations though.
>
> Robert
>

Re: Daily content sync between environments

Posted by Daniel Klco <dk...@apache.org>.
In a similar vein, another fun one we did a while back was to vlt co into a
git repo and then blanket add and commit any changes on the hour. That way
we had a snapshot of exactly what was going on in the repository and the
ability to roll back any portion of the tree at any time (at a reasonable
level of granularity). Very useful in the throes of authoring, when we were
training up authors and trying to get a bunch of site content added at the
same time.
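
The hourly job could be sketched as below. This assumes the working copy was created once with vlt co inside a git repository and is refreshed with vlt up before each commit; the function name and the commit message format are made up for this example.

```python
from datetime import datetime
from typing import List


def snapshot_commands(when: datetime) -> List[List[str]]:
    """Commands for one snapshot, to be run inside the vlt working copy."""
    message = "Repository snapshot " + when.strftime("%Y-%m-%d %H:%M")
    return [
        ["vlt", "up"],                     # refresh the working copy from the repository
        ["git", "add", "-A"],              # stage additions, changes and deletions
        ["git", "commit", "-m", message],  # record the snapshot
    ]
```

A scheduler such as cron would run these in order with subprocess.run(cmd, cwd=workdir). Note that git commit exits with a non-zero status when nothing has changed, so the last step should not be treated as a failure in that case.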

On Mon, Oct 23, 2017 at 3:06 PM, Roy Teeuwen <ro...@teeuwen.be> wrote:

> Hey Ioan,
>
> In my initial post I already gave all the experience I have so far: I
> have tested four different tools (content packages, VLT RCP, Grabbit and
> the oak-migration tool) and listed their benefits and drawbacks.
>
> I will give VLT another try, but given the problem I ran into when I
> first used it through VLT RCP[1] (so not the vlt co that Robert and
> Daniel suggested), I am not sure it will work out.
> If VLT fails and no other solutions come up, I will probably go for
> creating a daily content package based on a query that looks for content
> created that day.
>
> Greets,
> Roy
>
> [1] http://jackrabbit.510166.n4.nabble.com/Filevault-OakConstraint0021-Exception-td4666959.html
>
>
> On 23 Oct 2017, at 18:53, Ioan Eugen Stan <ie...@netdava.com> wrote:
>
> Hi Roy,
>
> This is a use case that we have as well.
>
> We need to be able to sync content between different instances
> periodically or on request. We are still in the planning phase and will
> start to implement Sling once we have enough resources.
>
> Please share your experience so others can benefit (including us).
>
> In our use case we have an Item Bank with lots of items and other
> content. Most of the time we need to update translations. In some cases
> (once every few months) we need to add/update the content/content
> structure. Translations can be handled in lots of ways, we might not
> even use the vlt solution Robert proposed.  For new content I think that
> is something that will help us.
>
>
> Regards,
>
>
> On 23.10.2017 18:58, Robert Munteanu wrote:
>
> Hi Roy,
>
> On Sun, 2017-10-22 at 19:39 +0200, Roy Teeuwen wrote:
>
> Hey guys,
>
> I didn't know for sure where to put this question so that's why I am
> asking it in both Jackrabbit as in Sling, because both might have
> their own solution / proposal, and I have seen solutions that are
> pure Jackrabbit but some that also require Sling.
>
> I was wondering how some of you might achieve something like a
> daily content sync between different environments (Production to
> Acceptance/Development for example).
>
> Never tried this, but maybe you can use vlt ci/co? Initially you run
> vlt co against the 'source' instance and then vlt ci against the
> 'target' instance. First time you will sync everything, but then
> operations will be incremental.
>
> You do have to find a clean way of switching the vlt url between
> invocations though.
>
> Robert
>
>
>
>
>

Re: Daily content sync between environments

Posted by Roy Teeuwen <ro...@teeuwen.be>.
Hey Ioan,

In my initial post I already gave all the experience I have so far: I have tested four different tools (content packages, VLT RCP, Grabbit and the oak-migration tool) and listed their benefits and drawbacks.

I will give VLT another try, but given the problem I ran into when I first used it through VLT RCP[1] (so not the vlt co that Robert and Daniel suggested), I am not sure it will work out.
If VLT fails and no other solutions come up, I will probably go for creating a daily content package based on a query that looks for content created that day.

Greets,
Roy

[1] http://jackrabbit.510166.n4.nabble.com/Filevault-OakConstraint0021-Exception-td4666959.html


> On 23 Oct 2017, at 18:53, Ioan Eugen Stan <ie...@netdava.com> wrote:
> 
> Hi Roy,
> 
> This is a use case that we have as well.
> 
> We need to be able to sync content between different instances
> periodically or on request. We are still in the planning phase and will
> start to implement Sling once we have enough resources.
> 
> Please share your experience so others can benefit (including us).
> 
> In our use case we have an Item Bank with lots of items and other
> content. Most of the time we need to update translations. In some cases
> (once every few months) we need to add/update the content/content
> structure. Translations can be handled in lots of ways, we might not
> even use the vlt solution Robert proposed.  For new content I think that
> is something that will help us.
> 
> 
> Regards,
> 
> 
> On 23.10.2017 18:58, Robert Munteanu wrote:
>> Hi Roy,
>> 
>> On Sun, 2017-10-22 at 19:39 +0200, Roy Teeuwen wrote:
>>> Hey guys,
>>> 
>>> I didn't know for sure where to put this question so that's why I am
>>> asking it in both Jackrabbit as in Sling, because both might have
>>> their own solution / proposal, and I have seen solutions that are
>>> pure Jackrabbit but some that also require Sling.
>>>
>>> I was wondering how some of you might achieve something like a
>>> daily content sync between different environments (Production to
>>> Acceptance/Development for example).
>> Never tried this, but maybe you can use vlt ci/co? Initially you run
>> vlt co against the 'source' instance and then vlt ci against the
>> 'target' instance. First time you will sync everything, but then
>> operations will be incremental.
>> 
>> You do have to find a clean way of switching the vlt url between
>> invocations though.
>> 
>> Robert
> 
> 


Re: Daily content sync between environments

Posted by Ioan Eugen Stan <ie...@netdava.com>.
Hi Roy,

This is a use case that we have as well.

We need to be able to sync content between different instances
periodically or on request. We are still in the planning phase and will
start to implement Sling once we have enough resources.

Please share your experience so others can benefit (including us).

In our use case we have an Item Bank with lots of items and other
content. Most of the time we need to update translations. In some cases
(once every few months) we need to add/update the content/content
structure. Translations can be handled in lots of ways, we might not
even use the vlt solution Robert proposed.  For new content I think that
is something that will help us.


Regards,


On 23.10.2017 18:58, Robert Munteanu wrote:
> Hi Roy,
>
> On Sun, 2017-10-22 at 19:39 +0200, Roy Teeuwen wrote:
>> Hey guys,
>>
>> I didn't know for sure where to put this question so that's why I am
>> asking it in both Jackrabbit as in Sling, because both might have
>> their own solution / proposal, and I have seen solutions that are
>> pure Jackrabbit but some that also require Sling.
>>
>> I was wondering how some of you might achieve something like a
>> daily content sync between different environments (Production to
>> Acceptance/Development for example).
> Never tried this, but maybe you can use vlt ci/co? Initially you run
> vlt co against the 'source' instance and then vlt ci against the
> 'target' instance. First time you will sync everything, but then
> operations will be incremental.
>
> You do have to find a clean way of switching the vlt url between
> invocations though.
>
> Robert



Re: Daily content sync between environments

Posted by Robert Munteanu <ro...@apache.org>.
Hi Roy,

On Sun, 2017-10-22 at 19:39 +0200, Roy Teeuwen wrote:
> Hey guys,
> 
> I didn't know for sure where to put this question so that's why I am
> asking it in both Jackrabbit as in Sling, because both might have
> their own solution / proposal, and I have seen solutions that are
> pure Jackrabbit but some that also require Sling.
>
> I was wondering how some of you might achieve something like a
> daily content sync between different environments (Production to
> Acceptance/Development for example).

Never tried this, but maybe you can use vlt ci/co? Initially you run
vlt co against the 'source' instance and then vlt ci against the
'target' instance. First time you will sync everything, but then
operations will be incremental.

You do have to find a clean way of switching the vlt url between
invocations though.

Robert
