Posted to oak-dev@jackrabbit.apache.org by Stefan Egli <st...@apache.org> on 2015/11/25 13:30:17 UTC

[discuss] persisting cluster (view) id for discovery-lite-descriptor

Hi,

I noticed that for TarMK the discovery-lite-descriptor currently does not
persist the cluster-view-id [0]. It should, however: otherwise the
upper-level discovery.oak breaks the discovery API, which requires a
persisted cluster id. (Note that this id is not to be confused with the
'cluster node id' that identifies an instance within a document node store
cluster.)

I wanted to get some ideas from the list as to how this should be
implemented. Current options are:
1. storing a 'cluster.id.file' (or 'discovery.cluster.id.file') similar to
the 'sling.id.file' (via BundleContext.getDataFile).
   * cloning a repository would therefore require deleting both
     sling.id.file and this new file
   * disadvantage: cold standby would require an explicit copying of this
     file (during initial hand-shake?)
2. storing the id as a property somewhere in the repository.
   * disadvantage: cloning a repository would clone this id as well and
     there might not be an easy enough way for a user to reset it
Opinions? Alternatives?

Cheers,
Stefan
--
[0] https://issues.apache.org/jira/browse/OAK-3672
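
Option 1 above (an id file next to the 'sling.id.file') could look roughly
like the following. This is only a minimal sketch under stated assumptions:
the class name ClusterIdFile and the method readOrCreate are hypothetical
and not part of Oak or Sling, and the caller is assumed to obtain the file
location itself (in an OSGi bundle, for example via
BundleContext.getDataFile).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical helper for illustration only; not an Oak or Sling class. */
final class ClusterIdFile {

    private ClusterIdFile() {
    }

    /**
     * Returns the id stored in idFile. If the file is missing or empty,
     * a fresh random UUID is generated, written to the file and returned.
     */
    static String readOrCreate(Path idFile) throws IOException {
        if (Files.isRegularFile(idFile)) {
            String existing =
                new String(Files.readAllBytes(idFile), StandardCharsets.UTF_8).trim();
            if (!existing.isEmpty()) {
                return existing;
            }
        }
        String id = UUID.randomUUID().toString();
        Path parent = idFile.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.write(idFile, id.getBytes(StandardCharsets.UTF_8));
        return id;
    }
}
```

With this shape, deleting the file is what resets the id after cloning a
repository, and a cold standby keeps the primary's id only if the file is
copied over as well, which is the operational cost noted in the list above.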

Re: [discuss] persisting cluster (view) id for discovery-lite-descriptor

Posted by Stefan Egli <st...@apache.org>.

Having thought about and discussed this some more, an even simpler
solution is:

d) the discovery-lite descriptor *can* contain an id, in which case it
should be used. But *neither tarMk nor mongoMk set this*.

+ The advantage is that tarMk and mongoMk then behave the same, and even
similar to discovery.impl: discovery.oak stores a 'clusterId' property
under /var/discovery/oak, thus being easily visible/manageable in all
cases.

- The disadvantages are in the same area that led to choosing c)
originally: conceptually, defining the id and defining who is a member are
aspects of the same concern and should not be separated, as otherwise you
open the door to possible inconsistencies between them. So if they are
separated, it needs to be seen as a trade-off against what is gained, namely
easier visibility and manageability of this id. One known place where this
separation, and thus loss of synchronization, can be a problem is the first
time the id is defined. That should however be handled by mongoMk's
conflict handling. Another potential place is when this id is redefined
(eg deleted). That must be managed separately and is one consequence of d)
versus c). At this stage I'm not seeing any other negative consequences, so
overall d) still sounds better than c).

Unless I hear vetoes, I'd implement this change before tomorrow's 1.3.15
release (also in OAK-3672, which I'll then rename)

Cheers,
Stefan

On 27/01/16 10:45, "Stefan Egli" <st...@apache.org> wrote:

>Hi,
>
>Following up on the OAK-3672 discussion again, and taking a step back, I
>see three possible classes of solutions:
>
>a) the (cluster)id is always defined by discovery-lite, be it cluster or
>singlevm
>b) the (cluster)id is entirely removed and it is up to discovery.oak (in
>sling) to define it
>c) the (cluster)id is only set by discovery-lite when feasible, eg only
>for the cluster case
>
>I'm in favour of c) with the following arguments:
>* a) requires tarMk (!) to store this id somewhere. It can either store it
>in the filesystem (which makes failover support harder), store it as a
>hidden property in the node store (which is not manageable as it's hidden)
>or store it as a normal property in the repository (which sounds hacky, as
>discovery-lite is in the NodeStore layer while this would require it to
>simulate writing a JCR property)
>* removing the id altogether (b) would be going too far imv: the logical
>unit that defines the cluster view (its members) is the best place to also
>define an id for that unit. And that logical unit is discovery-lite in
>this case.
>* what speaks for returning null for the singleVm case (c) is the fact
>that it is a special case (it is not a cluster). So treating the special
>case separately doesn't break the separation-of-concerns rule in my view.
>(c) would imply that the id is set when we're in a cluster case, and not
>otherwise (but that would not be a hard requirement, the specification
>would just be that the id *can* be null).
>
>So long story short: I suggest changing the definition of this id so that
>it *can* be null, in which case upper layers must define their own id.
>That means Sling's discovery.oak would then store a clusterId under
>/var/discovery/oak. That would automatically support cold-standby/failover
>(fixing the original bug) and simplify cleaning this property up for the
>clone case (as that would correspond to how this case was already dealt
>with in discovery.impl times).
>
>WDYT?
>
>Cheers,
>Stefan

Re: [discuss] persisting cluster (view) id for discovery-lite-descriptor

Posted by Stefan Egli <st...@apache.org>.

Hi,

Following up on the OAK-3672 discussion again, and taking a step back, I
see three possible classes of solutions:

a) the (cluster)id is always defined by discovery-lite, be it cluster or
singlevm
b) the (cluster)id is entirely removed and it is up to discovery.oak (in
sling) to define it
c) the (cluster)id is only set by discovery-lite when feasible, eg only
for the cluster case

I'm in favour of c) with the following arguments:
* a) requires tarMk (!) to store this id somewhere. It can either store it
in the filesystem (which makes failover support harder), store it as a
hidden property in the node store (which is not manageable as it's hidden)
or store it as a normal property in the repository (which sounds hacky, as
discovery-lite is in the NodeStore layer while this would require it to
simulate writing a JCR property)
* removing the id altogether (b) would be going too far imv: the logical
unit that defines the cluster view (its members) is the best place to also
define an id for that unit. And that logical unit is discovery-lite in
this case.
* what speaks for returning null for the singleVm case (c) is the fact
that it is a special case (it is not a cluster). So treating the special
case separately doesn't break the separation-of-concerns rule in my view.
(c) would imply that the id is set when we're in a cluster case, and not
otherwise (but that would not be a hard requirement, the specification
would just be that the id *can* be null).

So long story short: I suggest changing the definition of this id so that
it *can* be null, in which case upper layers must define their own id.
That means Sling's discovery.oak would then store a clusterId under
/var/discovery/oak. That would automatically support cold-standby/failover
(fixing the original bug) and simplify cleaning this property up for the
clone case (as that would correspond to how this case was already dealt
with in discovery.impl times).

WDYT?

Cheers,
Stefan
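
The "id *can* be null" contract proposed above can be sketched in a few
lines. This is an illustration only, under stated assumptions: the class
ClusterIdResolver and its resolve method are hypothetical names, the first
argument stands for whatever id discovery-lite reports (possibly null), and
the supplier stands for the upper layer's own id, for example a clusterId
that discovery.oak keeps under /var/discovery/oak.

```java
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical helper for illustration only; not an Oak or Sling class. */
final class ClusterIdResolver {

    private ClusterIdResolver() {
    }

    /**
     * Uses the id from the lower layer when one is set; otherwise asks the
     * upper layer for its own id. The supplier is only invoked when needed.
     */
    static String resolve(String descriptorId, Supplier<String> upperLayerId) {
        if (descriptorId != null && !descriptorId.isEmpty()) {
            return descriptorId;
        }
        return Objects.requireNonNull(upperLayerId.get(),
                "upper layer must supply an id when the descriptor has none");
    }
}
```

The point of the sketch is the ordering: an id supplied by the lower layer
always wins, and the upper layer only steps in when there is none.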

On 26/11/15 11:32, "Chetan Mehrotra" <ch...@gmail.com> wrote:

>On Thu, Nov 26, 2015 at 3:56 PM, Stefan Egli <eg...@adobe.com> wrote:
>> which would
>> then be on the Sling level thus could more simply use the slingId.
>
>That also sounds good. While we are at it, also have a look at OAK-3529,
>where the system needs to know a clusterId. There looks to be some overlap,
>so keep that use case in mind as well.
>
>
>Chetan Mehrotra

Re: [discuss] persisting cluster (view) id for discovery-lite-descriptor

Posted by Chetan Mehrotra <ch...@gmail.com>.

On Thu, Nov 26, 2015 at 3:56 PM, Stefan Egli <eg...@adobe.com> wrote:
> which would
> then be on the Sling level thus could more simply use the slingId.

That also sounds good. While we are at it, also have a look at OAK-3529,
where the system needs to know a clusterId. There looks to be some overlap,
so keep that use case in mind as well.

Chetan Mehrotra

Re: [discuss] persisting cluster (view) id for discovery-lite-descriptor

Posted by Stefan Egli <eg...@adobe.com>.

I'm not sure how feasible kung fu or voodoo would be but one alternative
could be that discovery-lite would 'signal' that this is a standalone
instance (either by just setting id=null or by something a bit more
explicit) and discovery.oak could then react accordingly - which would
then be on the Sling level thus could more simply use the slingId.

Not sure about making the "discovery-lite API" weaker re this point
though...

Cheers,
Stefan

On 26/11/15 04:37, "Chetan Mehrotra" <ch...@gmail.com> wrote:

>There is another option to avoid extra effort when running within
>Sling. Have an optional implementation which makes use of
>SlingSettingsService to fetch the SlingId. With a little bit of OSGi
>kung fu you can have an implementation which uses the SlingId when running
>in Sling and otherwise maintains its own id using a file-based approach.
>
>This would reduce operational complexity.
>
>Chetan Mehrotra

Re: [discuss] persisting cluster (view) id for discovery-lite-descriptor

Posted by Chetan Mehrotra <ch...@gmail.com>.

There is another option to avoid extra effort when running within
Sling. Have an optional implementation which makes use of
SlingSettingsService to fetch the SlingId. With a little bit of OSGi
kung fu you can have an implementation which uses the SlingId when running
in Sling and otherwise maintains its own id using a file-based approach.

This would reduce operational complexity.

Chetan Mehrotra
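
The "use the SlingId when available, otherwise a file" idea above amounts to
trying a list of id sources in order. A minimal sketch, assuming
hypothetical names throughout (IdSourceChain is not an Oak or Sling class,
and the OSGi wiring that would make the Sling-backed source optional is
deliberately left out):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper for illustration only; not an Oak or Sling class. */
final class IdSourceChain {

    private final List<Supplier<Optional<String>>> sources;

    /** Sources are consulted in the order given. */
    @SafeVarargs
    IdSourceChain(Supplier<Optional<String>>... sources) {
        this.sources = Arrays.asList(sources);
    }

    /** Returns the id from the first source that has one, if any. */
    Optional<String> firstAvailable() {
        for (Supplier<Optional<String>> source : sources) {
            Optional<String> id = source.get();
            if (id.isPresent()) {
                return id;
            }
        }
        return Optional.empty();
    }
}
```

In such a setup the first source would return the SlingId when a
SlingSettingsService is present and an empty Optional otherwise, and the
second source would be the file-based id.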


On Wed, Nov 25, 2015 at 6:23 PM, Stefan Egli <st...@apache.org> wrote:
> Right, I'm not sure it is indeed a requirement. But without automatic
> support it might get forgotten and thus the cluster id would change upon
> failover.
>
> Cheers,
> Stefan
>
> On 25/11/15 13:40, "Chetan Mehrotra" <ch...@gmail.com> wrote:
>
>>On Wed, Nov 25, 2015 at 6:00 PM, Stefan Egli <st...@apache.org> wrote:
>>>> * disadvantage: cold standby would require an explicit copying of this file
>>>> (during initial hand-shake?)
>>
>>Why is that a requirement? Cold standby is just a backup and currently
>>there is no automatic failover support.
>>
>>For such cases we can allow passing the id as a system/framework property
>>also
>>
>>Chetan Mehrotra
>
>

Re: [discuss] persisting cluster (view) id for discovery-lite-descriptor

Posted by Stefan Egli <st...@apache.org>.

Right, I'm not sure it is indeed a requirement. But without automatic
support it might get forgotten and thus the cluster id would change upon
failover.

Cheers,
Stefan

On 25/11/15 13:40, "Chetan Mehrotra" <ch...@gmail.com> wrote:

>On Wed, Nov 25, 2015 at 6:00 PM, Stefan Egli <st...@apache.org> wrote:
>>> * disadvantage: cold standby would require an explicit copying of this file
>>> (during initial hand-shake?)
>
>Why is that a requirement? Cold standby is just a backup and currently
>there is no automatic failover support.
>
>For such cases we can allow passing the id as a system/framework property
>also
>
>Chetan Mehrotra

Re: [discuss] persisting cluster (view) id for discovery-lite-descriptor

Posted by Chetan Mehrotra <ch...@gmail.com>.

On Wed, Nov 25, 2015 at 6:00 PM, Stefan Egli <st...@apache.org> wrote:
>> * disadvantage: cold standby would require an explicit copying of this file
>> (during initial hand-shake?)

Why is that a requirement? Cold standby is just a backup and currently
there is no automatic failover support.

For such cases we can also allow passing the id as a system/framework property.

Chetan Mehrotra
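
Passing the id as a system property, as suggested above, could be read along
these lines. This is a sketch under stated assumptions: the property name
"example.discovery.cluster.id" is made up for this example and is not an
existing Oak or Sling setting, and ClusterIdProperty is a hypothetical class.
Inside an OSGi framework, BundleContext.getProperty could be used instead, as
it consults framework properties before falling back to system properties.

```java
import java.util.Optional;

/** Hypothetical helper for illustration only; not an Oak or Sling class. */
final class ClusterIdProperty {

    /** Made-up property name, chosen for this sketch only. */
    static final String PROPERTY = "example.discovery.cluster.id";

    private ClusterIdProperty() {
    }

    /** Returns the configured id, or empty if the property is unset or blank. */
    static Optional<String> configuredId() {
        String value = System.getProperty(PROPERTY);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(value.trim());
    }
}
```

A standby instance could then be started with the primary's id, for example
with -Dexample.discovery.cluster.id=<id> on the JVM command line.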