Posted to oak-dev@jackrabbit.apache.org by Stefan Egli <st...@apache.org> on 2016/01/27 10:45:17 UTC

Re: [discuss] persisting cluster (view) id for discovery-lite-descriptor

Hi,

Following up on the OAK-3672 discussion again, and taking a step back, I
see three possible classes of solutions:

a) the (cluster)id is always defined by discovery-lite, be it cluster or
singlevm
b) the (cluster)id is entirely removed and it is up to discovery.oak (in
sling) to define it
c) the (cluster)id is only set by discovery-lite when feasible, e.g. only
for the cluster case

I'm in favour of c) with the following arguments:
* a) requires tarMk (!) to store this id somewhere. It can either store it
in the filesystem (which makes failover support harder), store it as a
hidden property in the node store (which is not manageable as it's hidden)
or store it as a normal property in the repository (which sounds hacky, as
discovery-lite is in the NodeStore layer while this would require it to
simulate writing a JCR property)
* removing the id altogether (b) would be going too far imv: the logical
unit that defines the cluster view (its members) is the best place to also
define an id for that unit. And that logical unit is discovery-lite in
this case.
* what speaks for returning null for the singleVm case (c) is the fact
that it is a special case (it is not a cluster). So treating the special
case separately doesn't break the separation-of-concerns rule in my view.
(c) would imply that the id is set when we're in a cluster case, and not
otherwise (but that would not be a hard requirement, the specification
would just be that the id *can* be null).

So long story short: I suggest changing the definition of this id so that
it *can* be null - in which case upper layers must define their own id.
Which means Sling's discovery.oak would then store a clusterId under
/var/discovery/oak. That would automatically support cold-standby/failover
- fix the original bug - and simplify cleaning this property up for the
clone case (as that would correspond to how this case was dealt with in
discovery.impl times already).
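
To make the proposed contract concrete, here is a minimal sketch of how an
upper layer could treat a nullable id. It is not the actual discovery.oak
code: the class and method names are hypothetical, and a plain Map stands in
for the repository, keyed by the /var/discovery/oak location mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class ClusterIdFallback {
    // Assumption: a flat key standing in for a 'clusterId' property
    // stored under /var/discovery/oak.
    static final String CLUSTER_ID_KEY = "/var/discovery/oak/clusterId";

    /**
     * Uses the id from the discovery-lite descriptor when it is set.
     * When it is null, the upper layer reuses the id it persisted earlier,
     * or creates and persists a new one on first use.
     */
    static String resolveClusterId(String descriptorId, Map<String, String> store) {
        if (descriptorId != null) {
            return descriptorId;
        }
        return store.computeIfAbsent(CLUSTER_ID_KEY, k -> UUID.randomUUID().toString());
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>(); // hypothetical stand-in for the repository
        // Cluster case: the descriptor carries an id, nothing is persisted.
        System.out.println(resolveClusterId("id-from-descriptor", store));
        // Single-VM case: the descriptor id is null, so an id is created once and reused.
        String first = resolveClusterId(null, store);
        String second = resolveClusterId(null, store);
        System.out.println(first.equals(second));
    }
}
```

Since the id lives in the repository content in this sketch, a standby or
failover instance reading the same content would see the same id, and
resetting it for a clone would amount to removing that one property.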

WDYT?

Cheers,
Stefan

On 26/11/15 11:32, "Chetan Mehrotra" <ch...@gmail.com> wrote:

>On Thu, Nov 26, 2015 at 3:56 PM, Stefan Egli <eg...@adobe.com> wrote:
>> which would
>> then be on the Sling level thus could more simply use the slingId.
>
>That also sounds good. While we are at it also have a look at OAK-3529
>where system needs to know a clusterId. Looks like some overlap so
>keep that usecase also in mind
>
>
>Chetan Mehrotra

Re: [discuss] persisting cluster (view) id for discovery-lite-descriptor

Posted by Stefan Egli <st...@apache.org>.

Having thought about and discussed this some more, an even simpler
solution is:

d) the discovery-lite descriptor *can* contain an id, in which case it
should be used. But *neither tarMk nor mongoMk set this*.

+ The advantage is that tarMk and mongoMk then behave the same, and even
similar to discovery.impl: discovery.oak stores a 'clusterId' property
under /var/discovery/oak, thus being easily visible/manageable in all
cases.

- The disadvantages are in the same area that led to choosing c)
originally: conceptually, defining the id and who is a member etc. are all
aspects of the same concern and should not be separated, as otherwise you
open the door for possible inconsistencies between these aspects. So if
this is separated, it needs to be seen as a trade-off against what is
gained, namely easier visibility and manageability of this id. One known
place where this separation, and thus loss of synchronization, can be a
problem is the first time the id is defined. That should however be handled
by mongoMk's conflict handling. Another potential place is when this id is
redefined (e.g. deleted). That must be managed separately and is one
consequence of d) versus c). At this stage I'm not seeing any other
negative consequences, so overall d) still sounds better than c).
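
To illustrate the first-definition case, here is a rough sketch of two
instances trying to define the id at the same time. It does not use
mongoMk's real conflict handling: an atomic putIfAbsent on a
ConcurrentHashMap stands in for it, and all names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class FirstDefinitionRace {
    // Assumption: a flat key standing in for the 'clusterId' property
    // under /var/discovery/oak.
    static final String CLUSTER_ID_KEY = "/var/discovery/oak/clusterId";

    /**
     * Tries to define the id with the given candidate. If another instance
     * defined it first, the candidate is dropped and the existing id adopted,
     * so all instances end up agreeing on one id.
     */
    static String defineOrAdopt(ConcurrentMap<String, String> store, String candidate) {
        String existing = store.putIfAbsent(CLUSTER_ID_KEY, candidate);
        return existing != null ? existing : candidate;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> store = new ConcurrentHashMap<>();
        // Two instances each propose their own candidate id.
        String seenByA = defineOrAdopt(store, "candidate-A");
        String seenByB = defineOrAdopt(store, "candidate-B");
        // The first write wins; the second instance adopts it.
        System.out.println(seenByA + " " + seenByB);
    }
}
```

The redefinition case (e.g. the property being deleted while instances are
running) is not covered by this sketch and would still need separate
handling, as noted above.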

Unless I hear vetoes, I'd implement this change before tomorrow's 1.3.15
release (also in OAK-3672, which I'll then rename)

Cheers,
Stefan
