You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@geode.apache.org by Jinmei Liao <ji...@vmware.com> on 2020/05/26 20:02:32 UTC

[PROPOSAL] make Cluster Management Service CRUD operations thread safe

Hi, Geode Community,

Currently, the CMS CRUD operations are not thread safe, if one call tries to create a region, and another call tries to delete the same region, if timing is off, we could end up with inconsistent state (what's in cluster config and what's actually on the server). So we should make these operations thread safe. Here is the proposal to try to achieve it:

https://cwiki.apache.org/confluence/display/GEODE/Make+Cluster+Management+Service%28CMS%29+Thread+Safe

Comments/suggestions welcome.

Jinmei

Re: [PROPOSAL] make Cluster Management Service CRUD operations thread safe

Posted by Jinmei Liao <ji...@vmware.com>.
Donal had a good comment on the wiki page, and I replied. Maybe that would help answer your question of ID:

https://cwiki.apache.org/confluence/display/GEODE/Make+Cluster+Management+Service(CMS)+Thread+Safe
________________________________
From: Anilkumar Gingade <ag...@vmware.com>
Sent: Thursday, May 28, 2020 12:02 PM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Re: [PROPOSAL] make Cluster Management Service CRUD operations thread safe

Yes, the DLock machinery handles (has option) dlock grantor departure...

As I understand, right now we have dlock at config persistence layer, but this does not guarantee preserving the order in which the config changes are applied. E.g.: A create region command followed by destroy could be persisted in reverse order.

And the proposal here is to move the lock from persist layer to higher level to preserve the ordering. In this case the cost of taking Dlock remains same, except the lock window is increased to much higher.
Question is; Is the window so high that it impacts the user experience; considering config changes are not often or less concurrent (multiple clients changing/updating the config).
By having one single locking scheme, the logic remains simple, stable and manageable.

>> " Another way is to use a dlock per ID to only synchronize CRUD operation on the same ID element"
Is this possible for all config commands...What is "ID" refers to, is it region name (region level ops); if so then, does this need parsing the command request or is it already available? E.g. create index

>> not sure what's the cost of creating a dlock
The cost depends on who is the dlock grantor? If create request is on the grantor itself, its cheaper...If it’s a peer node, than the cost is sending the lock request message to the grantor. The cost is with sending message to the Grantor, in most cases. Which is not bad, considering the configuration does not change frequently.

-Anil.

On 5/28/20, 11:08 AM, "Jinmei Liao" <ji...@vmware.com> wrote:

    Simultaneous updates to configurations are already protected by a different dlock, so I assume they can be made safely.

    Typically a CMS operation involves two parts:

    1) updates to the servers to "realize" the configuration
    2) updates to the configurations to "persist" it.

    The purpose of the CMS level dlock is to make these two parts atomic, if a create/delete operations of the same element happened not in an atomic fashion, we would end up with inconsistent state between what's persisted and what's realized.

    I believe the dlock can be configured to expire after a period of time.
    ________________________________
    From: Anthony Baker <ba...@vmware.com>
    Sent: Thursday, May 28, 2020 10:40 AM
    To: dev@geode.apache.org <de...@geode.apache.org>
    Subject: Re: [PROPOSAL] make Cluster Management Service CRUD operations thread safe

    I think the first question to answer is:  can simultaneous updates to configuration be made safely?  Or what is the critical section of code that needs to be protected?

    Another thing to consider with dlocks is what happens in the failure case when the lock is not properly released.  Does it continue to block all management /configuration operations?  Does it expire after a period of time?

    Anthony


    > On May 28, 2020, at 10:17 AM, Jinmei Liao <ji...@vmware.com> wrote:
    >
    > The proposal is proposing using ONE single dlock to synchronize all CMS CRUD operations, that means, in a given time, only one CRUD operation in CMS is allowed in the entire cluster, this seems a bit too harsh.
    >
    > Another way is to use a dlock per ID to only synchronize CRUD operation on the same ID element. That way, we allow more concurrency. I am just not sure what's the cost of creating a dlock. Is the the cost of creating a dlock per ID warrants the performance gain?
    >
    > Comment/Suggestions?
    >
    > Jinmei
    > ________________________________
    > From: Jinmei Liao <ji...@vmware.com>
    > Sent: Tuesday, May 26, 2020 1:02 PM
    > To: dev@geode.apache.org <de...@geode.apache.org>
    > Subject: [PROPOSAL] make Cluster Management Service CRUD operations thread safe
    >
    > Hi, Geode Community,
    >
    > Currently, the CMS CRUD operations are not thread safe, if one call tries to create a region, and another call tries to delete the same region, if timing is off, we could end up with inconsistent state (what's in cluster config and what's actually on the server). So we should make these operations thread safe. Here is the proposal to try to achieve it:
    >
    > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGEODE%2FMake%2BCluster%2BManagement%2BService%2528CMS%2529%2BThread%2BSafe&amp;data=02%7C01%7Cjiliao%40vmware.com%7C32464d1d0bb74930fe7508d8033a15bc%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637262895297900183&amp;sdata=dBuD9e4FaqAe74Xf0fn%2FdJBMlMbOAXWcx0wi0tCFNfY%3D&amp;reserved=0
    >
    > Comments/suggestions welcome.
    >
    > Jinmei



Re: [PROPOSAL] make Cluster Management Service CRUD operations thread safe

Posted by Anilkumar Gingade <ag...@vmware.com>.
Yes, the DLock machinery handles (has option) dlock grantor departure...

As I understand, right now we have dlock at config persistence layer, but this does not guarantee preserving the order in which the config changes are applied. E.g.: A create region command followed by destroy could be persisted in reverse order.

And the proposal here is to move the lock from persist layer to higher level to preserve the ordering. In this case the cost of taking Dlock remains same, except the lock window is increased to much higher.
Question is; Is the window so high that it impacts the user experience; considering config changes are not often or less concurrent (multiple clients changing/updating the config). 
By having one single locking scheme, the logic remains simple, stable and manageable.
 
>> " Another way is to use a dlock per ID to only synchronize CRUD operation on the same ID element"
Is this possible for all config commands...What is "ID" refers to, is it region name (region level ops); if so then, does this need parsing the command request or is it already available? E.g. create index 
 
>> not sure what's the cost of creating a dlock
The cost depends on who is the dlock grantor? If create request is on the grantor itself, its cheaper...If it’s a peer node, than the cost is sending the lock request message to the grantor. The cost is with sending message to the Grantor, in most cases. Which is not bad, considering the configuration does not change frequently. 

-Anil.

On 5/28/20, 11:08 AM, "Jinmei Liao" <ji...@vmware.com> wrote:

    Simultaneous updates to configurations are already protected by a different dlock, so I assume they can be made safely.

    Typically a CMS operation involves two parts:

    1) updates to the servers to "realize" the configuration
    2) updates to the configurations to "persist" it.

    The purpose of the CMS level dlock is to make these two parts atomic, if a create/delete operations of the same element happened not in an atomic fashion, we would end up with inconsistent state between what's persisted and what's realized.

    I believe the dlock can be configured to expire after a period of time.
    ________________________________
    From: Anthony Baker <ba...@vmware.com>
    Sent: Thursday, May 28, 2020 10:40 AM
    To: dev@geode.apache.org <de...@geode.apache.org>
    Subject: Re: [PROPOSAL] make Cluster Management Service CRUD operations thread safe

    I think the first question to answer is:  can simultaneous updates to configuration be made safely?  Or what is the critical section of code that needs to be protected?

    Another thing to consider with dlocks is what happens in the failure case when the lock is not properly released.  Does it continue to block all management /configuration operations?  Does it expire after a period of time?

    Anthony


    > On May 28, 2020, at 10:17 AM, Jinmei Liao <ji...@vmware.com> wrote:
    >
    > The proposal is proposing using ONE single dlock to synchronize all CMS CRUD operations, that means, in a given time, only one CRUD operation in CMS is allowed in the entire cluster, this seems a bit too harsh.
    >
    > Another way is to use a dlock per ID to only synchronize CRUD operation on the same ID element. That way, we allow more concurrency. I am just not sure what's the cost of creating a dlock. Is the the cost of creating a dlock per ID warrants the performance gain?
    >
    > Comment/Suggestions?
    >
    > Jinmei
    > ________________________________
    > From: Jinmei Liao <ji...@vmware.com>
    > Sent: Tuesday, May 26, 2020 1:02 PM
    > To: dev@geode.apache.org <de...@geode.apache.org>
    > Subject: [PROPOSAL] make Cluster Management Service CRUD operations thread safe
    >
    > Hi, Geode Community,
    >
    > Currently, the CMS CRUD operations are not thread safe, if one call tries to create a region, and another call tries to delete the same region, if timing is off, we could end up with inconsistent state (what's in cluster config and what's actually on the server). So we should make these operations thread safe. Here is the proposal to try to achieve it:
    >
    > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGEODE%2FMake%2BCluster%2BManagement%2BService%2528CMS%2529%2BThread%2BSafe&amp;data=02%7C01%7Cagingade%40vmware.com%7Cebd6b928654b4946771e08d803322884%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637262861250474025&amp;sdata=24psRmhotSCG%2BzjULBohdcyyJ3y%2FsWKii99ipdPFGRE%3D&amp;reserved=0
    >
    > Comments/suggestions welcome.
    >
    > Jinmei



Re: [PROPOSAL] make Cluster Management Service CRUD operations thread safe

Posted by Jinmei Liao <ji...@vmware.com>.
Simultaneous updates to configurations are already protected by a different dlock, so I assume they can be made safely.

Typically a CMS operation involves two parts:

1) updates to the servers to "realize" the configuration
2) updates to the configurations to "persist" it.

The purpose of the CMS level dlock is to make these two parts atomic, if a create/delete operations of the same element happened not in an atomic fashion, we would end up with inconsistent state between what's persisted and what's realized.

I believe the dlock can be configured to expire after a period of time.
________________________________
From: Anthony Baker <ba...@vmware.com>
Sent: Thursday, May 28, 2020 10:40 AM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Re: [PROPOSAL] make Cluster Management Service CRUD operations thread safe

I think the first question to answer is:  can simultaneous updates to configuration be made safely?  Or what is the critical section of code that needs to be protected?

Another thing to consider with dlocks is what happens in the failure case when the lock is not properly released.  Does it continue to block all management /configuration operations?  Does it expire after a period of time?

Anthony


> On May 28, 2020, at 10:17 AM, Jinmei Liao <ji...@vmware.com> wrote:
>
> The proposal is proposing using ONE single dlock to synchronize all CMS CRUD operations, that means, in a given time, only one CRUD operation in CMS is allowed in the entire cluster, this seems a bit too harsh.
>
> Another way is to use a dlock per ID to only synchronize CRUD operation on the same ID element. That way, we allow more concurrency. I am just not sure what's the cost of creating a dlock. Is the the cost of creating a dlock per ID warrants the performance gain?
>
> Comment/Suggestions?
>
> Jinmei
> ________________________________
> From: Jinmei Liao <ji...@vmware.com>
> Sent: Tuesday, May 26, 2020 1:02 PM
> To: dev@geode.apache.org <de...@geode.apache.org>
> Subject: [PROPOSAL] make Cluster Management Service CRUD operations thread safe
>
> Hi, Geode Community,
>
> Currently, the CMS CRUD operations are not thread safe, if one call tries to create a region, and another call tries to delete the same region, if timing is off, we could end up with inconsistent state (what's in cluster config and what's actually on the server). So we should make these operations thread safe. Here is the proposal to try to achieve it:
>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGEODE%2FMake%2BCluster%2BManagement%2BService%2528CMS%2529%2BThread%2BSafe&amp;data=02%7C01%7Cjiliao%40vmware.com%7C63f2edc4f0dd43257ae608d8032e4c41%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637262844668586785&amp;sdata=5fj2V6IWdKjkIpEq5pUL86IkAvPKS5QYSHPuy3qlFoE%3D&amp;reserved=0
>
> Comments/suggestions welcome.
>
> Jinmei


Re: [PROPOSAL] make Cluster Management Service CRUD operations thread safe

Posted by Anthony Baker <ba...@vmware.com>.
I think the first question to answer is:  can simultaneous updates to configuration be made safely?  Or what is the critical section of code that needs to be protected?

Another thing to consider with dlocks is what happens in the failure case when the lock is not properly released.  Does it continue to block all management /configuration operations?  Does it expire after a period of time?

Anthony


> On May 28, 2020, at 10:17 AM, Jinmei Liao <ji...@vmware.com> wrote:
> 
> The proposal is proposing using ONE single dlock to synchronize all CMS CRUD operations, that means, in a given time, only one CRUD operation in CMS is allowed in the entire cluster, this seems a bit too harsh.
> 
> Another way is to use a dlock per ID to only synchronize CRUD operation on the same ID element. That way, we allow more concurrency. I am just not sure what's the cost of creating a dlock. Is the the cost of creating a dlock per ID warrants the performance gain?
> 
> Comment/Suggestions?
> 
> Jinmei
> ________________________________
> From: Jinmei Liao <ji...@vmware.com>
> Sent: Tuesday, May 26, 2020 1:02 PM
> To: dev@geode.apache.org <de...@geode.apache.org>
> Subject: [PROPOSAL] make Cluster Management Service CRUD operations thread safe
> 
> Hi, Geode Community,
> 
> Currently, the CMS CRUD operations are not thread safe, if one call tries to create a region, and another call tries to delete the same region, if timing is off, we could end up with inconsistent state (what's in cluster config and what's actually on the server). So we should make these operations thread safe. Here is the proposal to try to achieve it:
> 
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGEODE%2FMake%2BCluster%2BManagement%2BService%2528CMS%2529%2BThread%2BSafe&amp;data=02%7C01%7Cbakera%40vmware.com%7C08663df2f1c846dddaaf08d8032d9d8f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637262841746056995&amp;sdata=eayoO7sOyyFgSsOP%2FXiN2U1L5YTbbukmBVkVmXtR3jI%3D&amp;reserved=0
> 
> Comments/suggestions welcome.
> 
> Jinmei


Re: [PROPOSAL] make Cluster Management Service CRUD operations thread safe

Posted by Jinmei Liao <ji...@vmware.com>.
The proposal is proposing using ONE single dlock to synchronize all CMS CRUD operations, that means, in a given time, only one CRUD operation in CMS is allowed in the entire cluster, this seems a bit too harsh.

Another way is to use a dlock per ID to only synchronize CRUD operation on the same ID element. That way, we allow more concurrency. I am just not sure what's the cost of creating a dlock. Is the the cost of creating a dlock per ID warrants the performance gain?

Comment/Suggestions?

Jinmei
________________________________
From: Jinmei Liao <ji...@vmware.com>
Sent: Tuesday, May 26, 2020 1:02 PM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: [PROPOSAL] make Cluster Management Service CRUD operations thread safe

Hi, Geode Community,

Currently, the CMS CRUD operations are not thread safe, if one call tries to create a region, and another call tries to delete the same region, if timing is off, we could end up with inconsistent state (what's in cluster config and what's actually on the server). So we should make these operations thread safe. Here is the proposal to try to achieve it:

https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGEODE%2FMake%2BCluster%2BManagement%2BService%2528CMS%2529%2BThread%2BSafe&amp;data=02%7C01%7Cjiliao%40vmware.com%7C06dd93ddb8f947b017cd08d80325d7cb%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637262808357520050&amp;sdata=l0fbTN%2Fb1QK%2F8fnNHYbFQ6CtywAgKZpTE0Jv33ab%2BUk%3D&amp;reserved=0

Comments/suggestions welcome.

Jinmei