You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@geode.apache.org by Bruce Schuchardt <bs...@pivotal.io> on 2016/10/25 16:13:23 UTC

Review Request 53150: GEODE-2024 Deadlock creating a new distributed lock service Grantor while transactions are in progress

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53150/
-----------------------------------------------------------

Review request for geode, Darrel Schneider, Kirk Lund, and Udo Kohlmeyer.


Bugs: GEODE-2024
    https://issues.apache.org/jira/browse/GEODE-2024


Repository: geode


Description
-------

This change-set causes the code in TXLockServiceImpl.release() to perform periodic checks to see if grantor recovery is being performed.  If so it skips releaseTryLocks, which requires a recovered grantor to function.  This is in line with the previous attempts to fix this problem.  The recovery message that is trying to obtain the recovery write-lock now sets the "recovering" state in TXLockServiceImpl prior to attempting to get the lock so that it is set when TXLockServiceImpl.release() checks its state.


Diffs
-----

  geode-core/src/main/java/org/apache/geode/distributed/internal/locks/DLockRecoverGrantorProcessor.java 37fbfbe978437db15ba1d3439b28fdcc7eb0cee8 
  geode-core/src/main/java/org/apache/geode/distributed/internal/locks/DLockService.java a859299a3581939740f6c62cc759179bfdd22ee1 
  geode-core/src/main/java/org/apache/geode/internal/cache/locks/TXLockServiceImpl.java f4ab02feb67490da03209d8953da30c0415e4677 
  geode-core/src/main/java/org/apache/geode/internal/cache/locks/TXRecoverGrantorMessageProcessor.java 77dec946d920bf4f6d3f732c9d68f7c94bd7ea01 
  geode-core/src/test/java/org/apache/geode/internal/cache/locks/TXLockServiceDUnitTest.java fb16ea9e50da205322b17a9f46d6153729bf3850 

Diff: https://reviews.apache.org/r/53150/diff/


Testing
-------

TXLockServiceDUnitTest had a test for previous occurences of this problem but it was marked @Ignore.  I've fixed the timing issues with that test but it wasn't quite modeling what happened in the GEODE-2024 failure so I added another test for grantor fail-over.  The new test ensures that another member is the grantor prior to obtaining TX locks.  Then it initiates grantor failover and tells the TXLockService to release its locks.


Thanks,

Bruce Schuchardt


Re: Review Request 53150: GEODE-2024 Deadlock creating a new distributed lock service Grantor while transactions are in progress

Posted by Kirk Lund <ki...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53150/#review153940
-----------------------------------------------------------


Ship it!




Ship It!

- Kirk Lund


On Oct. 25, 2016, 4:13 p.m., Bruce Schuchardt wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53150/
> -----------------------------------------------------------
> 
> (Updated Oct. 25, 2016, 4:13 p.m.)
> 
> 
> Review request for geode, Darrel Schneider, Kirk Lund, and Udo Kohlmeyer.
> 
> 
> Bugs: GEODE-2024
>     https://issues.apache.org/jira/browse/GEODE-2024
> 
> 
> Repository: geode
> 
> 
> Description
> -------
> 
> This change-set causes the code in TXLockServiceImpl.release() to perform periodic checks to see if grantor recovery is being performed.  If so it skips releaseTryLocks, which requires a recovered grantor to function.  This is in line with the previous attempts to fix this problem.  The recovery message that is trying to obtain the recovery write-lock now sets the "recovering" state in TXLockServiceImpl prior to attempting to get the lock so that it is set when TXLockServiceImpl.release() checks its state.
> 
> 
> Diffs
> -----
> 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/locks/DLockRecoverGrantorProcessor.java 37fbfbe978437db15ba1d3439b28fdcc7eb0cee8 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/locks/DLockService.java a859299a3581939740f6c62cc759179bfdd22ee1 
>   geode-core/src/main/java/org/apache/geode/internal/cache/locks/TXLockServiceImpl.java f4ab02feb67490da03209d8953da30c0415e4677 
>   geode-core/src/main/java/org/apache/geode/internal/cache/locks/TXRecoverGrantorMessageProcessor.java 77dec946d920bf4f6d3f732c9d68f7c94bd7ea01 
>   geode-core/src/test/java/org/apache/geode/internal/cache/locks/TXLockServiceDUnitTest.java fb16ea9e50da205322b17a9f46d6153729bf3850 
> 
> Diff: https://reviews.apache.org/r/53150/diff/
> 
> 
> Testing
> -------
> 
> TXLockServiceDUnitTest had a test for previous occurences of this problem but it was marked @Ignore.  I've fixed the timing issues with that test but it wasn't quite modeling what happened in the GEODE-2024 failure so I added another test for grantor fail-over.  The new test ensures that another member is the grantor prior to obtaining TX locks.  Then it initiates grantor failover and tells the TXLockService to release its locks.
> 
> 
> Thanks,
> 
> Bruce Schuchardt
> 
>


Re: Review Request 53150: GEODE-2024 Deadlock creating a new distributed lock service Grantor while transactions are in progress

Posted by Darrel Schneider <ds...@pivotal.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53150/#review153944
-----------------------------------------------------------


Ship it!




Ship It!

- Darrel Schneider


On Oct. 25, 2016, 9:13 a.m., Bruce Schuchardt wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53150/
> -----------------------------------------------------------
> 
> (Updated Oct. 25, 2016, 9:13 a.m.)
> 
> 
> Review request for geode, Darrel Schneider, Kirk Lund, and Udo Kohlmeyer.
> 
> 
> Bugs: GEODE-2024
>     https://issues.apache.org/jira/browse/GEODE-2024
> 
> 
> Repository: geode
> 
> 
> Description
> -------
> 
> This change-set causes the code in TXLockServiceImpl.release() to perform periodic checks to see if grantor recovery is being performed.  If so it skips releaseTryLocks, which requires a recovered grantor to function.  This is in line with the previous attempts to fix this problem.  The recovery message that is trying to obtain the recovery write-lock now sets the "recovering" state in TXLockServiceImpl prior to attempting to get the lock so that it is set when TXLockServiceImpl.release() checks its state.
> 
> 
> Diffs
> -----
> 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/locks/DLockRecoverGrantorProcessor.java 37fbfbe978437db15ba1d3439b28fdcc7eb0cee8 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/locks/DLockService.java a859299a3581939740f6c62cc759179bfdd22ee1 
>   geode-core/src/main/java/org/apache/geode/internal/cache/locks/TXLockServiceImpl.java f4ab02feb67490da03209d8953da30c0415e4677 
>   geode-core/src/main/java/org/apache/geode/internal/cache/locks/TXRecoverGrantorMessageProcessor.java 77dec946d920bf4f6d3f732c9d68f7c94bd7ea01 
>   geode-core/src/test/java/org/apache/geode/internal/cache/locks/TXLockServiceDUnitTest.java fb16ea9e50da205322b17a9f46d6153729bf3850 
> 
> Diff: https://reviews.apache.org/r/53150/diff/
> 
> 
> Testing
> -------
> 
> TXLockServiceDUnitTest had a test for previous occurences of this problem but it was marked @Ignore.  I've fixed the timing issues with that test but it wasn't quite modeling what happened in the GEODE-2024 failure so I added another test for grantor fail-over.  The new test ensures that another member is the grantor prior to obtaining TX locks.  Then it initiates grantor failover and tells the TXLockService to release its locks.
> 
> 
> Thanks,
> 
> Bruce Schuchardt
> 
>


Re: Review Request 53150: GEODE-2024 Deadlock creating a new distributed lock service Grantor while transactions are in progress

Posted by Udo Kohlmeyer <uk...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53150/#review153925
-----------------------------------------------------------


Ship it!




Ship It!

- Udo Kohlmeyer


On Oct. 25, 2016, 4:13 p.m., Bruce Schuchardt wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53150/
> -----------------------------------------------------------
> 
> (Updated Oct. 25, 2016, 4:13 p.m.)
> 
> 
> Review request for geode, Darrel Schneider, Kirk Lund, and Udo Kohlmeyer.
> 
> 
> Bugs: GEODE-2024
>     https://issues.apache.org/jira/browse/GEODE-2024
> 
> 
> Repository: geode
> 
> 
> Description
> -------
> 
> This change-set causes the code in TXLockServiceImpl.release() to perform periodic checks to see if grantor recovery is being performed.  If so it skips releaseTryLocks, which requires a recovered grantor to function.  This is in line with the previous attempts to fix this problem.  The recovery message that is trying to obtain the recovery write-lock now sets the "recovering" state in TXLockServiceImpl prior to attempting to get the lock so that it is set when TXLockServiceImpl.release() checks its state.
> 
> 
> Diffs
> -----
> 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/locks/DLockRecoverGrantorProcessor.java 37fbfbe978437db15ba1d3439b28fdcc7eb0cee8 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/locks/DLockService.java a859299a3581939740f6c62cc759179bfdd22ee1 
>   geode-core/src/main/java/org/apache/geode/internal/cache/locks/TXLockServiceImpl.java f4ab02feb67490da03209d8953da30c0415e4677 
>   geode-core/src/main/java/org/apache/geode/internal/cache/locks/TXRecoverGrantorMessageProcessor.java 77dec946d920bf4f6d3f732c9d68f7c94bd7ea01 
>   geode-core/src/test/java/org/apache/geode/internal/cache/locks/TXLockServiceDUnitTest.java fb16ea9e50da205322b17a9f46d6153729bf3850 
> 
> Diff: https://reviews.apache.org/r/53150/diff/
> 
> 
> Testing
> -------
> 
> TXLockServiceDUnitTest had a test for previous occurences of this problem but it was marked @Ignore.  I've fixed the timing issues with that test but it wasn't quite modeling what happened in the GEODE-2024 failure so I added another test for grantor fail-over.  The new test ensures that another member is the grantor prior to obtaining TX locks.  Then it initiates grantor failover and tells the TXLockService to release its locks.
> 
> 
> Thanks,
> 
> Bruce Schuchardt
> 
>