You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@geode.apache.org by "Mario Kevo (Jira)" <ji...@apache.org> on 2022/09/21 06:05:00 UTC
[jira] [Resolved] (GEODE-10395) TXLockIdImpl memory leak after CommitConflictException from another node
[ https://issues.apache.org/jira/browse/GEODE-10395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mario Kevo resolved GEODE-10395.
--------------------------------
Fix Version/s: 1.16.0
Resolution: Fixed
> TXLockIdImpl memory leak after CommitConflictException from another node
> ------------------------------------------------------------------------
>
> Key: GEODE-10395
> URL: https://issues.apache.org/jira/browse/GEODE-10395
> Project: Geode
> Issue Type: Bug
> Affects Versions: 1.14.0, 1.15.0
> Reporter: Eugene Nedzvetsky
> Assignee: Mario Kevo
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.16.0
>
>
> org.apache.geode.internal.cache.locks.TXLockServiceImpl#txLock:120 adds TXLockIdImpl objects to TXLockServiceImpl#txLockIdList.
> {code:java}
> synchronized (txLockIdList) {
> txLockId = new TXLockIdImpl(dlock.getDistributionManager().getId());
> txLockIdList.add(txLockId);
> }
> {code}
> These objects will not be removed from this list if dlock.acquireTryLocks returned false.
> {code:java}
> gotLocks = dlock.acquireTryLocks(batch, TIMEOUT_MILLIS, LEASE_MILLIS, keyIfFail);
> if (gotLocks) { // ...otherwise race can occur between tryLocks and readLock
> acquireRecoveryReadLock();
> } else if (keyIfFail[0] != null) {
> throw new CommitConflictException(
> String.format("Concurrent transaction commit detected %s",
> keyIfFail[0]));
> } else {
> throw new CommitConflictException(
> String.format("Failed to request try locks from grantor: %s",
> dlock.getLockGrantorId()));
> }
> {code}
> It throws CommitConflictException and after that system doesn't have a txLockId reference and this txLockId will be never removed from this list.
> It produces critical performance degradation. txLockIdList has a few hundred thousand txLocks after a few weeks and TXLockServiceImpl#release iterates this list 2 times on every tx commit and the same time "synchronized (txLockIdList)" locks other threads.
> TXLockIdImpl#equals works really slow because it checks bunch of variables in memberId.equals(that.memberId).
> {code:java}
> public void release(TXLockId txLockId) {
> synchronized (txLockIdList) {
> if (!txLockIdList.contains(txLockId)) {
> // TXLockService.destroyServices can be invoked in cache.close().
> // Other P2P threads could process message such as TXCommitMessage afterwards,
> // and invoke TXLockService.createDTLS(). It could create a new TXLockService
> // which will have a new empty list (txLockIdList) and it will not
> // contain the originally added txLockId
> throw new IllegalArgumentException(
> String.format("Invalid txLockId not found: %s",
> txLockId));
> }
> dlock.releaseTryLocks(txLockId, () -> {
> return recovering;
> });
> txLockIdList.remove(txLockId);
> releaseRecoveryReadLock();
> }
> }
> {code}
> I think TXLockServiceImpl#txLock should remove this txLockId from TXLockServiceImpl#txLockIdList in case of CommitConflictException:
> {code:java}
> if (gotLocks) { // ...otherwise race can occur between tryLocks and readLock
> acquireRecoveryReadLock();
> } else if (keyIfFail[0] != null) {
> synchronized (this.txLockIdList) {
> this.txLockIdList.remove(txLockId);
> }
> throw new CommitConflictException(
> String.format("Concurrent transaction commit detected %s",
> keyIfFail[0]));
> } else {
> synchronized (this.txLockIdList) {
> this.txLockIdList.remove(txLockId);
> }
> throw new CommitConflictException(
> String.format("Failed to request try locks from grantor: %s",
> this.dlock.getLockGrantorId()));
> }
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)