Posted to dev@ignite.apache.org by "Joel Lang (JIRA)" <ji...@apache.org> on 2017/09/28 18:31:00 UTC

[jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

Joel Lang created IGNITE-6530:
---------------------------------

             Summary: Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager
                 Key: IGNITE-6530
                 URL: https://issues.apache.org/jira/browse/IGNITE-6530
             Project: Ignite
          Issue Type: Bug
          Components: persistence
    Affects Versions: 2.2
            Reporter: Joel Lang


I just started receiving a large number of warnings about possible starvation in the striped pool.

The stack traces show the threads are blocked in checkpointReadLock() in GridCacheDatabaseSharedManager.

Here are the log messages:

{noformat}
2017-09-28 13:15:12 [grid-timeout-worker-#15%mbe%] WARN  o.a.ignite.internal.util.typedef.G - >>> Possible starvation in striped pool.
    Thread name: sys-stripe-4-#5%mbe%
    Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=BinaryObjectImpl [arr= true, ctx=false, start=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=]]]]]
    Deadlock: true
    Completed: 3212
Thread [name="sys-stripe-4-#5%mbe%", id=19, state=WAITING, blockCnt=12, waitCnt=5835]
    Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2e1993dd, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(Unknown Source)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(Unknown Source)
        at o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:847)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1770)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1686)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3063)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:129)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:265)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:260)
        at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1042)
        at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:561)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
        at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
        at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
        at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
        at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
        at o.a.i.i.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
        at o.a.i.i.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
        at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
        at java.lang.Thread.run(Unknown Source)

2017-09-28 13:15:12 [grid-timeout-worker-#15%mbe%] WARN  o.a.ignite.internal.util.typedef.G - >>> Possible starvation in striped pool.
    Thread name: sys-stripe-5-#6%mbe%
    Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=BinaryObjectImpl [arr= true, ctx=false, start=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=]]]]]
    Deadlock: true
    Completed: 3524
Thread [name="sys-stripe-5-#6%mbe%", id=20, state=WAITING, blockCnt=3, waitCnt=6730]
    Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2e1993dd, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(Unknown Source)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(Unknown Source)
        at o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:847)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1770)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1686)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3063)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:129)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:265)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:260)
        at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1042)
        at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:561)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
        at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
        at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
        at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
        at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
        at o.a.i.i.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
        at o.a.i.i.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
        at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
        at java.lang.Thread.run(Unknown Source)

2017-09-28 13:15:12 [grid-timeout-worker-#15%mbe%] WARN  o.a.ignite.internal.util.typedef.G - >>> Possible starvation in striped pool.
    Thread name: sys-stripe-6-#7%mbe%
    Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=BinaryObjectImpl [arr= true, ctx=false, start=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=]]]]]
    Deadlock: true
    Completed: 3074
Thread [name="sys-stripe-6-#7%mbe%", id=21, state=WAITING, blockCnt=1, waitCnt=5686]
    Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2e1993dd, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(Unknown Source)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(Unknown Source)
        at o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:847)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1770)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1686)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3063)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:129)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:265)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:260)
        at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1042)
        at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:561)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
        at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
        at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
        at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
        at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
        at o.a.i.i.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
        at o.a.i.i.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
        at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
        at java.lang.Thread.run(Unknown Source)

2017-09-28 13:15:12 [grid-timeout-worker-#15%mbe%] WARN  o.a.ignite.internal.util.typedef.G - >>> Possible starvation in striped pool.
    Thread name: sys-stripe-7-#8%mbe%
    Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=BinaryObjectImpl [arr= true, ctx=false, start=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=BinaryObjectImpl [arr= true, ctx=false, start=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=]]]]]
    Deadlock: true
    Completed: 3249
Thread [name="sys-stripe-7-#8%mbe%", id=22, state=WAITING, blockCnt=4, waitCnt=5892]
    Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2e1993dd, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(Unknown Source)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(Unknown Source)
        at o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:847)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1770)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1686)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3063)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:129)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:265)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:260)
        at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1042)
        at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:561)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
        at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
        at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
        at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
        at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
        at o.a.i.i.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
        at o.a.i.i.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
        at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
        at java.lang.Thread.run(Unknown Source)
{noformat}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, Dmitriy.

Do you have a reproducer, i.e. a minimal application that can reproduce this specific bug?

Could you please also run TC for your PR and get a TC bot visa?

> On Feb 25, 2020, at 11:08, Dmitry Gorchakov <dm...@gmail.com> wrote:
> 
> This issue was once reproduced in my production system with a similar
> stack trace. While analyzing the sources I noticed an unsafe use of
> writeLock().isHeldByCurrentThread() when releasing the checkpoint lock.
> This may lead to thread starvation. The problem no longer appeared after
> a local fix.
> I have prepared pull request https://github.com/apache/ignite/pull/7445.
> Please assign the JIRA issue to me and review my PR.
> 
> 
> 
> 
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/


Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

Posted by Dmitry Gorchakov <dm...@gmail.com>.
This issue was once reproduced in my production system with a similar
stack trace. While analyzing the sources I noticed an unsafe use of
writeLock().isHeldByCurrentThread() when releasing the checkpoint lock.
This may lead to thread starvation. The problem no longer appeared after
a local fix.
I have prepared pull request https://github.com/apache/ignite/pull/7445.
Please assign the JIRA issue to me and review my PR.




--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

Posted by Ivan Pavlukhin <vo...@gmail.com>.
I might have missed some context, but a side note:

> If the current thread is the write lock owner then it can't acquire the read lock.
> RRWL doesn't allow upgrades or downgrades.

From ReentrantReadWriteLock javadoc:
Reentrancy also allows downgrading from the write lock to a read lock,
by acquiring the write lock, then the read lock and then releasing the
write lock. However, upgrading from a read lock to the write lock is
not possible.
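The javadoc behavior is easy to demonstrate. The sketch below is my own illustration (not Ignite code, class name hypothetical): the downgrade path succeeds, while an upgrade attempted with tryLock() simply fails instead of deadlocking.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DowngradeDemo {
    // Downgrade per the javadoc: acquire write, then read, then release write.
    public static boolean tryDowngrade() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.writeLock().lock();
        lock.readLock().lock();      // permitted while holding the write lock
        lock.writeLock().unlock();   // downgrade complete: only the read lock is held
        boolean readHeld = lock.getReadHoldCount() == 1
            && !lock.writeLock().isHeldByCurrentThread();
        lock.readLock().unlock();
        return readHeld;
    }

    // Upgrade (read -> write) is not possible; tryLock() returns false
    // here rather than blocking forever.
    public static boolean tryUpgrade() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        boolean upgraded = lock.writeLock().tryLock();
        lock.readLock().unlock();
        return upgraded;
    }
}
```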


Best regards,
Ivan Pavlukhin

On Tue, Feb 25, 2020 at 22:05, Andrey Gura <ag...@apache.org> wrote:
>
> Agree with Andrey and Slava.
>
> Your change could lead to an exception in the case when the current
> thread does not own the read lock.
>
> > The check for write-lock ownership via isHeldByCurrentThread is not atomic, so there is no certainty that this thread still owns the write lock.
>
> What do you mean by "method isHeldByCurrentThread is not atomic"? There
> is no need for any additional guarantees here, because the current
> thread always sees its own changes and no other thread can change the
> owner-thread reference while the current thread is the owner. Also, the
> current thread cannot check lock ownership and release the lock
> concurrently.
>
> > I assume that a read lock was obtained but not released on this code path.
>
> If the current thread is the write lock owner then it can't acquire the
> read lock. RRWL doesn't allow upgrades or downgrades.
>
> The stack traces from the JIRA issue don't point to a deadlock.
> "Deadlock: true" means that the JVM detected a deadlock, but it doesn't
> mean that exactly these threads are deadlocked.
>
> On Tue, Feb 25, 2020 at 9:25 PM Andrey Mashenkov
> <an...@gmail.com> wrote:
> >
> > A write lock is an exclusive lock: no other thread can obtain a read
> > or write lock while the current thread holds the write lock.
> >
> > Why do you think checking whether the current thread is the lock owner
> > is not atomic?
> > A thread that already holds the write lock has no need to obtain read
> > locks until it releases the write lock.
> >
> > Your change can have a serious impact.
> > Can you share a reproducer?
> >
> >
> > On Tue, Feb 25, 2020 at 20:42, Дмитрий Горчаков <
> > dmitry.gorchakov.murom@gmail.com> wrote:
> >
> > > > Could you please describe in detail the issue you are trying to fix?
> > >
> > > I once got the same deadlock in production on v2.7.5. A new checkpoint
> > > iteration could not start. Dump attached.
> > >
> > > > By the way, please take a look at
> > > GridCacheDatabaseSharedManager.checkpointReadLock:
> > >
> > > Thanks, I will look more closely at this check.
> > >
> > > > So, do we really need to unconditionally unlock the read lock at
> > > checkpointReadUnlock as you propose?
> > >
> > > The check for write-lock ownership via isHeldByCurrentThread is not
> > > atomic, so there is no certainty that this thread still owns the write
> > > lock.
> > > I assume that a read lock was obtained but not released on this code
> > > path.
> > >
> > > Thanks!
> > >
> > > On Tue, Feb 25, 2020 at 14:35, Вячеслав Коптилин <
> > > slava.koptilin@gmail.com> wrote:
> > >
> > >> Hello Dmitry,
> > >>
> > >> To be honest, I don't quite understand the proposed fix. Could you please
> > >> describe in detail the issue you are trying to fix?
> > >> By the way, please take a look at
> > >> GridCacheDatabaseSharedManager.checkpointReadLock:
> > >>
> > >>     @Override public void checkpointReadLock() {
> > >>         if (checkpointLock.writeLock().isHeldByCurrentThread())
> > >>             return;
> > >>         ...
> > >>     }
> > >>
> > >> In other words, the read lock is not acquired in case the write lock is
> > >> already held.
> > >> So, do we really need to unconditionally unlock the read lock at
> > >> checkpointReadUnlock as you propose?
> > >>
> > >> Thanks,
> > >> S.
> > >>
> > >> On Tue, Feb 25, 2020 at 13:32, Dmitry Gorchakov <
> > >> dmitry.gorchakov.murom@gmail.com> wrote:
> > >>
> > >> > This issue was once reproduced in my production system with a similar
> > >> > stack trace. While analyzing the sources I noticed an unsafe use of
> > >> > writeLock().isHeldByCurrentThread() when releasing the checkpoint
> > >> > lock. This may lead to thread starvation. The problem no longer
> > >> > appeared after a local fix.
> > >> > I have prepared pull request https://github.com/apache/ignite/pull/7445.
> > >> > Please assign the JIRA issue to me and review my PR.
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > >> >
> > >>
> > >

Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

Posted by Andrey Gura <ag...@apache.org>.
Agree with Andrey and Slava.

Your change could lead to an exception in the case when the current
thread does not own the read lock.

> The check for write-lock ownership via isHeldByCurrentThread is not atomic, so there is no certainty that this thread still owns the write lock.

What do you mean by "method isHeldByCurrentThread is not atomic"? There
is no need for any additional guarantees here, because the current
thread always sees its own changes and no other thread can change the
owner-thread reference while the current thread is the owner. Also, the
current thread cannot check lock ownership and release the lock
concurrently.

> I assume that a read lock was obtained but not released on this code path.

If the current thread is the write lock owner then it can't acquire the
read lock. RRWL doesn't allow upgrades or downgrades.

The stack traces from the JIRA issue don't point to a deadlock.
"Deadlock: true" means that the JVM detected a deadlock, but it doesn't
mean that exactly these threads are deadlocked.
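A small illustration of why the owner-side check is safe (my own sketch, not Ignite code; the class and method names are hypothetical): only the owning thread can release a write lock, any other thread attempting the release gets IllegalMonitorStateException, so the result of isHeldByCurrentThread() cannot change between the owner's check and its unlock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class OwnerCheckDemo {
    // Shows that a non-owner thread cannot release a write lock, and the
    // owner's view of isHeldByCurrentThread() stays stable meanwhile.
    public static boolean otherThreadCannotRelease() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.writeLock().lock();
        final boolean[] threw = {false};
        Thread t = new Thread(() -> {
            try {
                lock.writeLock().unlock(); // not the owner
            } catch (IllegalMonitorStateException e) {
                threw[0] = true;           // release rejected
            }
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            return false;
        }
        // The owner still holds the lock despite the foreign unlock attempt.
        boolean stillHeld = lock.writeLock().isHeldByCurrentThread();
        lock.writeLock().unlock();
        return threw[0] && stillHeld;
    }
}
```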

On Tue, Feb 25, 2020 at 9:25 PM Andrey Mashenkov
<an...@gmail.com> wrote:
>
> A write lock is an exclusive lock: no other thread can obtain a read or
> write lock while the current thread holds the write lock.
>
> Why do you think checking whether the current thread is the lock owner
> is not atomic?
> A thread that already holds the write lock has no need to obtain read
> locks until it releases the write lock.
>
> Your change can have a serious impact.
> Can you share a reproducer?
>
>
> On Tue, Feb 25, 2020 at 20:42, Дмитрий Горчаков <
> dmitry.gorchakov.murom@gmail.com> wrote:
>
> > > Could you please describe in detail the issue you are trying to fix?
> >
> > I once got the same deadlock in production on v2.7.5. A new checkpoint
> > iteration could not start. Dump attached.
> >
> > > By the way, please take a look at
> > GridCacheDatabaseSharedManager.checkpointReadLock:
> >
> > Thanks, I will look more closely at this check.
> >
> > > So, do we really need to unconditionally unlock the read lock at
> > checkpointReadUnlock as you propose?
> >
> > The check for write-lock ownership via isHeldByCurrentThread is not
> > atomic, so there is no certainty that this thread still owns the write
> > lock.
> > I assume that a read lock was obtained but not released on this code
> > path.
> >
> > Thanks!
> >
> > On Tue, Feb 25, 2020 at 14:35, Вячеслав Коптилин <
> > slava.koptilin@gmail.com> wrote:
> >
> >> Hello Dmitry,
> >>
> >> To be honest, I don't quite understand the proposed fix. Could you please
> >> describe in detail the issue you are trying to fix?
> >> By the way, please take a look at
> >> GridCacheDatabaseSharedManager.checkpointReadLock:
> >>
> >>     @Override public void checkpointReadLock() {
> >>         if (checkpointLock.writeLock().isHeldByCurrentThread())
> >>             return;
> >>         ...
> >>     }
> >>
> >> In other words, the read lock is not acquired in case the write lock is
> >> already held.
> >> So, do we really need to unconditionally unlock the read lock at
> >> checkpointReadUnlock as you propose?
> >>
> >> Thanks,
> >> S.
> >>
> >> On Tue, Feb 25, 2020 at 13:32, Dmitry Gorchakov <
> >> dmitry.gorchakov.murom@gmail.com> wrote:
> >>
> >> > This issue was once reproduced in my production system with a similar
> >> > stack trace. While analyzing the sources I noticed an unsafe use of
> >> > writeLock().isHeldByCurrentThread() when releasing the checkpoint
> >> > lock. This may lead to thread starvation. The problem no longer
> >> > appeared after a local fix.
> >> > I have prepared pull request https://github.com/apache/ignite/pull/7445.
> >> > Please assign the JIRA issue to me and review my PR.
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> >> >
> >>
> >

Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

Posted by Andrey Mashenkov <an...@gmail.com>.
A write lock is an exclusive lock: no other thread can obtain a read or
write lock while the current thread holds the write lock.

Why do you think checking whether the current thread is the lock owner
is not atomic?
A thread that already holds the write lock has no need to obtain read
locks until it releases the write lock.

Your change can have a serious impact.
Can you share a reproducer?


On Tue, Feb 25, 2020 at 20:42, Дмитрий Горчаков <
dmitry.gorchakov.murom@gmail.com> wrote:

> > Could you please describe in detail the issue you are trying to fix?
>
> I once got the same deadlock in production on v2.7.5. A new checkpoint
> iteration could not start. Dump attached.
>
> > By the way, please take a look at
> GridCacheDatabaseSharedManager.checkpointReadLock:
>
> Thanks, I will look more closely at this check.
>
> > So, do we really need to unconditionally unlock the read lock at
> checkpointReadUnlock as you propose?
>
> The check for write-lock ownership via isHeldByCurrentThread is not
> atomic, so there is no certainty that this thread still owns the write
> lock.
> I assume that a read lock was obtained but not released on this code
> path.
>
> Thanks!
>
> On Tue, Feb 25, 2020 at 14:35, Вячеслав Коптилин <
> slava.koptilin@gmail.com> wrote:
>
>> Hello Dmitry,
>>
>> To be honest, I don't quite understand the proposed fix. Could you please
>> describe in detail the issue you are trying to fix?
>> By the way, please take a look at
>> GridCacheDatabaseSharedManager.checkpointReadLock:
>>
>>     @Override public void checkpointReadLock() {
>>         if (checkpointLock.writeLock().isHeldByCurrentThread())
>>             return;
>>         ...
>>     }
>>
>> In other words, the read lock is not acquired in case the write lock is
>> already held.
>> So, do we really need to unconditionally unlock the read lock at
>> checkpointReadUnlock as you propose?
>>
>> Thanks,
>> S.
>>
>> On Tue, Feb 25, 2020 at 13:32, Dmitry Gorchakov <
>> dmitry.gorchakov.murom@gmail.com> wrote:
>>
>> > This issue was once reproduced in my production system with a similar
>> > stack trace. While analyzing the sources I noticed an unsafe use of
>> > writeLock().isHeldByCurrentThread() when releasing the checkpoint
>> > lock. This may lead to thread starvation. The problem no longer
>> > appeared after a local fix.
>> > I have prepared pull request https://github.com/apache/ignite/pull/7445.
>> > Please assign the JIRA issue to me and review my PR.
>> >
>> >
>> >
>> >
>> > --
>> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>> >
>>
>

Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

Posted by Дмитрий Горчаков <dm...@gmail.com>.
> Could you please describe in detail the issue you are trying to fix?

I once got the same deadlock in production on v2.7.5. A new checkpoint
iteration could not start. Dump attached.

> By the way, please take a look at
GridCacheDatabaseSharedManager.checkpointReadLock:

Thanks, I will look more closely at this check.

> So, do we really need to unconditionally unlock the read lock at
checkpointReadUnlock as you propose?

The check for write-lock ownership via isHeldByCurrentThread is not
atomic, so there is no certainty that this thread still owns the write
lock.
I assume that a read lock was obtained but not released on this code
path.

Thanks!

On Tue, Feb 25, 2020 at 14:35, Вячеслав Коптилин <sl...@gmail.com> wrote:

> Hello Dmitry,
>
> To be honest, I don't quite understand the proposed fix. Could you please
> describe in detail the issue you are trying to fix?
> By the way, please take a look at
> GridCacheDatabaseSharedManager.checkpointReadLock:
>
>     @Override public void checkpointReadLock() {
>         if (checkpointLock.writeLock().isHeldByCurrentThread())
>             return;
>         ...
>     }
>
> In other words, the read lock is not acquired in case the write lock is
> already held.
> So, do we really need to unconditionally unlock the read lock at
> checkpointReadUnlock as you propose?
>
> Thanks,
> S.
>
> On Tue, Feb 25, 2020 at 13:32, Dmitry Gorchakov <
> dmitry.gorchakov.murom@gmail.com> wrote:
>
> > This issue was once reproduced in my production system with a similar
> > stack trace. While analyzing the sources I noticed an unsafe use of
> > writeLock().isHeldByCurrentThread() when releasing the checkpoint
> > lock. This may lead to thread starvation. The problem no longer
> > appeared after a local fix.
> > I have prepared pull request https://github.com/apache/ignite/pull/7445.
> > Please assign the JIRA issue to me and review my PR.
> >
> >
> >
> >
> > --
> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> >
>

Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

Posted by Вячеслав Коптилин <sl...@gmail.com>.
Hello Dmitry,

To be honest, I don't quite understand the proposed fix. Could you please
describe in detail the issue you are trying to fix?
By the way, please take a look at
GridCacheDatabaseSharedManager.checkpointReadLock:

    @Override public void checkpointReadLock() {
        if (checkpointLock.writeLock().isHeldByCurrentThread())
            return;
        ...
    }

In other words, the read lock is not acquired if the write lock is
already held by the current thread.
So, do we really need to unconditionally unlock the read lock at
checkpointReadUnlock as you propose?
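To make the pairing concrete, here is a simplified sketch (my own illustration, not the actual Ignite source; the class and the unconditionalUnlockFails helper are hypothetical) of how the conditional lock and unlock mirror each other, and why an unconditional read-lock release fails when the write lock is held:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckpointLockSketch {
    private final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();

    public void checkpointReadLock() {
        // The checkpointer thread re-enters here while holding the write
        // lock, so acquiring the read lock is deliberately skipped.
        if (checkpointLock.writeLock().isHeldByCurrentThread())
            return;
        checkpointLock.readLock().lock();
    }

    public void checkpointReadUnlock() {
        // The unlock must mirror the lock: if the read lock was skipped on
        // entry, releasing it here would throw IllegalMonitorStateException.
        if (checkpointLock.writeLock().isHeldByCurrentThread())
            return;
        checkpointLock.readLock().unlock();
    }

    // An unconditional release fails when the write lock is held and no
    // read lock was ever taken.
    public static boolean unconditionalUnlockFails() {
        CheckpointLockSketch s = new CheckpointLockSketch();
        s.checkpointLock.writeLock().lock();
        s.checkpointReadLock(); // skipped: write lock already held
        try {
            s.checkpointLock.readLock().unlock(); // unconditional release
            return false;
        } catch (IllegalMonitorStateException e) {
            return true; // no read hold to release
        } finally {
            s.checkpointLock.writeLock().unlock();
        }
    }
}
```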

Thanks,
S.

On Tue, Feb 25, 2020 at 13:32, Dmitry Gorchakov <
dmitry.gorchakov.murom@gmail.com> wrote:

> This issue was once reproduced in my production system with a similar
> stack trace. While analyzing the sources I noticed an unsafe use of
> writeLock().isHeldByCurrentThread() when releasing the checkpoint lock.
> This may lead to thread starvation. The problem no longer appeared after
> a local fix.
> I have prepared pull request https://github.com/apache/ignite/pull/7445.
> Please assign the JIRA issue to me and review my PR.
>
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>

Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

Posted by Dmitry Gorchakov <dm...@gmail.com>.
This issue was once reproduced in my production system with a similar
stack trace. While analyzing the sources I noticed an unsafe use of
writeLock().isHeldByCurrentThread() when releasing the checkpoint lock.
This may lead to thread starvation. The problem no longer appeared after
a local fix.
I have prepared pull request https://github.com/apache/ignite/pull/7445.
Please assign the JIRA issue to me and review my PR.




--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/