Posted to dev@ignite.apache.org by Anton Vinogradov <av...@apache.org> on 2019/07/10 13:36:43 UTC

Tx lock partial happens before

Folks,

I'm now investigating unexpected repairs [1] in case of ReadRepair usage at
testAccountTxNodeRestart.
I updated [2] the test to check whether any repairs happen.
The test's name is now "testAccountTxNodeRestartWithReadRepair".

Each get method now checks the consistency. The check means:
1) tx lock acquired on primary
2) data gained from each owner (primary and backups)
3) data compared
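
To illustrate step 3, here is a minimal sketch (plain Java, hypothetical
names, not the actual test code): values read from the primary and all
backups for the same key, while the tx lock is held on the primary, must be
equal.

import java.util.Map;

class ConsistencyCheckSketch {
    // valueByOwner: the value read for one key from each owner (node id ->
    // value), gathered while the tx lock is held on the primary (step 1).
    static boolean consistent(Map<String, Object> valueByOwner) {
        // The key is consistent iff every owner returned the same value.
        return valueByOwner.values().stream().distinct().count() <= 1;
    }
}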

Sometimes, a backup may have an obsolete value during such a check.

It seems this happens because tx commit on the primary goes the following way
(see the code [3] for details and the sketch below):
1) performing localFinish (releases tx lock)
2) performing dhtFinish (commits on backups)
3) transferring control back to the caller
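
A minimal sketch of this ordering (plain Java, hypothetical names, not the
actual GridDhtTxLocal#finishTx code) showing why a reader can observe a stale
backup between steps 1 and 2:

import java.util.concurrent.locks.ReentrantLock;

class PrimaryCommitSketch {
    final ReentrantLock txLock = new ReentrantLock(); // "tx lock" on primary
    volatile int primaryValue;
    volatile int backupValue;

    void commit(int newVal) {
        // 1) localFinish: commit on the primary and release the tx lock.
        txLock.lock();
        try {
            primaryValue = newVal;
        }
        finally {
            txLock.unlock(); // Another thread may re-acquire the lock here.
        }

        // 2) dhtFinish: the backup is updated only after the primary unlock,
        // so a read from the backup in this window returns the old value.
        backupValue = newVal;

        // 3) Control returns to the caller: only "commit() finished" gives
        // happens-before with "backup updated".
    }
}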

So, it seems the problem here is that "tx lock released on primary" does not
mean that the backups are updated, while "commit() method finished in the
caller's thread" does.
This means that, currently, there is no happens-before between
1) thread 1 committing data on the primary (after which the tx lock can be
re-obtained) and
2) thread 2 reading from a backup,
but there is still a strong HB between "commit() finished" and "backup
updated".

So it seems possible, for example, to receive a notification from a
continuous query, then read from a backup and get an obsolete value.

Is this "partial happens before" behavior expected?

[1] https://issues.apache.org/jira/browse/IGNITE-11973
[2] https://github.com/apache/ignite/pull/6679/files
[3] org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal#finishTx

Re: Tx lock partial happens before

Posted by Павлухин Иван <vo...@gmail.com>.
Anton,

Thank you for the clarification.

-- 
Best regards,
Ivan Pavlukhin

Re: Tx lock partial happens before

Posted by Anton Vinogradov <av...@apache.org>.
Ivan R.

Thanks.
I'll try to implement the approach you proposed.

Ivan P.

>> what prevents primary partition relocation when
>> Read Repair is in progress? Is there a transaction or an explicit lock?
Did you mean partition eviction?
RR is almost a regular get with the same logic: it maps onto some topology
and performs regular gets.
In case a node has failed or is no longer an owner, we just ignore it.
See the code for details:


if (invalidNodeSet.contains(affNode) || !cctx.discovery().alive(affNode)) {
    onDone(Collections.emptyMap()); // Finishing mini future with just "ok".
    ...
}

Re: Tx lock partial happens before

Posted by Павлухин Иван <vo...@gmail.com>.
Anton,

You referenced failover scenarios. I believe that everything is
described in the IEP. But to make this discussion self-sufficient, could
you please outline what prevents primary partition relocation while
Read Repair is in progress? Is there a transaction or an explicit lock?

-- 
Best regards,
Ivan Pavlukhin

Re: Tx lock partial happens before

Posted by Ivan Rakov <iv...@gmail.com>.
Anton,

> Step-by-step:
> 1) primary locked when the key is first mentioned (get/put) in a
> pessimistic/!read-committed tx
> 2) backups locked on prepare
> 3) primary unlocked on finish
> 4) backups unlocked on finish (after the primary)
> correct?
Yes, this corresponds to my understanding of the transaction protocol, with a
minor exception: steps 3 and 4 are inverted in the case of one-phase commit.

> Agree, but it seems there is no need to acquire the lock; we just have to
> wait until the entry becomes unlocked.
> - entry locked means that the previous tx's "finish" phase is in progress
> - entry unlocked means the read value is up-to-date (the previous "finish"
> phase finished)
> correct?
Diving deeper: an entry is locked if its GridCacheMapEntry.localCandidates
queue is not empty (the first item in the queue is the transaction that
currently owns the lock).

> we just have to wait
> until the entry becomes unlocked.
This may work.
If the consistency checking code has acquired the lock on the primary, the
backup can be in one of two states:
- not locked, and new locks won't appear since we are holding the lock on the
primary;
- still locked by the transaction that owned the lock on the primary just
before our checking code ran; in that case the checking code should just wait
for the lock release (see the sketch below).
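
A minimal sketch of this "wait until unlocked" idea (plain Java, hypothetical
names; the real check would poll GridCacheMapEntry's lock-candidate queue):

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.LockSupport;

class AwaitUnlockedSketch {
    // Stand-in for GridCacheMapEntry.localCandidates: the head of the queue
    // is the transaction that currently owns the entry lock.
    final Queue<Object> localCandidates = new ConcurrentLinkedQueue<>();

    void awaitUnlocked() {
        // We hold the lock on the primary, so no new candidates can appear
        // here; we only wait out the previous tx's "finish" phase.
        while (!localCandidates.isEmpty())
            LockSupport.parkNanos(1_000_000); // back off 1 ms and re-check
    }
}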

Best Regards,
Ivan Rakov

Re: Tx lock partial happens before

Posted by Anton Vinogradov <av...@apache.org>.
Ivan R.

Thanks for joining!

I got the idea, but I'm not sure I see a way to fix it.

AFAIK (I can be wrong, please correct me if necessary), with 2PC, locks are
acquired on backups during the "prepare" phase and released in the "finish"
phase after the primary has fully committed.
Step-by-step:
1) primary locked when the key is first mentioned (get/put) in a
pessimistic/!read-committed tx
2) backups locked on prepare
3) primary unlocked on finish
4) backups unlocked on finish (after the primary)
correct?

So, acquiring locks on backups outside the "prepare" phase may cause
unexpected behavior in case of a primary failure or other errors.
It's definitely possible to update the failover logic to solve this issue,
but that seems overcomplicated.
The main question here: is there any simple way?

>> a checking read from a backup will just wait for the commit if it's in progress.
Agree, but it seems there is no need to acquire the lock; we just have to
wait until the entry becomes unlocked.
- entry locked means that the previous tx's "finish" phase is in progress
- entry unlocked means the read value is up-to-date (the previous "finish"
phase finished)
correct?

Re: Tx lock partial happens before

Posted by Павлухин Иван <vo...@gmail.com>.
Anton,

I did not know about the mechanics of locking entries on backups during the
prepare phase. Thank you for pointing that out!

-- 
Best regards,
Ivan Pavlukhin

Re: Tx lock partial happens before

Posted by Ivan Rakov <iv...@gmail.com>.
Hi Anton,

> Each get method now checks the consistency. The check means:
> 1) tx lock acquired on primary
> 2) data gained from each owner (primary and backups)
> 3) data compared
Did you consider acquiring locks on backups as well during your check,
just like 2PC prepare does?
If there's HB between step 1 (lock primary) and step 2 (update primary +
lock backup + update backup), you may be sure that there will be no
false-positive results and no deadlocks as well. The protocol won't be
complicated: a checking read from a backup will just wait for the commit if
it's in progress (see the sketch below).
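
A minimal sketch of this proposal (plain Java, hypothetical names; a
ReentrantLock stands in for the entry lock that 2PC prepare holds on the
backup until the tx finishes there):

import java.util.concurrent.locks.ReentrantLock;

class BackupCheckSketch {
    // Stand-in for the per-entry lock taken on the backup during "prepare"
    // and released at "finish".
    final ReentrantLock backupEntryLock = new ReentrantLock();
    volatile Object backupValue;

    // A checking read takes the same lock as prepare does, so it blocks
    // while a commit is in progress and never observes a stale value.
    Object checkingRead() {
        backupEntryLock.lock();
        try {
            return backupValue;
        }
        finally {
            backupEntryLock.unlock();
        }
    }
}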

Best Regards,
Ivan Rakov

Re: Tx lock partial happens before

Posted by Anton Vinogradov <av...@apache.org>.
Igniters,

Let me explain the problem in detail.
Read Repair in a pessimistic tx (locks acquired on primary, full sync, 2PC) is
able to see a consistency violation because the backups are not updated yet.
It seems to be a bad idea to "fix" the code to unlock the primary only once
the backups are updated; this will definitely cause a performance drop.
Currently, there is no explicit sync feature that allows waiting until the
backups are updated by the previous tx.
The previous tx just sends GridNearTxFinishResponse to the originating node.

Bad ideas for handling this:
- retry a few times (a false positive is still possible)
- lock the tx entry on backups (will definitely break the failover logic)
- wait for the same entry version on backups with some timeout (would require
huge changes to the "get" logic, and a false positive is still possible)

Is there any simple fix for this issue?
Thanks in advance for any tips.

Ivan,
thanks for your interest

>> 4. Very fast and lucky txB writes a value 2 for the key on primary and
backup.
AFAIK, such reordering is not possible, since the backups are "prepared"
before the primary releases the lock (see the sketch below).
So, consistency is guaranteed by the failover logic and by the "prepare" phase
of 2PC.
It seems the problem is NOT with consistency in AI, but with the consistency
detection implementation (RR) and its possible "false positive" results.
BTW, I checked the 1PC case (only one data node in the test) and got no
issues.
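
A minimal sketch of the reordering argument above (plain Java, hypothetical
names, not Ignite code): if the backup entry is locked during "prepare",
before the primary lock is released, a later tx cannot overtake the earlier
tx's backup write.

import java.util.concurrent.locks.ReentrantLock;

class PrepareOrderSketch {
    final ReentrantLock primaryLock = new ReentrantLock();
    final ReentrantLock backupLock = new ReentrantLock(); // taken on "prepare"
    volatile int primary;
    volatile int backup;

    void commit(int val) {
        primaryLock.lock();
        try {
            backupLock.lock(); // "prepare": backup locked before primary unlock
            primary = val;
        }
        finally {
            primaryLock.unlock(); // primary unlocked first ("finish")...
        }
        try {
            backup = val; // ...but the backup entry is still locked,
        }
        finally {
            backupLock.unlock(); // so the next tx waits here: order is kept.
        }
    }
}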

On Fri, Jul 12, 2019 at 9:26 AM Павлухин Иван <vo...@gmail.com> wrote:

> Anton,
>
> Is such behavior observed for 2PC or for 1PC optimization? Does not it
> mean that the things can be even worse and an inconsistent write is
> possible on a backup? E.g. in scenario:
> 1. txA writes a value 1 for the key on primary.
> 2. txA unlocks the key on primary.
> 3. txA freezes before updating backup.
> 4. Very fast and lucky txB writes a value 2 for the key on primary and
> backup.
> 5. txB wakes up and writes 1 for the key.
> 6. As result there is 2 on primary and 1 on backup.
>
> Naively it seems that locks should be released after all replicas are
> updated.
>
> ср, 10 июл. 2019 г. в 16:36, Anton Vinogradov <av...@apache.org>:
> >
> > Folks,
> >
> > Investigating now unexpected repairs [1] in case of ReadRepair usage at
> > testAccountTxNodeRestart.
> > Updated [2] the test to check is there any repairs happen.
> > Test's name now is "testAccountTxNodeRestartWithReadRepair".
> >
> > Each get method now checks the consistency.
> > Check means:
> > 1) tx lock acquired on primary
> > 2) gained data from each owner (primary and backups)
> > 3) data compared
> >
> > Sometime, backup may have obsolete value during such check.
> >
> > Seems, this happen because tx commit on primary going in the following
> way
> > (check code [2] for details):
> > 1) performing localFinish (releases tx lock)
> > 2) performing dhtFinish (commits on backups)
> > 3) transferring control back to the caller
> >
> > So, seems, the problem here is that "tx lock released on primary" does
> not
> > mean that backups updated, but "commit() method finished at caller's
> > thread" does.
> > This means that, currently, there is no happens-before between
> > 1) thread 1 committed data on primary and tx lock can be reobtained
> > 2) thread 2 reads from backup
> > but still strong HB between "commit() finished" and "backup updated"
> >
> > So, it seems to be possible, for example, to gain notification by a
> > continuous query, then read from backup and gain obsolete value.
> >
> > Is this "partial happens before" behavior expected?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-11973
> > [2] https://github.com/apache/ignite/pull/6679/files
> > [3]
> >
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal#finishTx
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>

Re: Tx lock partial happens before

Posted by Павлухин Иван <vo...@gmail.com>.
Anton,

Is such behavior observed for 2PC or for the 1PC optimization? Doesn't it
mean that things can be even worse and that an inconsistent write is
possible on a backup? E.g. in this scenario:
1. txA writes a value 1 for the key on primary.
2. txA unlocks the key on primary.
3. txA freezes before updating backup.
4. Very fast and lucky txB writes a value 2 for the key on primary and backup.
5. txA wakes up and writes 1 for the key on the backup.
6. As a result, there is 2 on the primary and 1 on the backup.

Naively, it seems that locks should be released only after all replicas are
updated.
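
A toy model of this interleaving (plain Java, hypothetical names, not Ignite
code; as Anton notes in his reply, the prepare-phase backup locks actually
rule this out in Ignite, so this only models the feared schedule):

import java.util.concurrent.locks.ReentrantLock;

public class LostBackupWriteSketch {
    static final ReentrantLock primaryLock = new ReentrantLock();
    static volatile int primary;
    static volatile int backup;

    // The primary lock is released BEFORE the backup is updated, so the
    // backup write can be arbitrarily delayed (the "freeze" in step 3).
    static void write(int val, long backupDelayMs) throws InterruptedException {
        primaryLock.lock();
        try {
            primary = val;
        }
        finally {
            primaryLock.unlock();
        }
        Thread.sleep(backupDelayMs); // txA "freezes" here
        backup = val;
    }

    public static void main(String[] args) throws Exception {
        Thread txA = new Thread(() -> { try { write(1, 100); } catch (InterruptedException ignored) { } });
        Thread txB = new Thread(() -> { try { write(2, 0); } catch (InterruptedException ignored) { } });
        txA.start();
        Thread.sleep(10); // let txA pass the primary and freeze
        txB.start();
        txA.join();
        txB.join();
        // Expected (timing-dependent): primary=2, backup=1.
        System.out.println("primary=" + primary + ", backup=" + backup);
    }
}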

-- 
Best regards,
Ivan Pavlukhin