You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ignite.apache.org by Aleksey Kuznetsov <al...@gmail.com> on 2018/02/26 10:59:36 UTC

[IGNITE-5714] Design proposal of suspend/resume for pessimistic tx

Hello Igniters!

Currently we have suspension/resuming implemented for optimistic
transactions [1].
Unless suspend/resume isn't supported for pessimistic tx JTA isn't fully
supported [4].

I’m working on a ticket "Suspension/resuming for pessimistic transactions"
[2].
Goal of the ticket is to support transaction suspend/resume for pessimistic
transactions.

# Benefits of suspending/resuming transaction.

  1. Full JTA standart support.
  2. Increase of throughput in high-load scenarios.
      Suspend operation would allow to release Ignite threads and
optionally perform some other logic.

Note, current API has got suspend/resume only for optimistic transactions,
which confuses users.

# Real life example.

Consider the following scenario:

  1. Application starts Ignite transaction.
  2. Business logic is executed inside transaction.
  3. For commit/rollback application need approval message from external
agent.
  4. Currently, thread inside Ignite is idle until approval is received.
  4a. When suspend/resume support is implemented, application can perform
suspend and release thread inside Ignite.

# How pessimistic transaction works.

When we perform put/get operations in pessimistic transactions, lock
request is sent to remote nodes by `GridNearLockRequest`.
Request contains thread id `IgniteTxAdapter#threadId`, in which operation
was performed.
In pessimistic mode, multiple transaction objects are created - on
primary, on backup nodes, and on originating node:
`GridNearTxLocal`, `GridDhtTxLocal`, `GridNearTxRemote`, `GridDhtTxRemote`.

Thread id is used in logic on these nodes.
For instance, to check whether thread has successfully locked the key,
after lock acquisition attempt.
Or to check whether active transaction exists.

# Main challenge for implementation.

I've analysed implementation approaches and see the core issue:

The essential problem with suspending/resuming lies in thread id field
transferred to remote nodes during put/get operations.

Imagine, we want to suspend transaction and resume it in another thread.
See code snippet below:

```
tx = ignite.transactions().txStart(PESSIMISTIC, SERIALIZABLE);

cache.put(1, 1);

tx.suspend();
....

// In another thread.
tx.resume(); // Thread id will be changed in transaction instance.
```

Original thread id is transferred and saved on remote node.
After resuming thread id on local node differs from remote node.
I want to avoid one more network round trip to change thread Id on remote
node after transaction resuming.

# Design proposal.

Transaction id (`xid`) can be used instead of thread id on remote nodes.
The following solution is possible for the ticket :

Replace thread id by transaction id for sending to remote nodes.
Thread id will be removed from the following classes:
`IgniteTxAdapter`, `GridDistributedTxPrepareRequest`,
`GridDistributedTxFinishRequest`, `GridDistributedTxFinishResponse`,
`GridDistributedTxPrepareRequest`.

I haven't removed thread id completely from code. Thread id is moved to
`GridNearTxLocal`.
We still need it in near local transaction for many reasons, for example to
assure only thread started transaction can suspend it in
`GridNearTxLocal#suspend()`.
In future we can remove thread id completely. I propose to study this
question in another ticket.

Also, thread id is remained in `GridDistributedLockRequest`.
Lock request used by cache locks and it need to transfer thread id to
remote nodes.
For example to use cache locks along with cache operations put/get, see
`GridNearTxLocal#updateExplicitVersion`.
As for pessimistic transaction, thread id in `GridDistributedLockRequest`
is set to `UNDEFINED_THREAD_ID`, which means we must not use it remotely.

Note, that if user suspends transaction and forgets to resume it,
transaction would be rolled back once timeout has occurred.

In my design when transaction is suspended, all locked keys remain locked.

Please see my prototype of proposal implementation [3].

Proposed changes are relatively small.
They ensure consistency of information about locks, if thread Id will be
changed within one transaction (by suspend/resume).
There will be used correct id for locks on remote nodes. It also requires
painstaking work, but changes will not affect the logic of oher components.

Tell me please what do you think? Any suggestions and comments will be
helpful.

If you agree with my design I also will do benchmarking.

[1] https://issues.apache.org/jira/browse/IGNITE-5712
[2] https://issues.apache.org/jira/browse/IGNITE-5714
[3] https://github.com/apache/ignite/pull/2789
[4] Section 3.2.3
http://download.oracle.com/otn-pub/jcp/jta-1.1-spec-oth-JSpec/jta-1_1-spec.pdf
-- 

*Best Regards,*

*Kuznetsov Aleksey*

Re: [IGNITE-5714] Design proposal of suspend/resume for pessimistic tx

Posted by Кузнецов Алексей Львович <al...@gmail.com>.
PR for this issue is ready to be reviewed.
Changes are relatively small.

1. No network messages have been changed.
2. Thread id usage replaced with xid usage when we work with candidates.
3. All current tests are passed [3]

Please, review. [1], [2]

[1] https://issues.apache.org/jira/browse/IGNITE-5714
[2] https://github.com/apache/ignite/pull/2789
[3] 
https://ci.ignite.apache.org/viewLog.html?buildId=1229177&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_RunAll

28.02.2018 19:10, Aleksey Kuznetsov пишет:
> > What if originating(or primary, or couple of primaries) node(s) 
> fails when tx is suspended?
>
> Thanks for advice, I've added a couple of failover tests. With the 
> following scenarios :
>
> Started transaction, perform some put, suspend it and primary node 
> fails. Works as expected : tx rollbacks.
> In another scenario, the originating node fails. Works as expected : 
> tx rollbacks.
>
> > Do we have tests for this feature?
>
> yeah,we do.
>
> ср, 28 февр. 2018 г. в 14:45, Nikolay Izhikov <nizhikov@apache.org 
> <ma...@apache.org>>:
>
>     Hello, Aleksey.
>
>     > Tests were added for suspending/resuming optimistic transaction
>     earlier for parent task.
>     > I adopted them for pessimistic mode. Also tx rollback tests were
>     aded.
>
>     I think, at least, we need new failover tests.
>
>     What if originating(or primary, or couple of primaries) node(s)
>     fails when tx is suspended?
>
>     > Note, that if user suspends transaction and forgets to resume it,
>     > transaction would be rolled back once timeout has occurred.
>
>     Do we have tests for this feature?
>
>     В Ср, 28/02/2018 в 10:29 +0000, Aleksey Kuznetsov пишет:
>     > Thanx for the answer!
>     >
>     > > Can we remove thread id completely from code?
>     > > Can you estimate how much effort do we need?
>     >
>     > I haven't removed thread id completely from the code because we
>     tightly
>     > coupled to it in some parts.
>     > For instance, look at the following scenario :
>     >
>     > "When we suspend transaction, we must make sure only thread started
>     > transaction can suspend it. So we compare thread id in
>     > `GridNearTxLocal#suspend()`."
>     >
>     > Note that cache locks also make use of thread id, consider the
>     following
>     > scenario:
>     >
>     > "We create cache lock and perform put operation in the same
>     thread, which
>     > create implicit transaction. Transaction would compare both
>     therad ids in
>     > tx and cache lock and can apply the optimisation: tx would take
>     advantage
>     > of cache lock, and would not create its own lock on key."
>     >
>     > If we really needed to remove it completely from the code,
>     significant
>     > changes are required both to transactions and cache locks.
>     >
>     > It could take several weeks to completely remove thread id.
>     >
>     > > As far as I can see from parent task [1] we need some complex
>     tests to be
>     >
>     > implemented.
>     > > Are they presented in prototype?
>     >
>     > Tests were added for suspending/resuming optimistic transaction
>     earlier for
>     > parent task.
>     > I adopted them for pessimistic mode.
>     > Also tx rollback tests were aded.  They check whether timeout
>     metrics are
>     > correctly calculated for suspended transactions. You can find
>     them in
>     > `GridCacheTransactionalAbstractMetricsSelfTest`.
>     >
>     > Please, feel free to propose additional tests for the ticket.
>     >
>     >
>     > вт, 27 февр. 2018 г. в 17:27, Nikolay Izhikov
>     <nizhikov@apache.org <ma...@apache.org>>:
>     >
>     > > Hello, Alexey.
>     > >
>     > > Great mail, by the way.
>     > > I think, it would be great to have this feature in Ignite.
>     > >
>     > > > I haven't removed thread id completely from code.
>     > >
>     > > Can we remove thread id completely from code?
>     > > Can you estimate how much effort do we need?
>     > >
>     > > As far as I can see from parent task [1] we need some complex
>     tests to be
>     > > implemented.
>     > > Are they presented in prototype?
>     > >
>     > > [1]
>     > >
>     https://issues.apache.org/jira/browse/IGNITE-4887?focusedCommentId=16069655&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16069655
>     > >
>     > > В Пн, 26/02/2018 в 10:59 +0000, Aleksey Kuznetsov пишет:
>     > > > Hello Igniters!
>     > > >
>     > > > Currently we have suspension/resuming implemented for optimistic
>     > > > transactions [1].
>     > > > Unless suspend/resume isn't supported for pessimistic tx JTA
>     isn't fully
>     > > > supported [4].
>     > > >
>     > > > I’m working on a ticket "Suspension/resuming for pessimistic
>     > >
>     > > transactions"
>     > > > [2].
>     > > > Goal of the ticket is to support transaction suspend/resume for
>     > >
>     > > pessimistic
>     > > > transactions.
>     > > >
>     > > > # Benefits of suspending/resuming transaction.
>     > > >
>     > > >   1. Full JTA standart support.
>     > > >   2. Increase of throughput in high-load scenarios.
>     > > >       Suspend operation would allow to release Ignite
>     threads and
>     > > > optionally perform some other logic.
>     > > >
>     > > > Note, current API has got suspend/resume only for optimistic
>     > >
>     > > transactions,
>     > > > which confuses users.
>     > > >
>     > > > # Real life example.
>     > > >
>     > > > Consider the following scenario:
>     > > >
>     > > >   1. Application starts Ignite transaction.
>     > > >   2. Business logic is executed inside transaction.
>     > > >   3. For commit/rollback application need approval message
>     from external
>     > > > agent.
>     > > >   4. Currently, thread inside Ignite is idle until approval
>     is received.
>     > > >   4a. When suspend/resume support is implemented,
>     application can perform
>     > > > suspend and release thread inside Ignite.
>     > > >
>     > > > # How pessimistic transaction works.
>     > > >
>     > > > When we perform put/get operations in pessimistic
>     transactions, lock
>     > > > request is sent to remote nodes by `GridNearLockRequest`.
>     > > > Request contains thread id `IgniteTxAdapter#threadId`, in
>     which operation
>     > > > was performed.
>     > > > In pessimistic mode, multiple transaction objects are
>     created - on
>     > > > primary, on backup nodes, and on originating node:
>     > > > `GridNearTxLocal`, `GridDhtTxLocal`, `GridNearTxRemote`,
>     > >
>     > > `GridDhtTxRemote`.
>     > > >
>     > > > Thread id is used in logic on these nodes.
>     > > > For instance, to check whether thread has successfully
>     locked the key,
>     > > > after lock acquisition attempt.
>     > > > Or to check whether active transaction exists.
>     > > >
>     > > > # Main challenge for implementation.
>     > > >
>     > > > I've analysed implementation approaches and see the core issue:
>     > > >
>     > > > The essential problem with suspending/resuming lies in
>     thread id field
>     > > > transferred to remote nodes during put/get operations.
>     > > >
>     > > > Imagine, we want to suspend transaction and resume it in
>     another thread.
>     > > > See code snippet below:
>     > > >
>     > > > ```
>     > > > tx = ignite.transactions().txStart(PESSIMISTIC, SERIALIZABLE);
>     > > >
>     > > > cache.put(1, 1);
>     > > >
>     > > > tx.suspend();
>     > > > ....
>     > > >
>     > > > // In another thread.
>     > > > tx.resume(); // Thread id will be changed in transaction
>     instance.
>     > > > ```
>     > > >
>     > > > Original thread id is transferred and saved on remote node.
>     > > > After resuming thread id on local node differs from remote node.
>     > > > I want to avoid one more network round trip to change thread
>     Id on remote
>     > > > node after transaction resuming.
>     > > >
>     > > > # Design proposal.
>     > > >
>     > > > Transaction id (`xid`) can be used instead of thread id on
>     remote nodes.
>     > > > The following solution is possible for the ticket :
>     > > >
>     > > > Replace thread id by transaction id for sending to remote nodes.
>     > > > Thread id will be removed from the following classes:
>     > > > `IgniteTxAdapter`, `GridDistributedTxPrepareRequest`,
>     > > > `GridDistributedTxFinishRequest`,
>     `GridDistributedTxFinishResponse`,
>     > > > `GridDistributedTxPrepareRequest`.
>     > > >
>     > > > I haven't removed thread id completely from code. Thread id
>     is moved to
>     > > > `GridNearTxLocal`.
>     > > > We still need it in near local transaction for many reasons,
>     for example
>     > >
>     > > to
>     > > > assure only thread started transaction can suspend it in
>     > > > `GridNearTxLocal#suspend()`.
>     > > > In future we can remove thread id completely. I propose to
>     study this
>     > > > question in another ticket.
>     > > >
>     > > > Also, thread id is remained in `GridDistributedLockRequest`.
>     > > > Lock request used by cache locks and it need to transfer
>     thread id to
>     > > > remote nodes.
>     > > > For example to use cache locks along with cache operations
>     put/get, see
>     > > > `GridNearTxLocal#updateExplicitVersion`.
>     > > > As for pessimistic transaction, thread id in
>     `GridDistributedLockRequest`
>     > > > is set to `UNDEFINED_THREAD_ID`, which means we must not use
>     it remotely.
>     > > >
>     > > > Note, that if user suspends transaction and forgets to
>     resume it,
>     > > > transaction would be rolled back once timeout has occurred.
>     > > >
>     > > > In my design when transaction is suspended, all locked keys
>     remain
>     > >
>     > > locked.
>     > > >
>     > > > Please see my prototype of proposal implementation [3].
>     > > >
>     > > > Proposed changes are relatively small.
>     > > > They ensure consistency of information about locks, if
>     thread Id will be
>     > > > changed within one transaction (by suspend/resume).
>     > > > There will be used correct id for locks on remote nodes. It
>     also requires
>     > > > painstaking work, but changes will not affect the logic of oher
>     > >
>     > > components.
>     > > >
>     > > > Tell me please what do you think? Any suggestions and
>     comments will be
>     > > > helpful.
>     > > >
>     > > > If you agree with my design I also will do benchmarking.
>     > > >
>     > > > [1] https://issues.apache.org/jira/browse/IGNITE-5712
>     > > > [2] https://issues.apache.org/jira/browse/IGNITE-5714
>     > > > [3] https://github.com/apache/ignite/pull/2789
>     > > > [4] Section 3.2.3
>     > > >
>     > >
>     > >
>     http://download.oracle.com/otn-pub/jcp/jta-1.1-spec-oth-JSpec/jta-1_1-spec.pdf
>     >
>     > 
>
> -- 
>
> /Best Regards,/
>
> */Kuznetsov Aleksey/*
>


Re: [IGNITE-5714] Design proposal of suspend/resume for pessimistic tx

Posted by Aleksey Kuznetsov <al...@gmail.com>.
> What if originating(or primary, or couple of primaries) node(s) fails
when tx is suspended?

Thanks for advice, I've added a couple of failover tests. With the
following scenarios :

Started transaction, perform some put, suspend it and primary node fails.
Works as expected : tx rollbacks.
In another scenario, the originating node fails. Works as expected : tx
rollbacks.

> Do we have tests for this feature?

yeah,we do.

ср, 28 февр. 2018 г. в 14:45, Nikolay Izhikov <ni...@apache.org>:

> Hello, Aleksey.
>
> > Tests were added for suspending/resuming optimistic transaction earlier
> for parent task.
> > I adopted them for pessimistic mode. Also tx rollback tests were aded.
>
> I think, at least, we need new failover tests.
>
> What if originating(or primary, or couple of primaries) node(s) fails when
> tx is suspended?
>
> > Note, that if user suspends transaction and forgets to resume it,
> > transaction would be rolled back once timeout has occurred.
>
> Do we have tests for this feature?
>
> В Ср, 28/02/2018 в 10:29 +0000, Aleksey Kuznetsov пишет:
> > Thanx for the answer!
> >
> > > Can we remove thread id completely from code?
> > > Can you estimate how much effort do we need?
> >
> > I haven't removed thread id completely from the code because we tightly
> > coupled to it in some parts.
> > For instance, look at the following scenario :
> >
> > "When we suspend transaction, we must make sure only thread started
> > transaction can suspend it. So we compare thread id in
> > `GridNearTxLocal#suspend()`."
> >
> > Note that cache locks also make use of thread id, consider the following
> > scenario:
> >
> > "We create cache lock and perform put operation in the same thread, which
> > create implicit transaction. Transaction would compare both therad ids in
> > tx and cache lock and can apply the optimisation: tx would take advantage
> > of cache lock, and would not create its own lock on key."
> >
> > If we really needed to remove it completely from the code, significant
> > changes are required both to transactions and cache locks.
> >
> > It could take several weeks to completely remove thread id.
> >
> > > As far as I can see from parent task [1] we need some complex tests to
> be
> >
> > implemented.
> > > Are they presented in prototype?
> >
> > Tests were added for suspending/resuming optimistic transaction earlier
> for
> > parent task.
> > I adopted them for pessimistic mode.
> > Also tx rollback tests were aded.  They check whether timeout metrics are
> > correctly calculated for suspended transactions. You can find them in
> > `GridCacheTransactionalAbstractMetricsSelfTest`.
> >
> > Please, feel free to propose additional tests for the ticket.
> >
> >
> > вт, 27 февр. 2018 г. в 17:27, Nikolay Izhikov <ni...@apache.org>:
> >
> > > Hello, Alexey.
> > >
> > > Great mail, by the way.
> > > I think, it would be great to have this feature in Ignite.
> > >
> > > > I haven't removed thread id completely from code.
> > >
> > > Can we remove thread id completely from code?
> > > Can you estimate how much effort do we need?
> > >
> > > As far as I can see from parent task [1] we need some complex tests to
> be
> > > implemented.
> > > Are they presented in prototype?
> > >
> > > [1]
> > >
> https://issues.apache.org/jira/browse/IGNITE-4887?focusedCommentId=16069655&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16069655
> > >
> > > В Пн, 26/02/2018 в 10:59 +0000, Aleksey Kuznetsov пишет:
> > > > Hello Igniters!
> > > >
> > > > Currently we have suspension/resuming implemented for optimistic
> > > > transactions [1].
> > > > Unless suspend/resume isn't supported for pessimistic tx JTA isn't
> fully
> > > > supported [4].
> > > >
> > > > I’m working on a ticket "Suspension/resuming for pessimistic
> > >
> > > transactions"
> > > > [2].
> > > > Goal of the ticket is to support transaction suspend/resume for
> > >
> > > pessimistic
> > > > transactions.
> > > >
> > > > # Benefits of suspending/resuming transaction.
> > > >
> > > >   1. Full JTA standart support.
> > > >   2. Increase of throughput in high-load scenarios.
> > > >       Suspend operation would allow to release Ignite threads and
> > > > optionally perform some other logic.
> > > >
> > > > Note, current API has got suspend/resume only for optimistic
> > >
> > > transactions,
> > > > which confuses users.
> > > >
> > > > # Real life example.
> > > >
> > > > Consider the following scenario:
> > > >
> > > >   1. Application starts Ignite transaction.
> > > >   2. Business logic is executed inside transaction.
> > > >   3. For commit/rollback application need approval message from
> external
> > > > agent.
> > > >   4. Currently, thread inside Ignite is idle until approval is
> received.
> > > >   4a. When suspend/resume support is implemented, application can
> perform
> > > > suspend and release thread inside Ignite.
> > > >
> > > > # How pessimistic transaction works.
> > > >
> > > > When we perform put/get operations in pessimistic transactions, lock
> > > > request is sent to remote nodes by `GridNearLockRequest`.
> > > > Request contains thread id `IgniteTxAdapter#threadId`, in which
> operation
> > > > was performed.
> > > > In pessimistic mode, multiple transaction objects are created - on
> > > > primary, on backup nodes, and on originating node:
> > > > `GridNearTxLocal`, `GridDhtTxLocal`, `GridNearTxRemote`,
> > >
> > > `GridDhtTxRemote`.
> > > >
> > > > Thread id is used in logic on these nodes.
> > > > For instance, to check whether thread has successfully locked the
> key,
> > > > after lock acquisition attempt.
> > > > Or to check whether active transaction exists.
> > > >
> > > > # Main challenge for implementation.
> > > >
> > > > I've analysed implementation approaches and see the core issue:
> > > >
> > > > The essential problem with suspending/resuming lies in thread id
> field
> > > > transferred to remote nodes during put/get operations.
> > > >
> > > > Imagine, we want to suspend transaction and resume it in another
> thread.
> > > > See code snippet below:
> > > >
> > > > ```
> > > > tx = ignite.transactions().txStart(PESSIMISTIC, SERIALIZABLE);
> > > >
> > > > cache.put(1, 1);
> > > >
> > > > tx.suspend();
> > > > ....
> > > >
> > > > // In another thread.
> > > > tx.resume(); // Thread id will be changed in transaction instance.
> > > > ```
> > > >
> > > > Original thread id is transferred and saved on remote node.
> > > > After resuming thread id on local node differs from remote node.
> > > > I want to avoid one more network round trip to change thread Id on
> remote
> > > > node after transaction resuming.
> > > >
> > > > # Design proposal.
> > > >
> > > > Transaction id (`xid`) can be used instead of thread id on remote
> nodes.
> > > > The following solution is possible for the ticket :
> > > >
> > > > Replace thread id by transaction id for sending to remote nodes.
> > > > Thread id will be removed from the following classes:
> > > > `IgniteTxAdapter`, `GridDistributedTxPrepareRequest`,
> > > > `GridDistributedTxFinishRequest`, `GridDistributedTxFinishResponse`,
> > > > `GridDistributedTxPrepareRequest`.
> > > >
> > > > I haven't removed thread id completely from code. Thread id is moved
> to
> > > > `GridNearTxLocal`.
> > > > We still need it in near local transaction for many reasons, for
> example
> > >
> > > to
> > > > assure only thread started transaction can suspend it in
> > > > `GridNearTxLocal#suspend()`.
> > > > In future we can remove thread id completely. I propose to study this
> > > > question in another ticket.
> > > >
> > > > Also, thread id is remained in `GridDistributedLockRequest`.
> > > > Lock request used by cache locks and it need to transfer thread id to
> > > > remote nodes.
> > > > For example to use cache locks along with cache operations put/get,
> see
> > > > `GridNearTxLocal#updateExplicitVersion`.
> > > > As for pessimistic transaction, thread id in
> `GridDistributedLockRequest`
> > > > is set to `UNDEFINED_THREAD_ID`, which means we must not use it
> remotely.
> > > >
> > > > Note, that if user suspends transaction and forgets to resume it,
> > > > transaction would be rolled back once timeout has occurred.
> > > >
> > > > In my design when transaction is suspended, all locked keys remain
> > >
> > > locked.
> > > >
> > > > Please see my prototype of proposal implementation [3].
> > > >
> > > > Proposed changes are relatively small.
> > > > They ensure consistency of information about locks, if thread Id
> will be
> > > > changed within one transaction (by suspend/resume).
> > > > There will be used correct id for locks on remote nodes. It also
> requires
> > > > painstaking work, but changes will not affect the logic of oher
> > >
> > > components.
> > > >
> > > > Tell me please what do you think? Any suggestions and comments will
> be
> > > > helpful.
> > > >
> > > > If you agree with my design I also will do benchmarking.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-5712
> > > > [2] https://issues.apache.org/jira/browse/IGNITE-5714
> > > > [3] https://github.com/apache/ignite/pull/2789
> > > > [4] Section 3.2.3
> > > >
> > >
> > >
> http://download.oracle.com/otn-pub/jcp/jta-1.1-spec-oth-JSpec/jta-1_1-spec.pdf
> >
> >

-- 

*Best Regards,*

*Kuznetsov Aleksey*

Re: [IGNITE-5714] Design proposal of suspend/resume for pessimistic tx

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, Aleksey.

> Tests were added for suspending/resuming optimistic transaction earlier for parent task.
> I adopted them for pessimistic mode. Also tx rollback tests were aded.

I think, at least, we need new failover tests.

What if originating(or primary, or couple of primaries) node(s) fails when tx is suspended?

> Note, that if user suspends transaction and forgets to resume it,
> transaction would be rolled back once timeout has occurred.

Do we have tests for this feature?

В Ср, 28/02/2018 в 10:29 +0000, Aleksey Kuznetsov пишет:
> Thanx for the answer!
> 
> > Can we remove thread id completely from code?
> > Can you estimate how much effort do we need?
> 
> I haven't removed thread id completely from the code because we tightly
> coupled to it in some parts.
> For instance, look at the following scenario :
> 
> "When we suspend transaction, we must make sure only thread started
> transaction can suspend it. So we compare thread id in
> `GridNearTxLocal#suspend()`."
> 
> Note that cache locks also make use of thread id, consider the following
> scenario:
> 
> "We create cache lock and perform put operation in the same thread, which
> create implicit transaction. Transaction would compare both therad ids in
> tx and cache lock and can apply the optimisation: tx would take advantage
> of cache lock, and would not create its own lock on key."
> 
> If we really needed to remove it completely from the code, significant
> changes are required both to transactions and cache locks.
> 
> It could take several weeks to completely remove thread id.
> 
> > As far as I can see from parent task [1] we need some complex tests to be
> 
> implemented.
> > Are they presented in prototype?
> 
> Tests were added for suspending/resuming optimistic transaction earlier for
> parent task.
> I adopted them for pessimistic mode.
> Also tx rollback tests were aded.  They check whether timeout metrics are
> correctly calculated for suspended transactions. You can find them in
> `GridCacheTransactionalAbstractMetricsSelfTest`.
> 
> Please, feel free to propose additional tests for the ticket.
> 
> 
> вт, 27 февр. 2018 г. в 17:27, Nikolay Izhikov <ni...@apache.org>:
> 
> > Hello, Alexey.
> > 
> > Great mail, by the way.
> > I think, it would be great to have this feature in Ignite.
> > 
> > > I haven't removed thread id completely from code.
> > 
> > Can we remove thread id completely from code?
> > Can you estimate how much effort do we need?
> > 
> > As far as I can see from parent task [1] we need some complex tests to be
> > implemented.
> > Are they presented in prototype?
> > 
> > [1]
> > https://issues.apache.org/jira/browse/IGNITE-4887?focusedCommentId=16069655&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16069655
> > 
> > В Пн, 26/02/2018 в 10:59 +0000, Aleksey Kuznetsov пишет:
> > > Hello Igniters!
> > > 
> > > Currently we have suspension/resuming implemented for optimistic
> > > transactions [1].
> > > Unless suspend/resume isn't supported for pessimistic tx JTA isn't fully
> > > supported [4].
> > > 
> > > I’m working on a ticket "Suspension/resuming for pessimistic
> > 
> > transactions"
> > > [2].
> > > Goal of the ticket is to support transaction suspend/resume for
> > 
> > pessimistic
> > > transactions.
> > > 
> > > # Benefits of suspending/resuming transaction.
> > > 
> > >   1. Full JTA standart support.
> > >   2. Increase of throughput in high-load scenarios.
> > >       Suspend operation would allow to release Ignite threads and
> > > optionally perform some other logic.
> > > 
> > > Note, current API has got suspend/resume only for optimistic
> > 
> > transactions,
> > > which confuses users.
> > > 
> > > # Real life example.
> > > 
> > > Consider the following scenario:
> > > 
> > >   1. Application starts Ignite transaction.
> > >   2. Business logic is executed inside transaction.
> > >   3. For commit/rollback application need approval message from external
> > > agent.
> > >   4. Currently, thread inside Ignite is idle until approval is received.
> > >   4a. When suspend/resume support is implemented, application can perform
> > > suspend and release thread inside Ignite.
> > > 
> > > # How pessimistic transaction works.
> > > 
> > > When we perform put/get operations in pessimistic transactions, lock
> > > request is sent to remote nodes by `GridNearLockRequest`.
> > > Request contains thread id `IgniteTxAdapter#threadId`, in which operation
> > > was performed.
> > > In pessimistic mode, multiple transaction objects are created - on
> > > primary, on backup nodes, and on originating node:
> > > `GridNearTxLocal`, `GridDhtTxLocal`, `GridNearTxRemote`,
> > 
> > `GridDhtTxRemote`.
> > > 
> > > Thread id is used in logic on these nodes.
> > > For instance, to check whether thread has successfully locked the key,
> > > after lock acquisition attempt.
> > > Or to check whether active transaction exists.
> > > 
> > > # Main challenge for implementation.
> > > 
> > > I've analysed implementation approaches and see the core issue:
> > > 
> > > The essential problem with suspending/resuming lies in thread id field
> > > transferred to remote nodes during put/get operations.
> > > 
> > > Imagine, we want to suspend transaction and resume it in another thread.
> > > See code snippet below:
> > > 
> > > ```
> > > tx = ignite.transactions().txStart(PESSIMISTIC, SERIALIZABLE);
> > > 
> > > cache.put(1, 1);
> > > 
> > > tx.suspend();
> > > ....
> > > 
> > > // In another thread.
> > > tx.resume(); // Thread id will be changed in transaction instance.
> > > ```
> > > 
> > > Original thread id is transferred and saved on remote node.
> > > After resuming thread id on local node differs from remote node.
> > > I want to avoid one more network round trip to change thread Id on remote
> > > node after transaction resuming.
> > > 
> > > # Design proposal.
> > > 
> > > Transaction id (`xid`) can be used instead of thread id on remote nodes.
> > > The following solution is possible for the ticket :
> > > 
> > > Replace thread id by transaction id for sending to remote nodes.
> > > Thread id will be removed from the following classes:
> > > `IgniteTxAdapter`, `GridDistributedTxPrepareRequest`,
> > > `GridDistributedTxFinishRequest`, `GridDistributedTxFinishResponse`,
> > > `GridDistributedTxPrepareRequest`.
> > > 
> > > I haven't removed thread id completely from code. Thread id is moved to
> > > `GridNearTxLocal`.
> > > We still need it in near local transaction for many reasons, for example
> > 
> > to
> > > assure only thread started transaction can suspend it in
> > > `GridNearTxLocal#suspend()`.
> > > In future we can remove thread id completely. I propose to study this
> > > question in another ticket.
> > > 
> > > Also, thread id is remained in `GridDistributedLockRequest`.
> > > Lock request used by cache locks and it need to transfer thread id to
> > > remote nodes.
> > > For example to use cache locks along with cache operations put/get, see
> > > `GridNearTxLocal#updateExplicitVersion`.
> > > As for pessimistic transaction, thread id in `GridDistributedLockRequest`
> > > is set to `UNDEFINED_THREAD_ID`, which means we must not use it remotely.
> > > 
> > > Note, that if user suspends transaction and forgets to resume it,
> > > transaction would be rolled back once timeout has occurred.
> > > 
> > > In my design when transaction is suspended, all locked keys remain
> > 
> > locked.
> > > 
> > > Please see my prototype of proposal implementation [3].
> > > 
> > > Proposed changes are relatively small.
> > > They ensure consistency of information about locks, if thread Id will be
> > > changed within one transaction (by suspend/resume).
> > > There will be used correct id for locks on remote nodes. It also requires
> > > painstaking work, but changes will not affect the logic of oher
> > 
> > components.
> > > 
> > > Tell me please what do you think? Any suggestions and comments will be
> > > helpful.
> > > 
> > > If you agree with my design I also will do benchmarking.
> > > 
> > > [1] https://issues.apache.org/jira/browse/IGNITE-5712
> > > [2] https://issues.apache.org/jira/browse/IGNITE-5714
> > > [3] https://github.com/apache/ignite/pull/2789
> > > [4] Section 3.2.3
> > > 
> > 
> > http://download.oracle.com/otn-pub/jcp/jta-1.1-spec-oth-JSpec/jta-1_1-spec.pdf
> 
> 

Re: [IGNITE-5714] Design proposal of suspend/resume for pessimistic tx

Posted by Aleksey Kuznetsov <al...@gmail.com>.
Thanx for the answer!

> Can we remove thread id completely from code?
> Can you estimate how much effort do we need?

I haven't removed thread id completely from the code because we tightly
coupled to it in some parts.
For instance, look at the following scenario :

"When we suspend transaction, we must make sure only thread started
transaction can suspend it. So we compare thread id in
`GridNearTxLocal#suspend()`."

Note that cache locks also make use of thread id, consider the following
scenario:

"We create cache lock and perform put operation in the same thread, which
create implicit transaction. Transaction would compare both therad ids in
tx and cache lock and can apply the optimisation: tx would take advantage
of cache lock, and would not create its own lock on key."

If we really needed to remove it completely from the code, significant
changes are required both to transactions and cache locks.

It could take several weeks to completely remove thread id.

> As far as I can see from parent task [1] we need some complex tests to be
implemented.
> Are they presented in prototype?

Tests were added for suspending/resuming optimistic transaction earlier for
parent task.
I adopted them for pessimistic mode.
Also tx rollback tests were aded.  They check whether timeout metrics are
correctly calculated for suspended transactions. You can find them in
`GridCacheTransactionalAbstractMetricsSelfTest`.

Please, feel free to propose additional tests for the ticket.


вт, 27 февр. 2018 г. в 17:27, Nikolay Izhikov <ni...@apache.org>:

> Hello, Alexey.
>
> Great mail, by the way.
> I think, it would be great to have this feature in Ignite.
>
> > I haven't removed thread id completely from code.
>
> Can we remove thread id completely from code?
> Can you estimate how much effort do we need?
>
> As far as I can see from parent task [1] we need some complex tests to be
> implemented.
> Are they presented in prototype?
>
> [1]
> https://issues.apache.org/jira/browse/IGNITE-4887?focusedCommentId=16069655&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16069655
>
> В Пн, 26/02/2018 в 10:59 +0000, Aleksey Kuznetsov пишет:
> > Hello Igniters!
> >
> > Currently we have suspension/resuming implemented for optimistic
> > transactions [1].
> > Unless suspend/resume isn't supported for pessimistic tx JTA isn't fully
> > supported [4].
> >
> > I’m working on a ticket "Suspension/resuming for pessimistic
> transactions"
> > [2].
> > Goal of the ticket is to support transaction suspend/resume for
> pessimistic
> > transactions.
> >
> > # Benefits of suspending/resuming transaction.
> >
> >   1. Full JTA standart support.
> >   2. Increase of throughput in high-load scenarios.
> >       Suspend operation would allow to release Ignite threads and
> > optionally perform some other logic.
> >
> > Note, current API has got suspend/resume only for optimistic
> transactions,
> > which confuses users.
> >
> > # Real life example.
> >
> > Consider the following scenario:
> >
> >   1. Application starts Ignite transaction.
> >   2. Business logic is executed inside transaction.
> >   3. For commit/rollback application need approval message from external
> > agent.
> >   4. Currently, thread inside Ignite is idle until approval is received.
> >   4a. When suspend/resume support is implemented, application can perform
> > suspend and release thread inside Ignite.
> >
> > # How pessimistic transaction works.
> >
> > When we perform put/get operations in pessimistic transactions, lock
> > request is sent to remote nodes by `GridNearLockRequest`.
> > Request contains thread id `IgniteTxAdapter#threadId`, in which operation
> > was performed.
> > In pessimistic mode, multiple transaction objects are created - on
> > primary, on backup nodes, and on originating node:
> > `GridNearTxLocal`, `GridDhtTxLocal`, `GridNearTxRemote`,
> `GridDhtTxRemote`.
> >
> > Thread id is used in logic on these nodes.
> > For instance, to check whether thread has successfully locked the key,
> > after lock acquisition attempt.
> > Or to check whether active transaction exists.
> >
> > # Main challenge for implementation.
> >
> > I've analysed implementation approaches and see the core issue:
> >
> > The essential problem with suspending/resuming lies in thread id field
> > transferred to remote nodes during put/get operations.
> >
> > Imagine, we want to suspend transaction and resume it in another thread.
> > See code snippet below:
> >
> > ```
> > tx = ignite.transactions().txStart(PESSIMISTIC, SERIALIZABLE);
> >
> > cache.put(1, 1);
> >
> > tx.suspend();
> > ....
> >
> > // In another thread.
> > tx.resume(); // Thread id will be changed in transaction instance.
> > ```
> >
> > Original thread id is transferred and saved on remote node.
> > After resuming thread id on local node differs from remote node.
> > I want to avoid one more network round trip to change thread Id on remote
> > node after transaction resuming.
> >
> > # Design proposal.
> >
> > Transaction id (`xid`) can be used instead of thread id on remote nodes.
> > The following solution is possible for the ticket :
> >
> > Replace thread id by transaction id for sending to remote nodes.
> > Thread id will be removed from the following classes:
> > `IgniteTxAdapter`, `GridDistributedTxPrepareRequest`,
> > `GridDistributedTxFinishRequest`, `GridDistributedTxFinishResponse`,
> > `GridDistributedTxPrepareRequest`.
> >
> > I haven't removed thread id completely from code. Thread id is moved to
> > `GridNearTxLocal`.
> > We still need it in near local transaction for many reasons, for example
> to
> > assure only thread started transaction can suspend it in
> > `GridNearTxLocal#suspend()`.
> > In future we can remove thread id completely. I propose to study this
> > question in another ticket.
> >
> > Also, thread id is remained in `GridDistributedLockRequest`.
> > Lock request used by cache locks and it need to transfer thread id to
> > remote nodes.
> > For example to use cache locks along with cache operations put/get, see
> > `GridNearTxLocal#updateExplicitVersion`.
> > As for pessimistic transaction, thread id in `GridDistributedLockRequest`
> > is set to `UNDEFINED_THREAD_ID`, which means we must not use it remotely.
> >
> > Note, that if user suspends transaction and forgets to resume it,
> > transaction would be rolled back once timeout has occurred.
> >
> > In my design when transaction is suspended, all locked keys remain
> locked.
> >
> > Please see my prototype of proposal implementation [3].
> >
> > Proposed changes are relatively small.
> > They ensure consistency of information about locks, if thread Id will be
> > changed within one transaction (by suspend/resume).
> > There will be used correct id for locks on remote nodes. It also requires
> > painstaking work, but changes will not affect the logic of oher
> components.
> >
> > Tell me please what do you think? Any suggestions and comments will be
> > helpful.
> >
> > If you agree with my design I also will do benchmarking.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-5712
> > [2] https://issues.apache.org/jira/browse/IGNITE-5714
> > [3] https://github.com/apache/ignite/pull/2789
> > [4] Section 3.2.3
> >
> http://download.oracle.com/otn-pub/jcp/jta-1.1-spec-oth-JSpec/jta-1_1-spec.pdf

-- 

*Best Regards,*

*Kuznetsov Aleksey*

Re: [IGNITE-5714] Design proposal of suspend/resume for pessimistic tx

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, Alexey.

Great mail, by the way.
I think, it would be great to have this feature in Ignite.

> I haven't removed thread id completely from code.

Can we remove thread id completely from code?
Can you estimate how much effort do we need?

As far as I can see from parent task [1] we need some complex tests to be implemented.
Are they presented in prototype?

[1] https://issues.apache.org/jira/browse/IGNITE-4887?focusedCommentId=16069655&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16069655

В Пн, 26/02/2018 в 10:59 +0000, Aleksey Kuznetsov пишет:
> Hello Igniters!
> 
> Currently we have suspension/resuming implemented for optimistic
> transactions [1].
> Unless suspend/resume isn't supported for pessimistic tx JTA isn't fully
> supported [4].
> 
> I’m working on a ticket "Suspension/resuming for pessimistic transactions"
> [2].
> Goal of the ticket is to support transaction suspend/resume for pessimistic
> transactions.
> 
> # Benefits of suspending/resuming transaction.
> 
>   1. Full JTA standart support.
>   2. Increase of throughput in high-load scenarios.
>       Suspend operation would allow to release Ignite threads and
> optionally perform some other logic.
> 
> Note, current API has got suspend/resume only for optimistic transactions,
> which confuses users.
> 
> # Real life example.
> 
> Consider the following scenario:
> 
>   1. Application starts Ignite transaction.
>   2. Business logic is executed inside transaction.
>   3. For commit/rollback application need approval message from external
> agent.
>   4. Currently, thread inside Ignite is idle until approval is received.
>   4a. When suspend/resume support is implemented, application can perform
> suspend and release thread inside Ignite.
> 
> # How pessimistic transaction works.
> 
> When we perform put/get operations in pessimistic transactions, lock
> request is sent to remote nodes by `GridNearLockRequest`.
> Request contains thread id `IgniteTxAdapter#threadId`, in which operation
> was performed.
> In pessimistic mode, multiple transaction objects are created - on
> primary, on backup nodes, and on originating node:
> `GridNearTxLocal`, `GridDhtTxLocal`, `GridNearTxRemote`, `GridDhtTxRemote`.
> 
> Thread id is used in logic on these nodes.
> For instance, to check whether thread has successfully locked the key,
> after lock acquisition attempt.
> Or to check whether active transaction exists.
> 
> # Main challenge for implementation.
> 
> I've analysed implementation approaches and see the core issue:
> 
> The essential problem with suspending/resuming lies in thread id field
> transferred to remote nodes during put/get operations.
> 
> Imagine, we want to suspend transaction and resume it in another thread.
> See code snippet below:
> 
> ```
> tx = ignite.transactions().txStart(PESSIMISTIC, SERIALIZABLE);
> 
> cache.put(1, 1);
> 
> tx.suspend();
> ....
> 
> // In another thread.
> tx.resume(); // Thread id will be changed in transaction instance.
> ```
> 
> Original thread id is transferred and saved on remote node.
> After resuming thread id on local node differs from remote node.
> I want to avoid one more network round trip to change thread Id on remote
> node after transaction resuming.
> 
> # Design proposal.
> 
> Transaction id (`xid`) can be used instead of thread id on remote nodes.
> The following solution is possible for the ticket :
> 
> Replace thread id by transaction id for sending to remote nodes.
> Thread id will be removed from the following classes:
> `IgniteTxAdapter`, `GridDistributedTxPrepareRequest`,
> `GridDistributedTxFinishRequest`, `GridDistributedTxFinishResponse`,
> `GridDistributedTxPrepareRequest`.
> 
> I haven't removed thread id completely from code. Thread id is moved to
> `GridNearTxLocal`.
> We still need it in near local transaction for many reasons, for example to
> assure only thread started transaction can suspend it in
> `GridNearTxLocal#suspend()`.
> In future we can remove thread id completely. I propose to study this
> question in another ticket.
> 
> Also, thread id is remained in `GridDistributedLockRequest`.
> Lock request used by cache locks and it need to transfer thread id to
> remote nodes.
> For example to use cache locks along with cache operations put/get, see
> `GridNearTxLocal#updateExplicitVersion`.
> As for pessimistic transaction, thread id in `GridDistributedLockRequest`
> is set to `UNDEFINED_THREAD_ID`, which means we must not use it remotely.
> 
> Note, that if user suspends transaction and forgets to resume it,
> transaction would be rolled back once timeout has occurred.
> 
> In my design when transaction is suspended, all locked keys remain locked.
> 
> Please see my prototype of proposal implementation [3].
> 
> Proposed changes are relatively small.
> They ensure consistency of information about locks, if thread Id will be
> changed within one transaction (by suspend/resume).
> There will be used correct id for locks on remote nodes. It also requires
> painstaking work, but changes will not affect the logic of oher components.
> 
> Tell me please what do you think? Any suggestions and comments will be
> helpful.
> 
> If you agree with my design I also will do benchmarking.
> 
> [1] https://issues.apache.org/jira/browse/IGNITE-5712
> [2] https://issues.apache.org/jira/browse/IGNITE-5714
> [3] https://github.com/apache/ignite/pull/2789
> [4] Section 3.2.3
> http://download.oracle.com/otn-pub/jcp/jta-1.1-spec-oth-JSpec/jta-1_1-spec.pdf