Posted to dev@ignite.apache.org by Ilya Lantukh <il...@gridgain.com> on 2017/09/08 11:27:51 UTC

Monitoring of active transactions

Igniters,

According to our current design and implementation, an unclosed transaction or
an unreleased lock can hang an Ignite cluster forever. This is logical, and with
correct usage of those mechanisms such an issue should never happen, but in the
real world developers can make mistakes and leave a transaction open. We have a
"transaction timeout" feature, but it turns out that it doesn't work in all cases
(see https://issues.apache.org/jira/browse/IGNITE-6181). Even if all known
issues are fixed, there is still a lot of room for mistakes and incorrect
usage.

To make it possible for Ignite users to discover such a problem and trace it
to a particular part of the code, I've created a very simple utility that
collects and prints information about long-running transactions for the
whole cluster. It is available here:
https://github.com/ilantukh/IgniteTxViewer.
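
As a rough illustration of the core of such a utility, the sketch below
filters a snapshot of open transactions by age. It is plain Java and uses no
Ignite API; LongTxSketch, TxSnapshot and longRunning are hypothetical names
chosen for this example, and the actual IgniteTxViewer may work differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: pick out transactions that have been open longer than a threshold. */
final class LongTxSketch {
    /** Minimal stand-in for per-transaction info collected from a node (not an Ignite class). */
    static final class TxSnapshot {
        final String txId;
        final String nodeId;
        final long startTimeMs;

        TxSnapshot(String txId, String nodeId, long startTimeMs) {
            this.txId = txId;
            this.nodeId = nodeId;
            this.startTimeMs = startTimeMs;
        }
    }

    /** Returns the transactions that have been open for at least thresholdMs at time nowMs. */
    static List<TxSnapshot> longRunning(List<TxSnapshot> all, long nowMs, long thresholdMs) {
        List<TxSnapshot> res = new ArrayList<>();

        for (TxSnapshot tx : all) {
            if (nowMs - tx.startTimeMs >= thresholdMs)
                res.add(tx);
        }

        return res;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();

        // Made-up sample data: one short transaction and one that has been open for 5 seconds.
        List<TxSnapshot> all = List.of(
            new TxSnapshot("tx-1", "node-a", now - 50),
            new TxSnapshot("tx-2", "node-b", now - 5_000));

        for (TxSnapshot tx : longRunning(all, now, 1_000))
            System.out.println(tx.txId + " on " + tx.nodeId + " open for " + (now - tx.startTimeMs) + " ms");
    }
}
```

In a real cluster the snapshot would have to be gathered from every node; the
filtering step itself stays this simple.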

One might expect such monitoring utilities to be included in the Ignite
codebase. Personally, I think that such information should be available
from the public API, without using additional applications or diving into
Ignite internals.

What do you think?

-- 
Best regards,
Ilya

Re: Monitoring of active transactions

Posted by Nikolay Izhikov <ni...@gmail.com>.
Hello, Ilya.

Great! Thanks!

Can I extend your idea a bit?

I think it would also be very useful to monitor all user-provided listeners
and callbacks.

This would handle the following scenarios:

1a. A user starts a job with ExecutorService, IgniteCompute or similar.
1b. A user creates a ContinuousQuery with a remoteFilter and a localListener.
2. The user callback takes a huge amount of time to execute on some node, or
its thread blocks on some monitor inside the callback.

In that case Ignite can detect it and print a warning message.
In some cases we can cancel the user callback to free resources.

Specific timeouts and the cancel policy should be configurable by the user.
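
To make the idea concrete, here is a minimal sketch of such a watchdog in
plain Java. It is not Ignite code: CallbackWatchdogSketch and runWithTimeout
are hypothetical names, and the timeout and the cancel flag stand in for the
user-configured settings mentioned above.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: run a user callback, warn if it hangs, optionally cancel it. */
final class CallbackWatchdogSketch {
    /**
     * Runs the callback in a separate thread and waits up to timeoutMs for it.
     * Returns true if it finished in time, false if the timeout was exceeded.
     */
    static boolean runWithTimeout(Runnable callback, long timeoutMs, boolean cancelOnTimeout) {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        Future<?> fut = exec.submit(callback);

        try {
            fut.get(timeoutMs, TimeUnit.MILLISECONDS);

            return true;
        }
        catch (TimeoutException e) {
            System.err.println("WARN: user callback is still running after " + timeoutMs + " ms");

            // Cancel policy: interrupt the worker thread. The callback has to honour interruption.
            if (cancelOnTimeout)
                fut.cancel(true);

            return false;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException("Interrupted while waiting for the callback", e);
        }
        catch (ExecutionException e) {
            throw new IllegalStateException("Callback failed", e.getCause());
        }
        finally {
            // Lets an uncancelled callback run to completion, but accepts no new work.
            exec.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean inTime = runWithTimeout(() -> {
            try {
                Thread.sleep(5_000); // Simulates a callback that hangs.
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 100, true);

        System.out.println("finished in time: " + inTime);
    }
}
```

Cancellation here relies on thread interruption, so a callback blocked on a
plain monitor would only be reported, not freed.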

Maybe this is already covered by FailoverSpi [1], but I can't find a
description of such a feature.

We can take the WebSphere hung-thread detection mechanism [2], [3] as an example.

[1] https://apacheignite.readme.io/docs/fault-tolerance
[2] https://www.ibm.com/developerworks/community/blogs/aimsupport/entry/hung_thread_detection_in_websphere_application_server?lang=en
[3] https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.iseries.doc/ae/ttrb_confighangdet.html




Re: Monitoring of active transactions

Posted by Dmitriy Setrakyan <ds...@apache.org>.
On Thu, Oct 5, 2017 at 1:24 PM, Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:

> Guys,
>
> I think we should not limit this functionality to http-rest only. We should
> add this information to one of the MBeans as the primary information
> source. Then this should be added as a client command both to http-rest and
> binary-rest endpoints, so the information is available through different
> tools.
>
> Thoughts?
>

Completely agree.



Re: Monitoring of active transactions

Posted by Sasha Belyak <rt...@gmail.com>.
On Thu, Oct 5, 2017 at 1:24 PM, Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:

> Guys,
>
> I think we should not limit this functionality to http-rest only. We should
> add this information to one of the MBeans as the primary information
> source. Then this should be added as a client command both to http-rest and
> binary-rest endpoints, so the information is available through different
> tools.
>
> Thoughts?

Agree. Should we discuss the command parameters here? For now txViewer handles 2
options:
1) the time after which a tx is considered "long", int in ms
2) a server-only flag to show only txs started from server nodes, bool
I suggest it should handle:
1) time, int in ms
2) nodeType={server,client,all}, all is the default
3) show txs only, txs with keys, or txs with keys and values:
entries={no,keys,entries}, no is the default
and maybe some filters, like:
4) nodes=<nodeId list> to select txs only from the specified nodes
5) txId=<txId> to select entries only for the specified tx
6) cache=<cacheName> to select only txs with entries in the specified cache(s?)
7) some other filter options like tx isolation level/tx mode; can anybody
write use cases for them?
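
The proposed parameters and their defaults could be modelled roughly as
below. This is only a sketch in plain Java: TxViewParamsSketch and parse are
hypothetical names, only parameters 1), 2), 3) and 6) are covered, and the
default of 0 ms for time is an assumption, since no default was proposed for it.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of the proposed command parameters and their defaults. */
final class TxViewParamsSketch {
    enum NodeType { SERVER, CLIENT, ALL }

    enum Entries { NO, KEYS, ENTRIES }

    final long minDurationMs;
    final NodeType nodeType;
    final Entries entries;
    final String cache; // null means "any cache"

    private TxViewParamsSketch(long minDurationMs, NodeType nodeType, Entries entries, String cache) {
        this.minDurationMs = minDurationMs;
        this.nodeType = nodeType;
        this.entries = entries;
        this.cache = cache;
    }

    /** Builds the parameters from raw name/value pairs, applying the proposed defaults. */
    static TxViewParamsSketch parse(Map<String, String> raw) {
        // Assumption: a missing "time" means "show every transaction".
        long time = Long.parseLong(raw.getOrDefault("time", "0"));

        // "all" and "no" are the defaults proposed in the message above.
        NodeType nodeType = NodeType.valueOf(raw.getOrDefault("nodeType", "all").toUpperCase(Locale.ROOT));
        Entries entries = Entries.valueOf(raw.getOrDefault("entries", "no").toUpperCase(Locale.ROOT));

        return new TxViewParamsSketch(time, nodeType, entries, raw.get("cache"));
    }

    public static void main(String[] args) {
        TxViewParamsSketch p = parse(Map.of("time", "100", "nodeType", "client"));

        System.out.println(p.minDurationMs + " ms, " + p.nodeType + ", entries=" + p.entries);
    }
}
```

Using enums for nodeType and entries means an unknown value fails fast with
an IllegalArgumentException instead of being silently ignored.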


Re: Monitoring of active transactions

Posted by Ilya Lantukh <il...@gridgain.com>.
On Thu, Oct 5, 2017 at 1:24 PM, Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:

> Guys,
>
> I think we should not limit this functionality to http-rest only. We should
> add this information to one of the MBeans as the primary information
> source. Then this should be added as a client command both to http-rest and
> binary-rest endpoints, so the information is available through different
> tools.
>
> Thoughts?

I agree with this approach.




-- 
Best regards,
Ilya

Re: Monitoring of active transactions

Posted by Alexey Goncharuk <al...@gmail.com>.
Guys,

I think we should not limit this functionality to http-rest only. We should
add this information to one of the MBeans as the primary information
source. Then this should be added as a client command both to http-rest and
binary-rest endpoints, so the information is available through different
tools.
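
A minimal sketch of the MBean side, using only the standard JMX API from the
JDK, might look like this. TxMBeanSketch, ActiveTxMXBean, the
LongRunningTxCount attribute and the org.example object name are all
hypothetical; this is not an existing Ignite MBean.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical sketch: expose a long-running transaction counter through JMX. */
final class TxMBeanSketch {
    /** Management interface; the "MXBean" suffix makes JMX treat it as an MXBean. */
    public interface ActiveTxMXBean {
        int getLongRunningTxCount();
    }

    /** Implementation that a transaction manager would update as transactions age. */
    static final class ActiveTx implements ActiveTxMXBean {
        private final AtomicInteger cnt = new AtomicInteger();

        void set(int n) {
            cnt.set(n);
        }

        @Override public int getLongRunningTxCount() {
            return cnt.get();
        }
    }

    /** Registers the bean under the given name, reads the attribute back via JMX, then unregisters it. */
    static int registerAndRead(String objectName, int longRunning) {
        try {
            MBeanServer srv = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(objectName);

            ActiveTx bean = new ActiveTx();
            bean.set(longRunning);

            srv.registerMBean(bean, name);

            try {
                // The attribute name is derived from the getter: getLongRunningTxCount -> LongRunningTxCount.
                return (Integer)srv.getAttribute(name, "LongRunningTxCount");
            }
            finally {
                srv.unregisterMBean(name);
            }
        }
        catch (JMException e) {
            throw new IllegalStateException("JMX operation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(registerAndRead("org.example:type=ActiveTransactions", 2));
    }
}
```

Once the data is available from a bean like this, the REST commands can read
from the same source instead of collecting the information on their own.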

Thoughts?


Re: Monitoring of active transactions

Posted by Sasha Belyak <rt...@gmail.com>.
It's very useful, but we often need to get the list of hung transactions when
exchange has stopped for some reason, and in this case a utility based on a
client node won't help. I rewrote it as a ComputeTask with a default
constructor, added the jar to the Ignite libs and activated the HTTP REST API;
now it can be used from the console with no need to update the cluster topology:

curl 'http://127.0.0.1:8080/ignite?cmd=exe&name=org.apache.ignite.txviewer.RestCollectTxInfoTask&p1=false&p2=100'

{"successStatus":0,"sessionToken":null,"error":null,"response":{"id":"~98391a83-3d76-4e5e-b0c3-185cf2bd4acd","finished":true,"error":null,"result":[{"nearXidVersion":"GridCacheVersion [topVer=118063514, order=1506583525449, nodeOrder=2]","nodeId":"baa0237e-707c-4b69-abb6-555a2fc17762","nodeString":"TcpDiscoveryNode [id=baa0237e-707c-4b69-abb6-555a2fc17762, addrs=[0:0:0:0:0:0:0:1%1, 10.0.3.1, 10.38.176.253, 10.42.1.107, 127.0.0.1, 172.17.0.1], sockAddrs=[/127.0.0.1:0, /10.42.1.107:0, /0:0:0:0:0:0:0:1%1:0, /172.17.0.1:0, /10.38.176.253:0, /10.0.3.1:0], discPort=0, order=2, intOrder=0, lastExchangeTime=1506583525683, loc=true, ver=2.1.5#20170922-sha1:6452201d, isClient=true]","threadId":1,"startTime":"Thu Sep 28 14:25:27 NOVT 2017","entries":[{"cache":"txCache","key":"1","value":"1","operation":"CREATE"}]}]}}
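
For scripting from Java rather than curl, the same request URL can be
assembled as in the sketch below. The URL layout (cmd=exe, name, p1, p2) is
taken from the curl call above; RestExeUrlSketch and exeUrl are hypothetical
helper names, and the sketch only builds the URL without sending a request.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: build the HTTP REST URL that executes a task by class name. */
final class RestExeUrlSketch {
    /** Positional task arguments are passed as p1, p2, ... in the order given. */
    static String exeUrl(String hostPort, String taskName, String... params) {
        StringBuilder sb = new StringBuilder("http://")
            .append(hostPort)
            .append("/ignite?cmd=exe&name=")
            .append(URLEncoder.encode(taskName, StandardCharsets.UTF_8));

        for (int i = 0; i < params.length; i++) {
            sb.append("&p").append(i + 1).append('=')
                .append(URLEncoder.encode(params[i], StandardCharsets.UTF_8));
        }

        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(exeUrl("127.0.0.1:8080",
            "org.apache.ignite.txviewer.RestCollectTxInfoTask", "false", "100"));
    }
}
```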

It would be even better if this tool could use binary REST too. I mean that we
should be able to run this collecting task from:
1) the HTTP REST API via curl/wget (but the class must be deployed somehow
beforehand, by peer class loading in CONTINUOUS mode or by adding it to the
application classpath). Good for admin console scripting.
2) the binary REST API via some Java tool (with instant peer class loading).
Good for investigation on any grid configuration.
3) maybe, a client node, as it is implemented now (I can't see any advantages)


2017-09-16 5:35 GMT+07:00 Dmitry Pavlov <dp...@gmail.com>:

> Hi Ilya,
>
> I can help with including this utility in the build/release; I've recently
> finished the same steps for the PDS WAL analysis tool that converts records to
> human-readable format.
> Please feel free to contact me.
>
> Sincerely,
> Dmitriy Pavlov
>
> Fri, 15 Sep 2017 at 6:37, Dmitriy Setrakyan <ds...@apache.org> wrote:
>
> > It seems that the community (including me) really would like to see this
> > feature in Ignite.
> >
> > Ilya, can you create a ticket and submit it for review?
> >
> > D.
> >
> > On Fri, Sep 8, 2017 at 7:15 AM, Anton Vinogradov <av...@apache.org> wrote:
> >
> > > Ilya,
> > >
> > > We really need this!
> > >
> > > Tx and lock info should be collected on each cluster hang.
> > > We already have an issue related to this problem:
> > > https://issues.apache.org/jira/browse/IGNITE-4937
> > >
> > > Nikolay,
> > >
> > > Good point,
> > > but it seems you should start a separate thread to discuss this.
> > >
> > > On Fri, Sep 8, 2017 at 4:28 PM, Dmitry Pavlov <dp...@gmail.com>
> > > wrote:
> > >
> > > > Hi Ilya,
> > > >
> > > > I'm definitely +1 for including the utility in the product. Perfect
> > > > contribution.
> > > >
> > > > Sincerely,
> > > > Dmitriy Pavlov
> > > >
> > > > пт, 8 сент. 2017 г. в 14:28, Ilya Lantukh <il...@gridgain.com>:
> > > >
> > > > > Igniters,
> > > > >
> > > > > According to our current design and implementation, unclosed
> > > transaction
> > > > or
> > > > > unreleased lock can hang ignite cluster forever. This is logical,
> and
> > > > with
> > > > > correct usage of those mechanics such issue should never happen, in
> > > real
> > > > > world developers can make mistakes and leave transaction open. We
> > have
> > > a
> > > > > feature "transaction timeout", but turns out it doesn't work in all
> > > cases
> > > > > (see https://issues.apache.org/jira/browse/IGNITE-6181). Even if
> all
> > > > known
> > > > > issues are fixed, there is still a lot of room for mistake and
> > > incorrect
> > > > > usage.
> > > > >
> > > > > To make it possible for Ignite users to discover such problem and
> > trace
> > > > it
> > > > > to a particular part of code, I've created a very simple utility
> that
> > > > > collects and prints information about long running transactions for
> > the
> > > > > whole cluster. It is available here:
> > > > > https://github.com/ilantukh/IgniteTxViewer.
> > > > >
> > > > > One might expect such monitoring utilities to be included in Ignite
> > > > > codebase. Personally, I think that such information should be
> > available
> > > > > from public API, without using of additional applications or diving
> > > into
> > > > > Ignite internals.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Ilya
> > > > >
> > > >
> > >
> >
>

Re: Monitoring of active transactions

Posted by Dmitry Pavlov <dp...@gmail.com>.
Hi Ilya,

I can help with including this utility in the build/release; I've recently
finished the same steps for the PDS WAL analysis tool that converts records
to a human-readable format.
Please feel free to contact me.

Sincerely,
Dmitriy Pavlov

Fri, Sep 15, 2017 at 6:37, Dmitriy Setrakyan <ds...@apache.org>:

> It seems that the community (including me) really would like to see this
> feature in Ignite.
>
> Ilya, can you create a ticket and submit it for review?
>
> D.

Re: Monitoring of active transactions

Posted by Dmitriy Setrakyan <ds...@apache.org>.
It seems that the community (including me) really would like to see this
feature in Ignite.

Ilya, can you create a ticket and submit it for review?

D.

On Fri, Sep 8, 2017 at 7:15 AM, Anton Vinogradov <av...@apache.org> wrote:

> Ilya,
>
> We badly need this!
>
> Txs and Locks info should be collected on each cluster hang.
> We already have an issue related to this problem -
> https://issues.apache.org/jira/browse/IGNITE-4937
>
> Nikolay,
>
> Good point,
> but it seems you should start a separate thread to discuss this.

Re: Monitoring of active transactions

Posted by Anton Vinogradov <av...@apache.org>.
Ilya,

We badly need this!

Txs and Locks info should be collected on each cluster hang.
We already have an issue related to this problem -
https://issues.apache.org/jira/browse/IGNITE-4937

Nikolay,

Good point,
but it seems you should start a separate thread to discuss this.

On Fri, Sep 8, 2017 at 4:28 PM, Dmitry Pavlov <dp...@gmail.com> wrote:

> Hi Ilya,
>
> I'm definitely +1 for including the utility in the product. Perfect
> contribution.
>
> Sincerely,
> Dmitriy Pavlov

Re: Monitoring of active transactions

Posted by Dmitry Pavlov <dp...@gmail.com>.
Hi Ilya,

I'm definitely +1 for including the utility in the product. Perfect
contribution.

Sincerely,
Dmitriy Pavlov

Fri, Sep 8, 2017 at 14:28, Ilya Lantukh <il...@gridgain.com>:

> Igniters,
>
> According to our current design and implementation, unclosed transaction or
> unreleased lock can hang ignite cluster forever. This is logical, and with
> correct usage of those mechanics such issue should never happen, in real
> world developers can make mistakes and leave transaction open. We have a
> feature "transaction timeout", but turns out it doesn't work in all cases
> (see https://issues.apache.org/jira/browse/IGNITE-6181). Even if all known
> issues are fixed, there is still a lot of room for mistake and incorrect
> usage.
>
> To make it possible for Ignite users to discover such problem and trace it
> to a particular part of code, I've created a very simple utility that
> collects and prints information about long running transactions for the
> whole cluster. It is available here:
> https://github.com/ilantukh/IgniteTxViewer.
>
> One might expect such monitoring utilities to be included in Ignite
> codebase. Personally, I think that such information should be available
> from public API, without using of additional applications or diving into
> Ignite internals.
>
> What do you think?
>
> --
> Best regards,
> Ilya
>