Posted to user@ignite.apache.org by bintisepaha <bi...@tudor.com> on 2016/08/24 02:02:17 UTC

IgniteCompute.broadcast() stuck

Hi, we are on Ignite 1.5.0-final and have recently been facing this issue when
broadcasting a job from the client side to a random remote server node. How can
we avoid this, and what is causing it?

We ran 20 parallel clients for load testing; 15 completed with no
issues, and 5 got stuck here.

The client-side code below hangs at this line:
	serverCompute.broadcast(new OrderHolderSaveRunnable(ignite, orderHolderList));


	public Boolean processOrderHolders(List<OrderHolder> orderHolderList) throws Exception {
		ClusterGroup serverGroup = ignite.cluster().forServers().forRandom();
		IgniteCompute serverCompute = ignite.compute(serverGroup);
		try {
			serverCompute.broadcast(new OrderHolderSaveRunnable(ignite, orderHolderList));
		} catch(Exception e) {
			logger.error(e,e);
			throw e;
		}
		
		return true;
	}

Thread dump

Name: main
State: WAITING on
org.apache.ignite.internal.ComputeTaskInternalFuture@3fd1be52
Total blocked: 5  Total waited: 5,975

Stack trace: 
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:157)
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:115)
org.apache.ignite.internal.AsyncSupportAdapter.saveOrGet(AsyncSupportAdapter.java:112)
org.apache.ignite.internal.IgniteComputeImpl.broadcast(IgniteComputeImpl.java:250)
com.tudor.datagridI.client.TradeOrderStoreHelper.processOrderHolders(TradeOrderStoreHelper.java:37)
com.tudor.datagridI.TradingDataAccessImpl.saveOrders(TradingDataAccessImpl.java:399)
orderserver.client.GridClient.updateOrderHoldersInGrid(GridClient.java:138)
orderserver.Order.save(Order.java:3619)
   - locked orderserver.Order@732871ce
orderserver.Order.save(Order.java:3563)
   - locked orderserver.Order@732871ce
izi.izi_data_grid_ignite_test.OrderBooker.bookRegularOrder(OrderBooker.java:111)
izi.izi_data_grid_ignite_test.OrderBooker.bookOrder(OrderBooker.java:33)
izi.izi_data_grid_ignite_test.Main.bookOrders(Main.java:47)
izi.izi_data_grid_ignite_test.Main.runExc(Main.java:83)
izi.izi_data_grid_ignite_test.Main.run(Main.java:35)
izi.izi_data_grid_ignite_test.Runner.run(Runner.java:37)
izi.izi_data_grid_ignite_test.Runner.main(Runner.java:17)

Any help is greatly appreciated.




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: IgniteCompute.broadcast() stuck

Posted by bintisepaha <bi...@tudor.com>.
Also, it was stuck like this for hours.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7472.html

Re: IgniteCompute.broadcast() stuck

Posted by bintisepaha <bi...@tudor.com>.
Val, the dump I sent in the original email was from the client node.
The zipped-up dumps were from all the server nodes that participate in the
distributed cache.

Anyway, changing it to run() fixed the issue. But we never get to understand
the root cause of the hang; it is always the alternative suggestion that works,
and we move on to it without understanding why what we tried first did not
work.

Thanks,
Binti



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7471.html

Re: IgniteCompute.broadcast() stuck

Posted by vkulichenko <va...@gmail.com>.
Hi Binti,

Such a dump only means that a client is executing a task and waiting for the
result. Is it executing longer than you expect? What is actually wrong from
your point of view?
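
To illustrate what such a dump shows, here is a minimal plain-Java sketch. It does not use Ignite at all; `BoundedWait` and `completesWithin` are made-up names, and a `CompletableFuture` stands in for the task future. A thread that calls `get()` with no timeout on a future that never completes simply parks, which is the WAITING state in the dump. Waiting with a timeout instead lets the caller notice that the result is late and react to it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedWait {
    /**
     * Waits for the future for at most timeoutMillis.
     * Returns true if it completed in time, false if the wait timed out.
     */
    static boolean completesWithin(CompletableFuture<?> fut, long timeoutMillis) {
        try {
            fut.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The result has not arrived yet; the caller can log, take dumps, or retry.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("job failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        // Stand-in for a job whose result never comes back.
        CompletableFuture<Void> never = new CompletableFuture<>();
        // Stand-in for a job that has already finished.
        CompletableFuture<Void> done = CompletableFuture.completedFuture(null);

        System.out.println("never completes in time: " + completesWithin(never, 200));
        System.out.println("done completes in time:  " + completesWithin(done, 200));
    }
}
```

As far as I know, Ignite's `IgniteFuture` also offers a `get(timeout, unit)` variant, so the same idea should carry over when the job is submitted through the async API rather than a blocking call. Treat that as a pointer to check against the Javadoc for your version, not as a tested recipe.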

Also, there was no such thread in the dump files you provided. Please make
sure that you grab dumps and logs while the system is in the hung state.

-Val



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7421.html

Re: IgniteCompute.broadcast() stuck

Posted by bintisepaha <bi...@tudor.com>.
It was hanging because all of our clients were stuck at the stack trace below:

Thread dump 

Name: main 
State: WAITING on
org.apache.ignite.internal.ComputeTaskInternalFuture@3fd1be52 
Total blocked: 5  Total waited: 5,975 

Stack trace: 
sun.misc.Unsafe.park(Native Method) 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) 
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:157) 
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:115) 
org.apache.ignite.internal.AsyncSupportAdapter.saveOrGet(AsyncSupportAdapter.java:112) 
org.apache.ignite.internal.IgniteComputeImpl.broadcast(IgniteComputeImpl.java:250) 
com.tudor.datagridI.client.TradeOrderStoreHelper.processOrderHolders(TradeOrderStoreHelper.java:37) 
com.tudor.datagridI.TradingDataAccessImpl.saveOrders(TradingDataAccessImpl.java:399) 
orderserver.client.GridClient.updateOrderHoldersInGrid(GridClient.java:138) 
orderserver.Order.save(Order.java:3619) 
   - locked orderserver.Order@732871ce 
orderserver.Order.save(Order.java:3563) 
   - locked orderserver.Order@732871ce 
izi.izi_data_grid_ignite_test.OrderBooker.bookRegularOrder(OrderBooker.java:111) 
izi.izi_data_grid_ignite_test.OrderBooker.bookOrder(OrderBooker.java:33) 
izi.izi_data_grid_ignite_test.Main.bookOrders(Main.java:47) 
izi.izi_data_grid_ignite_test.Main.runExc(Main.java:83) 
izi.izi_data_grid_ignite_test.Main.run(Main.java:35) 
izi.izi_data_grid_ignite_test.Runner.run(Runner.java:37) 
izi.izi_data_grid_ignite_test.Runner.main(Runner.java:17) 



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7349.html

Re: IgniteCompute.broadcast() stuck

Posted by vkulichenko <va...@gmail.com>.
Binti,

These are just listener threads for JMX, and they are not locking anything.
And as I said, I don't see any evidence of blocked compute. Actually, I just
tried to search for 'Compute', 'broadcast' and other related words in the dumps
and found nothing, which means that nobody is executing any compute tasks.
That's why I'm asking why you think that it is not working.

-Val



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7315.html

Re: IgniteCompute.broadcast() stuck

Posted by bintisepaha <bi...@tudor.com>.
Also, it looks like a lot of other nodes did not have these TCP threads.
Could that be the reason for this issue? Are the server nodes not accepting
connections?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7314.html

Re: IgniteCompute.broadcast() stuck

Posted by bintisepaha <bi...@tudor.com>.
Vlad, 

Look at this one below in file 2511995.txt

"RMI TCP Connection(8)-10.10.11.100" #202 daemon prio=5 os_prio=0
tid=0x00007ff050009000 nid=0x2a7a23 in Object.wait() [0x00007fef5aef5000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at com.sun.jmx.remote.internal.ArrayNotificationBuffer.fetchNotifications(ArrayNotificationBuffer.java:449)
	- locked <0x000000063066a628> (a com.sun.jmx.remote.internal.ArrayNotificationBuffer)
	at com.sun.jmx.remote.internal.ArrayNotificationBuffer$ShareBuffer.fetchNotifications(ArrayNotificationBuffer.java:227)
	at com.sun.jmx.remote.internal.ServerNotifForwarder.fetchNotifs(ServerNotifForwarder.java:274)
	at javax.management.remote.rmi.RMIConnectionImpl$4.run(RMIConnectionImpl.java:1273)
	at javax.management.remote.rmi.RMIConnectionImpl$4.run(RMIConnectionImpl.java:1271)
	at javax.management.remote.rmi.RMIConnectionImpl.fetchNotifications(RMIConnectionImpl.java:1277)
	at sun.reflect.GeneratedMethodAccessor161.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
	at sun.rmi.transport.Transport$1.run(Transport.java:200)
	at sun.rmi.transport.Transport$1.run(Transport.java:197)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$79(TCPTransport.java:683)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$5/1005089754.run(Unknown Source)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7313.html

Re: IgniteCompute.broadcast() stuck

Posted by Vladislav Pyatkov <vl...@gmail.com>.
Hello,

I do not see any locks held by the RMI threads. I think you provided the wrong
dump files.

Could you please check the files, or provide the fragment of the dump where a
thread is locked?

On Wed, Aug 24, 2016 at 11:53 PM, bintisepaha <bi...@tudor.com>
wrote:

> Did you see the dumps for RMI threads? We are seeing some RMI TCP
> Communication threads locked.
> Computation is stuck because clients are hanging in broadcast(), could this
> be related to the rmi tcp threads being locked.
>
>
>
> --
> View this message in context: http://apache-ignite-users.
> 70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7286.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>



-- 
Vladislav Pyatkov

Re: IgniteCompute.broadcast() stuck

Posted by bintisepaha <bi...@tudor.com>.
Did you see the dumps for RMI threads? We are seeing some RMI TCP
Communication threads locked.
Computation is stuck because clients are hanging in broadcast(). Could this
be related to the RMI TCP threads being locked?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7286.html

Re: IgniteCompute.broadcast() stuck

Posted by vkulichenko <va...@gmail.com>.
Binti,

The broadcast method is broadcasting :) I.e., it sends a closure to all nodes.
For your case you should use run().

Dumps look empty, like nothing is actually executed. Why do you think that
the computation is stuck?

-Val



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7285.html

Re: IgniteCompute.broadcast() stuck

Posted by bintisepaha <bi...@tudor.com>.
Val, I am selecting only one random node, like below:

                ClusterGroup serverGroup =
ignite.cluster().forServers().forRandom(); 
                IgniteCompute serverCompute = ignite.compute(serverGroup); 

The broadcast should then only send the task to the pre-selected random node
from the cluster group, right? Or am I misunderstanding this behavior?

It looks like in the above code I can easily switch to using
run(IgniteRunnable runnable), which will run the job on one node. I am not sure
how broadcast is different here, but I do not have to use broadcast. So if
you confirm run() is a better choice here, I will switch to run() and give it
a try.

The volume testing we are doing is running 20 clients in parallel calling the
above code on a cluster of 16 nodes. Not sure if that is causing it and we need
to scale up.

Attached the threadDumps from all 16 server nodes: threadDumps.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/n7278/threadDumps.zip>

Thanks,
Binti



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7278.html

Re: IgniteCompute.broadcast() stuck

Posted by vkulichenko <va...@gmail.com>.
Hi Binti,

Can you please attach full thread dumps from all nodes?

Also I'm a bit confused by this:

bintisepaha wrote
> Hi We are on ignite 1.5.0-final and are recently facing this issue from
> the client side broadcaasting a job to a random remote server node.

Note that broadcasting implies execution of a closure on all available
server nodes, not on one random one. To execute only once, use the run() or
call() method instead.
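
The difference can be pictured with a toy model in plain Java. This is not Ignite code: `ToyCompute` and its methods are hypothetical names, the "nodes" are just strings, and picking the single node at random is an assumption made for the sketch (a real cluster decides that through its own load balancing). It only shows the counting: broadcast executes the closure once per node in the group, run executes it once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.function.Consumer;

class ToyCompute {
    private final List<String> nodes;      // names of the nodes in this group
    private final Random rnd = new Random();

    ToyCompute(List<String> nodes) {
        if (nodes.isEmpty()) throw new IllegalArgumentException("empty group");
        this.nodes = new ArrayList<>(nodes);
    }

    /** Executes the job once on every node in the group; returns the execution count. */
    int broadcast(Consumer<String> job) {
        for (String node : nodes) job.accept(node);
        return nodes.size();
    }

    /** Executes the job exactly once, on a single node of the group. */
    int run(Consumer<String> job) {
        // Random choice is an assumption of this toy model only.
        job.accept(nodes.get(rnd.nextInt(nodes.size())));
        return 1;
    }

    public static void main(String[] args) {
        ToyCompute servers = new ToyCompute(Arrays.asList("srv-1", "srv-2", "srv-3"));
        List<String> executedOn = new ArrayList<>();

        servers.broadcast(executedOn::add);
        System.out.println("broadcast executions: " + executedOn.size());

        executedOn.clear();
        servers.run(executedOn::add);
        System.out.println("run executions: " + executedOn.size());
    }
}
```

For a job like saving a list of order holders, running it on every node would repeat the same save on each of them, which is why a single execution through run() or call() fits that case better.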

-Val



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7257.html