Posted to users@apex.apache.org by Vivek Bhide <bh...@gmail.com> on 2017/07/06 03:57:47 UTC

How to address unclean undeploy exception

Hi,

In one of the operators, I have some big LRU cache objects (which are not
transient and hence are checkpointed), and when that operator restarts for any
reason, I see the 'unclean undeploy' exception in the container logs.
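For context, the pattern is roughly this (a simplified, illustrative sketch,
not the actual operator code):

import java.util.LinkedHashMap;
import java.util.Map;

import com.datatorrent.common.util.BaseOperator;

public class UsageCountCalculator extends BaseOperator
{
  private static final int MAX_ENTRIES = 1000000;

  // Not transient, so the whole map is serialized into every checkpoint;
  // with a big cache that makes checkpointing (and hence undeploy) slow.
  private final Map<String, Long> lruCache = new LinkedHashMap<String, Long>(16, 0.75f, true)
  {
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Long> eldest)
    {
      return size() > MAX_ENTRIES;
    }
  };
}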

Unfortunately I don't have the stack trace with me, but is there any
configuration that can be set to make sure that the container undeploy waits
until checkpointing is complete?

Also, I am a bit curious about how container undeploy and redeploy are
handled (triggered when any of the upstream operators restarts). I see that
the undeploy is often interrupted if it's taking a bit more time. Is there
any documentation I can refer to in order to understand this in more detail?

Regards
Vivek





Re: How to address unclean undeploy exception

Posted by Vivek Bhide <bh...@gmail.com>.
Below is the stack trace. Also, can you please point me to some sample
examples of operators that use managed state, or to general guidelines on how
to use it?

Regards
Vivek

2017-07-06 18:02:05,006 ERROR engine.StreamingContainer (StreamingContainer.java:run(1456)) - Operator set [OperatorDeployInfo[id=7,name=usageCountCalculator,type=GENERIC,checkpoint={ffffffffffffffff, 0, 0},inputs=[OperatorDeployInfo.InputDeployInfo[portName=inputPort,streamId=sendToAccessCounter,sourceNodeId=6,sourcePortName=accessCountPort,locality=<null>,partitionMask=0,partitionKeys=<null>]],outputs=[OperatorDeployInfo.OutputDeployInfo[portName=outputPort,streamId=sinkToHdfs,bufferServer=brdn2204.target.com]]]] stopped running due to an exception.
com.datatorrent.netlet.NetletThrowable$NetletRuntimeException: java.lang.UnsupportedOperationException: Client does not own the socket any longer!
	at com.datatorrent.netlet.AbstractClient$1.offer(AbstractClient.java:343)
	at com.datatorrent.netlet.AbstractClient$1.offer(AbstractClient.java:333)
	at com.datatorrent.netlet.AbstractClient.send(AbstractClient.java:279)
	at com.datatorrent.netlet.AbstractLengthPrependerClient.write(AbstractLengthPrependerClient.java:236)
	at com.datatorrent.netlet.AbstractLengthPrependerClient.write(AbstractLengthPrependerClient.java:190)
	at com.datatorrent.stram.stream.BufferServerPublisher.put(BufferServerPublisher.java:164)
	at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:469)
	at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1428)
Caused by: java.lang.UnsupportedOperationException: Client does not own the socket any longer!
	... 8 more
2017-07-06 18:02:05,020 WARN  ipc.Client (Client.java:call(1460)) - interrupted waiting to send rpc request to server
java.lang.InterruptedException
	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
	at java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1092)
	at org.apache.hadoop.ipc.Client.call(Client.java:1455)
	at org.apache.hadoop.ipc.Client.call(Client.java:1396)
	at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:241)
	at com.sun.proxy.$Proxy12.reportError(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at com.datatorrent.stram.RecoverableRpcProxy.invoke(RecoverableRpcProxy.java:157)
	at com.sun.proxy.$Proxy12.reportError(Unknown Source)
	at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1459)
2017-07-06 18:02:05,021 WARN  stram.RecoverableRpcProxy (RecoverableRpcProxy.java:invoke(168)) - RPC failure, will retry after 10000 ms (remaining 29998 ms)
java.io.IOException: java.lang.InterruptedException
	at org.apache.hadoop.ipc.Client.call(Client.java:1461)
	at org.apache.hadoop.ipc.Client.call(Client.java:1396)
	at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:241)
	at com.sun.proxy.$Proxy12.reportError(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at com.datatorrent.stram.RecoverableRpcProxy.invoke(RecoverableRpcProxy.java:157)
	at com.sun.proxy.$Proxy12.reportError(Unknown Source)
	at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1459)
Caused by: java.lang.InterruptedException
	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
	at java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1092)
	at org.apache.hadoop.ipc.Client.call(Client.java:1455)
	... 10 more
2017-07-06 18:02:05,022 WARN  engine.StreamingContainer (StreamingContainer.java:teardownNode(1372)) - node 7/usageCountCalculator took longer to exit, resulting in unclean undeploy!
2017-07-06 18:02:07,590 INFO  server.Server (Server.java:onMessage(599)) - Received subscriber request: SubscribeRequestTuple{version=1.0, identifier=tcp://brdn2204.target.com:40013/7.outputPort.1, windowId=595ec0d8000000b3, type=sinkToHdfs/8.input, upstreamIdentifier=7.outputPort.1, mask=0, partitions=null, bufferSize=1024}
2017-07-06 18:02:07,606 INFO  engine.StreamingContainer (StreamingContainer.java:processHeartbeatResponse(825)) - Deploy request: [OperatorDeployInfo[id=7,name=usageCountCalculator,type=GENERIC,checkpoint={595ec0d8000000b3, 0, 0},inputs=[OperatorDeployInfo.InputDeployInfo[portName=inputPort,streamId=sendToAccessCounter,sourceNodeId=6,sourcePortName=accessCountPort,locality=<null>,partitionMask=0,partitionKeys=<null>]],outputs=[OperatorDeployInfo.OutputDeployInfo[portName=outputPort,streamId=sinkToHdfs,bufferServer=brdn2204.target.com]]]]
2017-07-06 18:02:08,058 INFO  server.Server (Server.java:onMessage(555)) - Received publisher request: PublishRequestTuple{version=1.0, identifier=7.outputPort.1, windowId=595ec0d8000000




Re: How to address unclean undeploy exception

Posted by Sandesh Hegde <sa...@datatorrent.com>.
An operator using managed state is preferable for maintaining a large LRU cache.
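Something along these lines (an untested sketch against the Malhar managed
state API; the single bucket, the byte[] port, and the encode/decode helpers
are illustrative choices, so please check the Malhar managed state docs for
the exact lifecycle contract):

import java.nio.ByteBuffer;

import org.apache.apex.malhar.lib.state.managed.ManagedStateImpl;

import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.Operator;
import com.datatorrent.common.util.BaseOperator;
import com.datatorrent.netlet.util.Slice;

public class ManagedStateCounter extends BaseOperator
    implements Operator.CheckpointNotificationListener
{
  // Key/value pairs live in managed state, which persists them to HDFS
  // incrementally instead of serializing them with the operator at checkpoint.
  private final ManagedStateImpl state = new ManagedStateImpl();

  public ManagedStateCounter()
  {
    state.setNumBuckets(1); // a single bucket keeps the sketch simple
  }

  public final transient DefaultInputPort<byte[]> input = new DefaultInputPort<byte[]>()
  {
    @Override
    public void process(byte[] key)
    {
      Slice k = new Slice(key);
      Slice current = state.getSync(0, k); // assuming absent keys come back null/empty
      long count = (current == null || current.length == 0) ? 0 : decode(current);
      state.put(0, k, encode(count + 1));
    }
  };

  // Managed state has to be driven through the operator lifecycle, including
  // the checkpoint callbacks, so it can flush its buckets and purge data for
  // committed windows.
  @Override
  public void setup(OperatorContext context) { state.setup(context); }

  @Override
  public void beginWindow(long windowId) { state.beginWindow(windowId); }

  @Override
  public void endWindow() { state.endWindow(); }

  @Override
  public void beforeCheckpoint(long windowId) { state.beforeCheckpoint(windowId); }

  @Override
  public void checkpointed(long windowId) { state.checkpointed(windowId); }

  @Override
  public void committed(long windowId) { state.committed(windowId); }

  @Override
  public void teardown() { state.teardown(); }

  private static Slice encode(long value)
  {
    return new Slice(ByteBuffer.allocate(8).putLong(value).array());
  }

  private static long decode(Slice s)
  {
    return ByteBuffer.wrap(s.buffer, s.offset, s.length).getLong();
  }
}

Since only recent data stays in memory and checkpoints no longer carry the
whole cache, the operator shouldn't stall serializing a big map at undeploy.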
On Wed, Jul 5, 2017 at 7:57 PM Vivek Bhide <bh...@gmail.com> wrote:

> Hi,
>
> In one of the operators, I have some big LRU cache objects (which are not
> transient and hence are checkpointed), and when that operator restarts for any
> reason, I see the 'unclean undeploy' exception in the container logs.
>
> Unfortunately I don't have the stack trace with me, but is there any
> configuration that can be set to make sure that the container undeploy waits
> until checkpointing is complete?
>
> Also, I am a bit curious about how container undeploy and redeploy are
> handled (triggered when any of the upstream operators restarts). I see that
> the undeploy is often interrupted if it's taking a bit more time. Is there
> any documentation I can refer to in order to understand this in more detail?
>
> Regards
> Vivek