Posted to notifications@dubbo.apache.org by "hjq2016 (GitHub)" <gi...@apache.org> on 2018/12/13 02:30:53 UTC

[GitHub] [incubator-dubbo] hjq2016 opened issue #2952: Provider graceful shutdown fails: the call takes 30 seconds, but the consumer actively closes the channel after 10 seconds

- [ ] I have searched the [issues](https://github.com/apache/incubator-dubbo/issues) of this repository and believe that this is not a duplicate.
- [ ] I have checked the [FAQ](https://github.com/apache/incubator-dubbo/blob/master/FAQ.md) of this repository and believe that this is not a duplicate.

### Environment

* Dubbo version: 2.6.5
* Operating System version: CentOS6.9
* Java version: 1.8

### Steps to reproduce this issue

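1. On the provider, set the JVM argument -Ddubbo.service.shutdown.wait=50000.
2. The consumer calls one of the provider's services; the call takes 30 seconds to return a result.
3. Five seconds after the call starts, run kill pid to shut the provider down gracefully.
4. The consumer receives the provider-offline notification from the registry and actively closes its Netty channel.
5. After 30 seconds, the provider's DubboServerHandler thread tries to return the result, finds the channel closed, and throws an exception.
6. The consumer is configured with a 60-second timeout; it never receives a response and throws a timeout exception, so the request ends in error.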
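
The timing in the steps above can be checked with a small calculation. The sketch below is illustrative only and is not Dubbo code: the class and method names are hypothetical, and it assumes the consumer learns that the provider is going offline at the moment kill pid is issued and closes the channel a fixed wait later (10 seconds by default, per the title of this issue).

```java
/** Back-of-the-envelope timeline for the scenario above; all times are in milliseconds. */
class ShutdownTimeline {

    /**
     * Returns true if the provider can still write its response, i.e. the call
     * finishes no later than the moment the consumer closes the channel.
     *
     * @param callDurationMs how long the provider needs to answer (30 s in this report)
     * @param killAtMs       when "kill pid" is issued, measured from the start of the call
     * @param consumerWaitMs how long the consumer waits before closing the channel
     */
    static boolean responseDelivered(long callDurationMs, long killAtMs, long consumerWaitMs) {
        // Assumption: the offline notification reaches the consumer immediately.
        long channelClosedAtMs = killAtMs + consumerWaitMs;
        return callDurationMs <= channelClosedAtMs;
    }

    public static void main(String[] args) {
        // Consumer wait of 10 s: channel closes at 15 s, before the 30 s call ends.
        System.out.println(responseDelivered(30_000, 5_000, 10_000)); // false
        // Consumer wait of 50 s: channel closes at 55 s, after the call ends.
        System.out.println(responseDelivered(30_000, 5_000, 50_000)); // true
    }
}
```

With a 10-second wait on the consumer side the channel is gone 15 seconds into a 30-second call, which matches the ClosedChannelException reported under Actual Result.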

Pls. provide [GitHub address] to reproduce this issue.

### Expected Result

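What do you expect from the above steps?
The service should finish its 30-second call and return normally, and only then should the provider shut down. Subsequent requests from the consumer should be switched to the remaining available services.
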
### Actual Result

What actually happens?

If there is an exception, please attach the exception trace:

```
com.alibaba.dubbo.remoting.RemotingException: Failed to send message Response [id=213, version=2.0.2, status=20, event=false, error=null, result=RpcResult [returned content], exception=null]] to /IP:65417, cause: null
	at com.alibaba.dubbo.remoting.transport.netty.NettyChannel.send(NettyChannel.java:110) ~[dubbo-2.6.5.jar:2.6.5]
	at com.alibaba.dubbo.remoting.transport.AbstractPeer.send(AbstractPeer.java:53) ~[dubbo-2.6.5.jar:2.6.5]
	at com.alibaba.dubbo.remoting.exchange.support.header.HeaderExchangeHandler.received(HeaderExchangeHandler.java:174) ~[dubbo-2.6.5.jar:2.6.5]
	at com.alibaba.dubbo.remoting.transport.DecodeHandler.received(DecodeHandler.java:51) ~[dubbo-2.6.5.jar:2.6.5]
	at com.alibaba.dubbo.remoting.transport.dispatcher.ChannelEventRunnable.run(ChannelEventRunnable.java:57) [dubbo-2.6.5.jar:2.6.5]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_45]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_45]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_45]
Caused by: java.nio.channels.ClosedChannelException
	at org.jboss.netty.channel.socket.nio.NioWorker.cleanUpWriteBuffer(NioWorker.java:643) ~[netty-3.2.5.Final.jar:?]
	at org.jboss.netty.channel.socket.nio.NioWorker.writeFromUserCode(NioWorker.java:370) ~[netty-3.2.5.Final.jar:?]
	at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:137) ~[netty-3.2.5.Final.jar:?]
	at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:76) ~[netty-3.2.5.Final.jar:?]
	at org.jboss.netty.channel.Channels.write(Channels.java:632) ~[netty-3.2.5.Final.jar:?]
	at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:70) ~[netty-3.2.5.Final.jar:?]
	at com.alibaba.dubbo.remoting.transport.netty.NettyHandler.writeRequested(NettyHandler.java:98) ~[dubbo-2.6.5.jar:2.6.5]
	at org.jboss.netty.channel.Channels.write(Channels.java:611) ~[netty-3.2.5.Final.jar:?]
	at org.jboss.netty.channel.Channels.write(Channels.java:578) ~[netty-3.2.5.Final.jar:?]
	at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:251) ~[netty-3.2.5.Final.jar:?]
	at com.alibaba.dubbo.remoting.transport.netty.NettyChannel.send(NettyChannel.java:100) ~[dubbo-2.6.5.jar:2.6.5]
	... 7 more
```

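In the end, the cause turned out to be that the consumer was started without -Ddubbo.service.shutdown.wait=50000. When the consumer receives the provider-offline notification, it triggers a graceful close of the connection on its own, and the default wait is 10 seconds. After adding the parameter to the consumer, the problem above went away.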
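
As a rough illustration of why the missing flag matters, the sketch below resolves a shutdown wait from a JVM system property and falls back to 10 seconds when the property is absent. This is not Dubbo's implementation: the class `ShutdownWait` and its method are hypothetical, and only the property name and the 10-second default come from this report.

```java
/** Hypothetical helper: resolves the shutdown wait the way this report describes it. */
class ShutdownWait {

    // Property name as used in this report.
    static final String KEY = "dubbo.service.shutdown.wait";
    // Default wait described in this report (10 seconds).
    static final long DEFAULT_MS = 10_000L;

    /** Returns the configured wait in milliseconds, or the default if unset or invalid. */
    static long shutdownWaitMs() {
        String raw = System.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_MS;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value > 0 ? value : DEFAULT_MS;
        } catch (NumberFormatException e) {
            return DEFAULT_MS;
        }
    }

    public static void main(String[] args) {
        // Prints the default unless the JVM was started with -Ddubbo.service.shutdown.wait=...
        System.out.println(shutdownWaitMs());
        // Same effect as passing -Ddubbo.service.shutdown.wait=50000 on the command line.
        System.setProperty(KEY, "50000");
        System.out.println(shutdownWaitMs());
    }
}
```

Because a -D flag only affects the JVM it is passed to, setting it on the provider alone leaves the consumer at its default.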

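However, it seems unreasonable that the provider's graceful shutdown depends on consumer-side configuration, so I would like to see how the Dubbo developers address this.
More provider logs: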
16:19:16.875 [DubboShutdownHook] INFO  com.alibaba.dubbo.rpc.protocol.dubbo.DubboProtocol(432) -  [DUBBO] Close dubbo server: /127.0.0.1:20881, dubbo version: 2.6.5, current host: 127.0.0.1

16:19:26.619 [New I/O server worker #1-1] WARN  com.alibaba.dubbo.remoting.transport.AbstractServer(205) -  [DUBBO] All clients has discontected from /127.0.0.1:20881. You can graceful shutdown now., dubbo version: 2.6.5, current host: 127.0.0.1

[ Full content available at: https://github.com/apache/incubator-dubbo/issues/2952 ]
This message was relayed via gitbox.apache.org for notifications@dubbo.apache.org

[GitHub] [incubator-dubbo] hjq2016 commented on issue #2952: Provider graceful shutdown fails: the call takes 30 seconds, but the consumer actively closes the channel after 10 seconds

Posted by "hjq2016 (GitHub)" <gi...@apache.org>.
kill pid starts the graceful shutdown, which waits for the running threads to finish.


[GitHub] [incubator-dubbo] hjq2016 commented on issue #2952: Provider graceful shutdown fails: the call takes 30 seconds, but the consumer actively closes the channel after 10 seconds

Posted by "hjq2016 (GitHub)" <gi...@apache.org>.
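kill pid starts the graceful shutdown, which waits for the running threads to finish. In this example, the graceful shutdown needs to wait up to 30 seconds before it completes.
The first step of the graceful shutdown is to unregister from the registry; the second step is to wait for the threads to finish. So once the consumer receives the provider-offline notification, by default it gracefully closes its connection to the provider after 10 seconds.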


[GitHub] [incubator-dubbo] tswstarplanet commented on issue #2952: Provider graceful shutdown fails: the call takes 30 seconds, but the consumer actively closes the channel after 10 seconds

Posted by "tswstarplanet (GitHub)" <gi...@apache.org>.
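Sorry, I am not very familiar with this area, so let me ask again. Doesn't kill pid kill the process outright? How does killing the process lead to a graceful shutdown? Is it implemented through a shutdown hook?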


[GitHub] [incubator-dubbo] wy545 commented on issue #2952: Provider graceful shutdown fails: the call takes 30 seconds, but the consumer actively closes the channel after 10 seconds

Posted by "wy545 (GitHub)" <gi...@apache.org>.
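kill pid only sends a signal; it does not force-kill the process. Dubbo's graceful shutdown is indeed implemented with a hook. This problem really does occur... I don't know how the developers will solve it, or whether it can only be handled at the business level.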
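
For readers unfamiliar with the mechanism, here is a minimal sketch of a hook-based graceful shutdown using only the JDK. It is not Dubbo's code: the class name, the `drain` helper, and the 50-second and 30-second values are assumptions chosen to mirror this issue. A plain kill pid (SIGTERM) lets the JVM run its shutdown hooks, whereas kill -9 does not.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a shutdown hook that waits for in-flight work. */
class GracefulShutdownSketch {

    /**
     * Stops accepting new tasks, then waits up to timeoutMs for running ones.
     * Returns true if everything finished in time; otherwise interrupts the rest.
     */
    static boolean drain(ExecutorService pool, long timeoutMs) throws InterruptedException {
        pool.shutdown();
        if (pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
            return true;
        }
        pool.shutdownNow();
        return false;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Runs when the JVM receives SIGTERM (plain "kill pid") or exits normally.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                // A real provider would first unregister from the registry here (omitted).
                boolean clean = drain(pool, 50_000);
                System.out.println(clean ? "all calls finished" : "timed out, calls interrupted");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "shutdown-hook-sketch"));

        // Simulated 30-second service call; send "kill <pid>" while it is running.
        pool.submit(() -> {
            Thread.sleep(30_000);
            System.out.println("call finished");
            return null;
        });
    }
}
```

The hook only keeps the provider process alive until its own work is done; it has no control over when the peer closes the connection, which is the gap this issue describes.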


[GitHub] [incubator-dubbo] tswstarplanet commented on issue #2952: Provider graceful shutdown fails: the call takes 30 seconds, but the consumer actively closes the channel after 10 seconds

Posted by "tswstarplanet (GitHub)" <gi...@apache.org>.
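Sorry, there is one thing I don't understand. If the provider shuts down gracefully after kill pid, why does the provider still return a result to the client 30 seconds later? Could you explain?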
