Posted to notifications@dubbo.apache.org by "beiwei30 (GitHub)" <gi...@apache.org> on 2019/01/17 06:39:07 UTC
[GitHub] [incubator-dubbo] beiwei30 commented on issue #3262: "No
provider available in XXXX" RpcException after server IO Busy
## Version
dubbo-2.5.9. We changed the default SPI of Transporter to netty4:
```java
@SPI("netty4")
public interface Transporter{
}
```
## Issue
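The registry holds the provider information for the service and the provider configuration is correct, but when the consumer calls the service it fails with an RpcException: "No provider available in XXXX".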
## Analysis
### Step 1
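We dumped the JVM memory with the gcore command and analyzed it with MAT. The dump contained a large number of connections in an abnormal state: in com.alibaba.dubbo.remoting.transport.netty4.NettyClient, closed=false but channel.ch.state=4. As a result, whenever DubboInvoker checks whether a provider is available, every provider is reported as unavailable. The checks run in the following order: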
```java
/*
com.alibaba.dubbo.rpc.cluster.support.AvailableCluster
*/
public <T> Invoker<T> join(Directory<T> directory) throws RpcException {
    return new AbstractClusterInvoker<T>(directory) {
        public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
            for (Invoker<T> invoker : invokers) {
                if (invoker.isAvailable()) {//ㄟ(▔▽▔)ㄏ
                    return invoker.invoke(invocation);
                }
            }
            throw new RpcException("No provider available in " + invokers);
        }
    };
}
```
```java
/*
com.alibaba.dubbo.registry.integration.RegistryDirectory
*/
public boolean isAvailable() {
    if (isDestroyed()) {
        return false;
    }
    Map<String, Invoker<T>> localUrlInvokerMap = urlInvokerMap;
    if (localUrlInvokerMap != null && localUrlInvokerMap.size() > 0) {
        for (Invoker<T> invoker : new ArrayList<Invoker<T>>(localUrlInvokerMap.values())) {
            if (invoker.isAvailable()) {//ㄟ(▔▽▔)ㄏ
                return true;
            }
        }
    }
    return false;
}
```
```java
/*
com.alibaba.dubbo.rpc.protocol.dubbo.DubboInvoker
*/
public boolean isAvailable() {
    if (!super.isAvailable())
        return false;
    for (ExchangeClient client : clients) {
        if (client.isConnected() && !client.hasAttribute(Constants.CHANNEL_ATTRIBUTE_READONLY_KEY)) {//ㄟ(▔▽▔)ㄏ
            return true;
        }
    }
    return false;
}
```
```java
/*
com.alibaba.dubbo.remoting.transport.netty4.NettyChannel
*/
public boolean isConnected() {
    return channel.isActive();//ㄟ(▔▽▔)ㄏ
}
```
```java
/*
io.netty.channel.socket.nio.NioSocketChannel
*/
public boolean isActive() {
    java.nio.channels.SocketChannel ch = this.javaChannel();
    return ch.isOpen() && ch.isConnected();//ㄟ(▔▽▔)ㄏ
}
```
```java
/*
sun.nio.ch.SocketChannelImpl
*/
public boolean isConnected() {
    synchronized(this.stateLock) {
        return this.state == 2;//ㄟ(▔▽▔)ㄏ
    }
}
```
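The chain above boils down to a single comparison against the socket state. The following is a minimal, self-contained sketch of that reduction, not Dubbo code: `AvailabilityChainModel` and `ModelClient` are hypothetical names, and the constant names `ST_CONNECTED` and `ST_KILLED` reflect my understanding of JDK 8's `sun.nio.ch.SocketChannelImpl` (other JDK versions may number the states differently, so treat them as assumptions).

```java
import java.util.Arrays;
import java.util.List;

final class AvailabilityChainModel {
    // Assumed state values (JDK 8 SocketChannelImpl); only "== 2 means connected"
    // is confirmed by the snippet quoted above.
    static final int ST_CONNECTED = 2;
    static final int ST_KILLED = 4;

    /** Hypothetical stand-in for a NettyClient plus its underlying socket channel. */
    static final class ModelClient {
        final boolean closed;   // the wrapper's own "closed" flag
        final int socketState;  // state of the underlying socket channel

        ModelClient(boolean closed, int socketState) {
            this.closed = closed;
            this.socketState = socketState;
        }

        // Mirrors the end of the chain: only the socket state decides.
        boolean isConnected() {
            return socketState == ST_CONNECTED;
        }
    }

    /** Mirrors the invoker-level loop: available if any client is connected. */
    static boolean isAvailable(List<ModelClient> clients) {
        for (ModelClient client : clients) {
            if (client.isConnected()) {
                return true;
            }
        }
        return false;
    }

    /** Mirrors the cluster-level behavior: fail when nothing is available. */
    static String invoke(List<ModelClient> clients) {
        if (!isAvailable(clients)) {
            throw new IllegalStateException("No provider available");
        }
        return "ok";
    }

    public static void main(String[] args) {
        List<ModelClient> healthy = Arrays.asList(new ModelClient(false, ST_CONNECTED));
        List<ModelClient> stuck = Arrays.asList(new ModelClient(false, ST_KILLED));
        System.out.println(invoke(healthy));
        try {
            invoke(stuck);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the `closed` flag never takes part in the availability decision, which is why a client with closed=false and state=4 is treated as unavailable.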
Because closed=false, these abnormal connections also leave HeartBeatTask unable to handle the reconnect:
```java
/*
com.alibaba.dubbo.remoting.exchange.support.header.HeartBeatTask
*/
public void run() {
    try {
        long now = System.currentTimeMillis();
        for (Channel channel : channelProvider.getChannels()) {
            if (channel.isClosed()) {//ㄟ(▔▽▔)ㄏ
                continue;
            }
            // ... (rest of the loop body omitted)
        }
    } catch (Throwable t) {
        logger.warn("Unhandled exception when heartbeat, cause: " + t.getMessage(), t);
    }
}
```
### Step 2
The server's nmon file shows that just before the problem occurred, DISKBUSY=100%, CPU=100%, and NETWORK≈0.
### Step 3
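Next, we looked more closely at Dubbo's heartbeat reconnect mechanism:
> If the read timestamp on a channel has not been updated for more than 3 heartbeats (default 1 minute), the consumer re-establishes the underlying NettyClient connection. Reconnecting first calls disconnect() to close the existing connection, then calls connect() to create a new one.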
```java
/*
com.alibaba.dubbo.remoting.exchange.support.header.HeartBeatTask
*/
if (lastRead != null && now - lastRead > heartbeatTimeout) {
    logger.warn("Close channel " + channel
            + ", because heartbeat read idle time out: " + heartbeatTimeout + "ms");
    if (channel instanceof Client) {
        try {
            ((Client) channel).reconnect();//ㄟ(▔▽▔)ㄏ
        } catch (Exception e) {
            //do nothing
        }
    } else {
        channel.close();
    }
}
```
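To make the idle-timeout condition concrete, here is a small worked sketch of the same comparison. It is only an illustration: `HeartbeatIdleCheck` and `shouldReconnect` are hypothetical names, and it assumes the "default 1 minute" refers to the heartbeat interval, so that 3 heartbeats give a 180,000 ms timeout.

```java
final class HeartbeatIdleCheck {
    // Assumption: one heartbeat interval is 1 minute, so the timeout is 3 minutes.
    static final long HEARTBEAT_MS = 60_000L;
    static final long HEARTBEAT_TIMEOUT_MS = 3 * HEARTBEAT_MS;

    /** Same comparison as the quoted snippet: a strict "greater than" on read idle time. */
    static boolean shouldReconnect(Long lastRead, long now, long heartbeatTimeout) {
        return lastRead != null && now - lastRead > heartbeatTimeout;
    }

    public static void main(String[] args) {
        long lastRead = 0L;
        // Idle for exactly the timeout: not yet considered expired.
        System.out.println(shouldReconnect(lastRead, 180_000L, HEARTBEAT_TIMEOUT_MS));
        // One millisecond later: reconnect is triggered.
        System.out.println(shouldReconnect(lastRead, 180_001L, HEARTBEAT_TIMEOUT_MS));
        // No read timestamp recorded: the check never fires.
        System.out.println(shouldReconnect(null, 180_001L, HEARTBEAT_TIMEOUT_MS));
    }
}
```

Note the last case: with no read timestamp available, this condition alone can never trigger a reconnect.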
```java
/*
com.alibaba.dubbo.remoting.transport.AbstractClient
*/
public void reconnect() throws RemotingException {
    disconnect();//ㄟ(▔▽▔)ㄏ
    connect();
}

public void disconnect() {
    connectLock.lock();
    try {
        destroyConnectStatusCheckCommand();
        try {
            Channel channel = getChannel();
            if (channel != null) {
                channel.close();//ㄟ(▔▽▔)ㄏ
            }
        } catch (Throwable e) {
            logger.warn(e.getMessage(), e);
        }
        try {
            doDisConnect();
        } catch (Throwable e) {
            logger.warn(e.getMessage(), e);
        }
    } finally {
        connectLock.unlock();
    }
}
```
```java
/*
com.alibaba.dubbo.remoting.transport.netty4.NettyChannel
*/
public void close() {
    try {
        super.close();
    } catch (Exception e) {
        logger.warn(e.getMessage(), e);
    }
    try {
        removeChannelIfDisconnected(channel);
    } catch (Exception e) {
        logger.warn(e.getMessage(), e);
    }
    try {
        attributes.clear();
    } catch (Exception e) {
        logger.warn(e.getMessage(), e);
    }
    try {
        if (logger.isInfoEnabled()) {
            logger.info("Close netty channel " + channel);
        }
        channel.close();//ㄟ(▔▽▔)ㄏ
    } catch (Exception e) {
        logger.warn(e.getMessage(), e);
    }
}
```
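Looking at how Netty4 handles this disconnect() path, it differs considerably from Netty3:
> In Netty3, close() sends the OUT event before it returns the Future object, and the OUT event sets the channel's state directly to ST_CLOSED.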
```java
/*
org.jboss.netty.channel.Channels
*/
public static ChannelFuture close(Channel channel) {
    ChannelFuture future = channel.getCloseFuture();
    channel.getPipeline().sendDownstream(new DownstreamChannelStateEvent(channel, future, ChannelState.OPEN, Boolean.FALSE));
    return future;
}
```
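> In Netty4, the OUT event is handed to the IO thread for processing and the Future object is returned immediately. At that moment the channel still appears usable, i.e. channel.isActive() returns true.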
```java
/*
io.netty.channel.AbstractChannelHandlerContext
*/
public ChannelFuture close(final ChannelPromise promise) {
    if (!this.validatePromise(promise, false)) {
        return promise;
    } else {
        final AbstractChannelHandlerContext next = this.findContextOutbound();
        EventExecutor executor = next.executor();
        if (executor.inEventLoop()) {
            next.invokeClose(promise);
        } else {
            safeExecute(executor, new OneTimeTask() {
                public void run() {
                    next.invokeClose(promise);
                }
            }, promise, (Object)null);
        }
        return promise;
    }
}
```
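## Preliminary hypothesis
`AbstractClient.reconnect()` expects the connection to be re-established immediately after it is torn down. Under Netty3 this pattern causes no logic problem. Under Netty4, where the close is processed asynchronously, it is easy to end up with the execution order `AbstractClient.disconnect()`-->`AbstractClient.connect()`-->`run(){next.invokeClose(promise)}`, especially when the server's IO is briefly abnormal.
We have reproduced the closed=false but channel.ch.state=4 condition with simple code, and we can reproduce the scenario in a test environment by throttling the server's IO.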
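The simulation code itself is not included in this message. As an illustration only, the ordering problem could be modeled with plain JDK classes roughly as follows. This is not the authors' reproduction and it uses neither Dubbo nor Netty: `AsyncCloseRace`, `ModelChannel`, and `ModelClient` are hypothetical names, a single-threaded executor stands in for the IO thread, and the model assumes that connect() is skipped while the current channel still reports itself as active.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class AsyncCloseRace {
    static final int ST_CONNECTED = 2;
    static final int ST_KILLED = 4;

    /** Hypothetical channel; close() runs inline or is queued on the I/O thread. */
    static final class ModelChannel {
        final AtomicInteger state = new AtomicInteger(ST_CONNECTED);

        boolean isActive() {
            return state.get() == ST_CONNECTED;
        }

        void close(ExecutorService ioThread, boolean async) {
            if (async) {
                ioThread.execute(() -> state.set(ST_KILLED)); // Netty4-like: deferred
            } else {
                state.set(ST_KILLED);                         // Netty3-like: immediate
            }
        }
    }

    /** Hypothetical client with the disconnect()-then-connect() shape of reconnect(). */
    static final class ModelClient {
        final ExecutorService ioThread;
        final boolean asyncClose;
        volatile ModelChannel channel = new ModelChannel();

        ModelClient(ExecutorService ioThread, boolean asyncClose) {
            this.ioThread = ioThread;
            this.asyncClose = asyncClose;
        }

        void reconnect() {
            channel.close(ioThread, asyncClose); // disconnect()
            // connect(): ASSUMPTION of this model, skipped while the channel looks active.
            if (!channel.isActive()) {
                channel = new ModelChannel();
            }
        }
    }

    /** Returns true if the client holds a dead channel once the I/O thread catches up. */
    static boolean endsWithDeadChannel(boolean asyncClose) {
        ExecutorService io = Executors.newSingleThreadExecutor();
        CountDownLatch ioBusy = new CountDownLatch(1);
        try {
            // Simulate a stalled I/O thread that cannot run queued tasks yet.
            io.execute(() -> {
                try {
                    ioBusy.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            ModelClient client = new ModelClient(io, asyncClose);
            client.reconnect();
            ioBusy.countDown(); // the I/O thread recovers and drains its queue
            io.shutdown();
            io.awaitTermination(5, TimeUnit.SECONDS);
            return !client.channel.isActive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            io.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("sync close  -> dead channel: " + endsWithDeadChannel(false));
        System.out.println("async close -> dead channel: " + endsWithDeadChannel(true));
    }
}
```

With the immediate close, the model's connect step sees an inactive channel and replaces it. With the deferred close, the connect step runs first, keeps the old channel, and the queued close then kills it, leaving a client that is not closed but whose channel is dead.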
[ Full content available at: https://github.com/apache/incubator-dubbo/issues/3262 ]