Posted to notifications@dubbo.apache.org by "zhangyz-hd (GitHub)" <gi...@apache.org> on 2019/01/17 06:27:45 UTC

[GitHub] [incubator-dubbo] zhangyz-hd opened issue #3262: "No provider available in XXXX" after server IO


[ Full content available at: https://github.com/apache/incubator-dubbo/issues/3262 ]
This message was relayed via gitbox.apache.org for notifications@dubbo.apache.org

[GitHub] [incubator-dubbo] Kiddinglife commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "Kiddinglife (GitHub)" <gi...@apache.org>.
@rangtao I would say that is a quick fix that assumes Netty will eventually close the connection. But what if it does not close the connection as expected? Since we have already lost the connection, who cares about waiting a few more seconds until it is fully closed? Am I right?

P.S. It would be better to wait on the returned future with a timeout, in case Dubbo waits forever when Netty hits a fatal error but does not throw an exception or report an error to Dubbo.

Regards.
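
To make the "wait with a timeout" idea concrete, here is a minimal sketch using only `java.util.concurrent`, so it runs without Netty on the classpath. `BoundedCloseWait` and `awaitClose` are hypothetical names for illustration, not Dubbo or Netty code. With a real Netty `ChannelFuture`, the corresponding call would presumably be `awaitUninterruptibly(long timeoutMillis)`, which also reports whether the future completed in time.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not part of Dubbo or Netty.
final class BoundedCloseWait {

    // Returns true only if the close future completed normally within the timeout.
    static boolean awaitClose(Future<?> closeFuture, long timeoutMillis) {
        try {
            closeFuture.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // close did not finish in time; the caller decides what to do next
        } catch (ExecutionException e) {
            return false; // close finished but reported a failure
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> finished = CompletableFuture.completedFuture(null);
        CompletableFuture<Void> stuck = new CompletableFuture<>(); // never completed
        System.out.println(awaitClose(finished, 100)); // true
        System.out.println(awaitClose(stuck, 100));    // false, after roughly 100 ms
    }
}
```

The point of the boolean result is that the calling thread is never parked indefinitely: if the close never completes, the caller gets `false` back and can log or abandon the old channel.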



[GitHub] [incubator-dubbo] manzhizhen commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "manzhizhen (GitHub)" <gi...@apache.org>.
> As @Kiddinglife suggests, will this workaround, made in org.apache.dubbo.remoting.transport.netty4.NettyChannel, solve your problem, @zhangyz-hd?
> 
> ```java
>     @Override
>     public void close() {
>         try {
>             super.close();
>         } catch (Exception e) {
>             logger.warn(e.getMessage(), e);
>         }
>         try {
>             removeChannelIfDisconnected(channel);
>         } catch (Exception e) {
>             logger.warn(e.getMessage(), e);
>         }
>         try {
>             attributes.clear();
>         } catch (Exception e) {
>             logger.warn(e.getMessage(), e);
>         }
>         try {
>             if (logger.isInfoEnabled()) {
>                 logger.info("Close netty channel " + channel);
>             }
>             ChannelFuture future = channel.close();
>             future.awaitUninterruptibly();
>         } catch (Exception e) {
>             logger.warn(e.getMessage(), e);
>         }
>     }
> ```

This should solve the problem!

[GitHub] [incubator-dubbo] manzhizhen commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "manzhizhen (GitHub)" <gi...@apache.org>.
I think I understand the problem you describe. The reconnect method in org.apache.dubbo.remoting.transport.AbstractClient already performs a double check on isConnected(), for example:

    @Override
    public void reconnect() throws RemotingException {
        if (!isConnected()) {
            connectLock.lock();
            try {
                if (!isConnected()) {
                    disconnect();
                    connect();
                }
            } finally {
                connectLock.unlock();
            }
        }
    }

Its connect method then checks isConnected() once more. According to your description, in NettyClient, the Netty4 implementation subclass, Netty4's close is asynchronous, so this check may misjudge the state and return immediately. Could we remove this check to get the expected behavior?

    protected void connect() throws RemotingException {
        connectLock.lock();
        try {
            // Remove this check?
            if (isConnected()) {
                return;
            }
            initConnectStatusCheckCommand();
            doConnect();
            if (!isConnected()) {
                throw new RemotingException(this, "Failed connect to server " + getRemoteAddress() + " from " + getClass().getSimpleName() + " "
                        + NetUtils.getLocalHost() + " using dubbo version " + Version.getVersion()
                        + ", cause: Connect wait timeout: " + getConnectTimeout() + "ms.");
            } else {
                if (logger.isInfoEnabled()) {
                    logger.info("Successed connect to server " + getRemoteAddress() + " from " + getClass().getSimpleName() + " "
                            + NetUtils.getLocalHost() + " using dubbo version " + Version.getVersion()
                            + ", channel is " + this.getChannel());
                }
            }
            reconnect_count.set(0);
            reconnect_error_log_flag.set(false);
        } catch (RemotingException e) {
            throw e;
        } catch (Throwable e) {
            throw new RemotingException(this, "Failed connect to server " + getRemoteAddress() + " from " + getClass().getSimpleName() + " "
                    + NetUtils.getLocalHost() + " using dubbo version " + Version.getVersion()
                    + ", cause: " + e.getMessage(), e);
        } finally {
            connectLock.unlock();
        }
    }

[GitHub] [incubator-dubbo] beiwei30 closed issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "beiwei30 (GitHub)" <gi...@apache.org>.
[ issue closed by beiwei30 ]

[GitHub] [incubator-dubbo] rangtao commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "rangtao (GitHub)" <gi...@apache.org>.
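@Kiddinglife I think the framework layer should hide these details: inside connect(), simply abandon the old connection and create a new one. connect() does not need to check isConnected() up front, although this change needs to be evaluated.

    protected void connect() throws RemotingException {
        connectLock.lock();
        try {
            // Remove this check?
            if (isConnected()) {
                return;
            }
            // (rest omitted)
    }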


[GitHub] [incubator-dubbo] beiwei30 commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "beiwei30 (GitHub)" <gi...@apache.org>.
I revisited this issue today, and in fact I believe it doesn't exist at all. I believe it only happens in the private branch the reporter uses.

I figured out that it has been fixed by PR #1249.

[GitHub] [incubator-dubbo] beiwei30 commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "beiwei30 (GitHub)" <gi...@apache.org>.
As @Kiddinglife suggests, will this workaround, made in org.apache.dubbo.remoting.transport.netty4.NettyChannel, solve your problem, @zhangyz-hd?

```java
    @Override
    public void close() {
        try {
            super.close();
        } catch (Exception e) {
            logger.warn(e.getMessage(), e);
        }
        try {
            removeChannelIfDisconnected(channel);
        } catch (Exception e) {
            logger.warn(e.getMessage(), e);
        }
        try {
            attributes.clear();
        } catch (Exception e) {
            logger.warn(e.getMessage(), e);
        }
        try {
            if (logger.isInfoEnabled()) {
                logger.info("Close netty channel " + channel);
            }
            ChannelFuture future = channel.close();
            future.awaitUninterruptibly();
        } catch (Exception e) {
            logger.warn(e.getMessage(), e);
        }
    }
```



[GitHub] [incubator-dubbo] zhangyz-hd commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "zhangyz-hd (GitHub)" <gi...@apache.org>.
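Besides this incident, where IO stayed busy for more than ten minutes and left every NettyClient in an abnormal state, occasional everyday IO glitches can also leave individual NettyClients in an abnormal state. In those cases other NettyClients are still usable, so service calls behave normally; only the statistics may show that calls are unevenly distributed across providers. For example:
1. an occasional stop-the-world (STW) pause during a Full GC;
2. several minutes of high IO while gcore generates a dump.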
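
The uneven distribution can be illustrated with a toy model. This is only a sketch under stated assumptions: `SkewedRoundRobin` and `distribute` are hypothetical names, and the "skip to the next available provider" rule is a simplification, not Dubbo's actual load-balancing code.

```java
import java.util.Arrays;

// Hypothetical model for illustration; not Dubbo's load-balancing code.
final class SkewedRoundRobin {

    // Round-robin over providers; an unavailable provider is skipped
    // in favor of the next available one.
    static int[] distribute(boolean[] available, int calls) {
        int[] counts = new int[available.length];
        for (int call = 0; call < calls; call++) {
            for (int offset = 0; offset < available.length; offset++) {
                int candidate = (call + offset) % available.length;
                if (available[candidate]) {
                    counts[candidate]++;
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The connection to the second provider is in the abnormal state.
        boolean[] available = {true, false, true};
        System.out.println(Arrays.toString(distribute(available, 6))); // [2, 0, 4]
    }
}
```

Every call still succeeds, but the provider that follows the broken one absorbs its share, which is the kind of skew that would only show up in call statistics.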

[GitHub] [incubator-dubbo] beiwei30 commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "beiwei30 (GitHub)" <gi...@apache.org>.
@zhangyz-hd, would you mind confirming whether this solves your problem?



[GitHub] [incubator-dubbo] manzhizhen commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "manzhizhen (GitHub)" <gi...@apache.org>.
> If there's no objection by tomorrow, I will close this issue soon.

good job!!!


[GitHub] [incubator-dubbo] Kiddinglife commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "Kiddinglife (GitHub)" <gi...@apache.org>.
@rangtao We can dive into the implementation of Netty's close() method and check whether this is a bug-free fix, carefully considering all the exceptions that close() can throw. More attention should be paid to the case where the Dubbo thread hangs forever. I think we will be fine...


[GitHub] [incubator-dubbo] beiwei30 commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "beiwei30 (GitHub)" <gi...@apache.org>.
## Version

dubbo-2.5.9. We changed the default SPI of Transporter to use netty4:

```java
@SPI("netty4")
public interface Transporter{

}
```

## Issue

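The registry holds the service's provider information and the provider configuration is all normal, but when the consumer invokes the service it gets an RpcException reporting "No provider available in XXXX".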

## Analysis

### Step 1

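We dumped the JVM memory with the gcore command and analyzed it with MAT, which showed a large number of connections in an abnormal state: in com.alibaba.dubbo.remoting.transport.netty4.NettyClient, closed=false but channel.ch.state=4. As a result, when DubboInvoker checks whether a provider is available, every provider is reported as unavailable. The checks run in the following order: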

```java
/*
  com.alibaba.dubbo.rpc.cluster.support.AvailableCluster
*/
    public <T> Invoker<T> join(Directory<T> directory) throws RpcException {
        return new AbstractClusterInvoker<T>(directory) {
            public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
                for (Invoker<T> invoker : invokers) {
                    if (invoker.isAvailable()) {//ㄟ(▔▽▔)ㄏ
                        return invoker.invoke(invocation);
                    }
                }
                throw new RpcException("No provider available in " + invokers);
            }
        };
    }
```

```java
/*
  com.alibaba.dubbo.registry.integration.RegistryDirectory
*/
    public boolean isAvailable() {
        if (isDestroyed()) {
            return false;
        }
        Map<String, Invoker<T>> localUrlInvokerMap = urlInvokerMap;
        if (localUrlInvokerMap != null && localUrlInvokerMap.size() > 0) {
            for (Invoker<T> invoker : new ArrayList<Invoker<T>>(localUrlInvokerMap.values())) {
                if (invoker.isAvailable()) {//ㄟ(▔▽▔)ㄏ
                    return true;
                }
            }
        }
        return false;
    }
```

```java
/*
  com.alibaba.dubbo.rpc.protocol.dubbo.DubboInvoker
*/
    public boolean isAvailable() {
        if (!super.isAvailable())
            return false;
        for (ExchangeClient client : clients) {
            if (client.isConnected() && !client.hasAttribute(Constants.CHANNEL_ATTRIBUTE_READONLY_KEY)) {//ㄟ(▔▽▔)ㄏ
                return true;
            }
        }
        return false;
    }
```

```java
/*
  com.alibaba.dubbo.remoting.transport.netty4.NettyChannel
*/
    public boolean isConnected() {
        return channel.isActive();//ㄟ(▔▽▔)ㄏ
    }
```

```java
/*
  io.netty.channel.socket.nio.NioSocketChannel
*/
    public boolean isActive() {
        java.nio.channels.SocketChannel ch = this.javaChannel();
        return ch.isOpen() && ch.isConnected();//ㄟ(▔▽▔)ㄏ
    }
```

```java
/*
  sun.nio.ch.SocketChannelImpl
*/
    public boolean isConnected() {
        synchronized(this.stateLock) {
            return this.state == 2;//ㄟ(▔▽▔)ㄏ
        }
    }
```

At the same time, because closed=false on these abnormal connections, HeartBeatTask cannot handle the reconnection either:

```java
/*
  com.alibaba.dubbo.remoting.exchange.support.header.HeartBeatTask
*/
    public void run() {
        try {
            long now = System.currentTimeMillis();
            for (Channel channel : channelProvider.getChannels()) {
                if (channel.isClosed()) {//ㄟ(▔▽▔)ㄏ
                    continue;
                }
                // ... (rest of the loop body omitted)
            }
        } catch (Throwable t) {
            logger.warn("Unhandled exception when heartbeat, cause: " + t.getMessage(), t);
        }
    }
```

### Step 2

Analyzing the server's nmon file shows that just before the problem occurred, DISKBUSY=100%, CPU=100%, and NETWORK≈0.

### Step 3

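Further analysis of Dubbo's heartbeat reconnection mechanism:

> If the last read time on a channel has not been updated for more than 3 heartbeats (1 minute each by default), the consumer re-establishes the underlying NettyClient connection. Re-establishing first calls disconnect() to close the existing connection and then calls connect() to create a new one.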

```java
/*
  com.alibaba.dubbo.remoting.exchange.support.header.HeartBeatTask
*/
    if (lastRead != null && now - lastRead > heartbeatTimeout) {
        logger.warn("Close channel " + channel
                + ", because heartbeat read idle time out: " + heartbeatTimeout + "ms");
        if (channel instanceof Client) {
            try {
                ((Client) channel).reconnect();//ㄟ(▔▽▔)ㄏ
            } catch (Exception e) {
                //do nothing
            }
        } else {
            channel.close();
        }
    }
```
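
As a small worked example of the read-idle condition in the excerpt above, the sketch below isolates the comparison. `ReadIdleCheck` and `shouldReconnect` are hypothetical names, and the factor of 3 and the 1-minute default are taken from the description in this issue rather than verified against a specific Dubbo release.

```java
// Hypothetical model of the read-idle check; not Dubbo's HeartBeatTask.
final class ReadIdleCheck {

    // Assumption from this issue's description: one heartbeat is 1 minute by default.
    static final long DEFAULT_HEARTBEAT_MILLIS = 60_000L;

    // True when nothing has been read for strictly more than 3 heartbeats.
    static boolean shouldReconnect(Long lastReadMillis, long nowMillis, long heartbeatMillis) {
        long heartbeatTimeout = heartbeatMillis * 3;
        return lastReadMillis != null && nowMillis - lastReadMillis > heartbeatTimeout;
    }

    public static void main(String[] args) {
        long hb = DEFAULT_HEARTBEAT_MILLIS;
        System.out.println(shouldReconnect(0L, 180_000L, hb));   // false: exactly 3 heartbeats
        System.out.println(shouldReconnect(0L, 180_001L, hb));   // true: just past the timeout
        System.out.println(shouldReconnect(null, 500_000L, hb)); // false: no read recorded yet
    }
}
```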

```java
/*
  com.alibaba.dubbo.remoting.transport.AbstractClient
*/
    public void reconnect() throws RemotingException {
        disconnect();//ㄟ(▔▽▔)ㄏ
        connect();
    }

    public void disconnect() {
        connectLock.lock();
        try {
            destroyConnectStatusCheckCommand();
            try {
                Channel channel = getChannel();
                if (channel != null) {
                    channel.close();//ㄟ(▔▽▔)ㄏ
                }
            } catch (Throwable e) {
                logger.warn(e.getMessage(), e);
            }
            try {
                doDisConnect();
            } catch (Throwable e) {
                logger.warn(e.getMessage(), e);
            }
        } finally {
            connectLock.unlock();
        }
    }
```

```java
/*
  com.alibaba.dubbo.remoting.transport.netty4.NettyChannel
*/
    public void close() {
        try {
            super.close();
        } catch (Exception e) {
            logger.warn(e.getMessage(), e);
        }
        try {
            removeChannelIfDisconnected(channel);
        } catch (Exception e) {
            logger.warn(e.getMessage(), e);
        }
        try {
            attributes.clear();
        } catch (Exception e) {
            logger.warn(e.getMessage(), e);
        }
        try {
            if (logger.isInfoEnabled()) {
                logger.info("Close netty channel " + channel);
            }
            channel.close();//ㄟ(▔▽▔)ㄏ
        } catch (Exception e) {
            logger.warn(e.getMessage(), e);
        }
    }
```

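Looking at how Netty4 handles disconnect(), it differs considerably from Netty3:

> In Netty3, close() sends the OUT event before returning the Future object, and the OUT event sets the channel's state directly to ST_CLOSED.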

```java
/*
  org.jboss.netty.channel.Channels
*/
    public static ChannelFuture close(Channel channel) {
        ChannelFuture future = channel.getCloseFuture();
        channel.getPipeline().sendDownstream(new DownstreamChannelStateEvent(channel, future, ChannelState.OPEN, Boolean.FALSE));
        return future;
    }
```

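> In Netty4, the OUT event is handed to the IO thread for processing and the Future object is returned immediately. At that moment the channel still appears usable, i.e. channel.isActive() returns true.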

```java
/*
  io.netty.channel.AbstractChannelHandlerContext
*/
    public ChannelFuture close(final ChannelPromise promise) {
        if (!this.validatePromise(promise, false)) {
            return promise;
        } else {
            final AbstractChannelHandlerContext next = this.findContextOutbound();
            EventExecutor executor = next.executor();
            if (executor.inEventLoop()) {
                next.invokeClose(promise);
            } else {
                safeExecute(executor, new OneTimeTask() {
                    public void run() {
                        next.invokeClose(promise);
                    }
                }, promise, (Object)null);
            }
            return promise;
        }
    }
```

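## Preliminary hypothesis

`AbstractClient.reconnect()` expects the connection to be re-established immediately after it is closed. Under Netty3 this pattern causes no logic problem. Under Netty4, with its asynchronous handling, the execution order `AbstractClient.disconnect()`-->`AbstractClient.connect()`-->`run(){next.invokeClose(promise)}` can easily occur, especially when the server's IO is briefly abnormal.

We have already reproduced the closed=false but channel.ch.state=4 condition with simple code, and we can reproduce the scenario in a test environment by throttling the server's IO.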
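
The ordering can be modeled deterministically without Netty. The sketch below is not the reporter's reproduction code: `AsyncCloseRace`, `FakeChannel`, and `FakeClient` are hypothetical stand-ins, and the "IO thread" is replaced by a task queue that is drained by hand so the interleaving is always the same.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of the ordering; none of these classes are Dubbo or Netty types.
final class AsyncCloseRace {

    // Stand-in for a Netty4-style channel: close() only queues work for the "IO thread".
    static final class FakeChannel {
        private final Queue<Runnable> ioTasks;
        private boolean active = true;

        FakeChannel(Queue<Runnable> ioTasks) { this.ioTasks = ioTasks; }

        boolean isActive() { return active; }

        void close() { ioTasks.add(() -> active = false); } // state changes only when the task runs
    }

    // Stand-in for the client: reconnect() is disconnect() followed by connect().
    static final class FakeClient {
        private final Queue<Runnable> ioTasks;
        private FakeChannel channel;

        FakeClient(Queue<Runnable> ioTasks) {
            this.ioTasks = ioTasks;
            this.channel = new FakeChannel(ioTasks);
        }

        boolean isConnected() { return channel.isActive(); }

        void reconnect() {
            channel.close();     // disconnect(): returns before the channel is really closed
            if (isConnected()) { // connect(): the guard still sees the old channel as active
                return;
            }
            channel = new FakeChannel(ioTasks); // never reached in this interleaving
        }
    }

    // Drains the queued tasks, playing the role of the IO thread catching up.
    static void runPendingIo(Queue<Runnable> ioTasks) {
        Runnable task;
        while ((task = ioTasks.poll()) != null) {
            task.run();
        }
    }

    // Returns true if the client ends up with a usable connection after reconnect().
    static boolean connectedAfterReconnect() {
        Queue<Runnable> ioTasks = new ArrayDeque<>();
        FakeClient client = new FakeClient(ioTasks);
        client.reconnect();    // close is queued, connect() returns early
        runPendingIo(ioTasks); // the deferred close now runs
        return client.isConnected();
    }

    public static void main(String[] args) {
        System.out.println(connectedAfterReconnect()); // false
    }
}
```

In this model the client was never explicitly closed, yet its only channel is inactive once the queued close runs, which mirrors the closed=false with channel.ch.state=4 combination seen in the heap dump.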

[GitHub] [incubator-dubbo] beiwei30 commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "beiwei30 (GitHub)" <gi...@apache.org>.
If there's no objection by tomorrow, I will close this issue soon.

[ Full content available at: https://github.com/apache/incubator-dubbo/issues/3262 ]
This message was relayed via gitbox.apache.org for notifications@dubbo.apache.org


[GitHub] [incubator-dubbo] manzhizhen commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "manzhizhen (GitHub)" <gi...@apache.org>.
I think I roughly understand the problem you describe. The reconnect method in `org.apache.dubbo.remoting.transport.AbstractClient` already performs a double check on `isConnected()`, for example:

```java
@Override
public void reconnect() throws RemotingException {
    if (!isConnected()) {
        connectLock.lock();
        try {
            if (!isConnected()) {
                disconnect();
                connect();
            }
        } finally {
            connectLock.unlock();
        }
    }
}
```

Its connect method then checks `isConnected()` once more. According to your description, in the Netty 4 subclass `NettyClient` the close is asynchronous, so this check can give the wrong answer and make the method return immediately. Could we remove this check to get the expected behavior?

```java
protected void connect() throws RemotingException {
    connectLock.lock();
    try {
        // Remove this check?
        if (isConnected()) {
            return;
        }
        initConnectStatusCheckCommand();
        doConnect();
        if (!isConnected()) {
            throw new RemotingException(this, "Failed connect to server " + getRemoteAddress() + " from " + getClass().getSimpleName() + " "
                    + NetUtils.getLocalHost() + " using dubbo version " + Version.getVersion()
                    + ", cause: Connect wait timeout: " + getConnectTimeout() + "ms.");
        } else {
            if (logger.isInfoEnabled()) {
                logger.info("Successed connect to server " + getRemoteAddress() + " from " + getClass().getSimpleName() + " "
                        + NetUtils.getLocalHost() + " using dubbo version " + Version.getVersion()
                        + ", channel is " + this.getChannel());
            }
        }
        reconnect_count.set(0);
        reconnect_error_log_flag.set(false);
    } catch (RemotingException e) {
        throw e;
    } catch (Throwable e) {
        throw new RemotingException(this, "Failed connect to server " + getRemoteAddress() + " from " + getClass().getSimpleName() + " "
                + NetUtils.getLocalHost() + " using dubbo version " + Version.getVersion()
                + ", cause: " + e.getMessage(), e);
    } finally {
        connectLock.unlock();
    }
}
```
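
The stale check described in this comment can be illustrated without Netty. The following is a single-threaded sketch under stated assumptions, not Dubbo's code: `FakeChannel`, `RaceyClient` and `runPendingTasks()` are hypothetical names that model a channel whose close only takes effect when the event loop next runs, and a `disconnect()` that is reached while the channel still reports active.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for a channel with an asynchronous close: the close
// is only queued and takes effect when the "event loop" runs pending tasks.
class FakeChannel {
    private boolean active = true;
    private final Queue<Runnable> pending = new ArrayDeque<>();

    boolean isActive() { return active; }

    void closeAsync() { pending.add(() -> active = false); }

    // Simulates the next event-loop tick.
    void runPendingTasks() {
        Runnable task;
        while ((task = pending.poll()) != null) {
            task.run();
        }
    }
}

// Hypothetical client mirroring the shape of disconnect() followed by connect().
class RaceyClient {
    FakeChannel channel = new FakeChannel();
    int connectAttempts = 0;

    boolean isConnected() { return channel.isActive(); }

    void disconnect() { channel.closeAsync(); }

    // Returns true only if a new connection was actually established.
    boolean connect() {
        if (isConnected()) {
            return false; // stale "connected" state causes an early return
        }
        connectAttempts++;
        channel = new FakeChannel();
        return true;
    }
}

class RaceDemo {
    public static void main(String[] args) {
        RaceyClient client = new RaceyClient();
        client.disconnect();                // close is queued, not yet applied
        boolean first = client.connect();   // skipped: channel still looks active
        client.channel.runPendingTasks();   // event loop applies the close
        boolean second = client.connect();  // now a real reconnect happens
        System.out.println(first + " " + second); // prints "false true"
    }
}
```

In this model the first `connect()` does nothing and the old channel then closes, leaving the client disconnected until something calls `connect()` again.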

[ Full content available at: https://github.com/apache/incubator-dubbo/issues/3262 ]
This message was relayed via gitbox.apache.org for notifications@dubbo.apache.org


[GitHub] [incubator-dubbo] Kiddinglife commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "Kiddinglife (GitHub)" <gi...@apache.org>.
@rangtao We can dive into the implementation of Netty's `close()` method and check whether this is a bug-free fix, carefully considering all the exceptions `close()` can throw. We should be fine...

[ Full content available at: https://github.com/apache/incubator-dubbo/issues/3262 ]
This message was relayed via gitbox.apache.org for notifications@dubbo.apache.org

[GitHub] [incubator-dubbo] Kiddinglife commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "Kiddinglife (GitHub)" <gi...@apache.org>.
@rangtao I would say that is a quick fix that assumes Netty will eventually close the connection. But what if it does not close the connection as expected? Since we have already lost the connection, who cares about waiting a few more seconds until it is fully closed? Right?

[ Full content available at: https://github.com/apache/incubator-dubbo/issues/3262 ]
This message was relayed via gitbox.apache.org for notifications@dubbo.apache.org


[GitHub] [incubator-dubbo] rangtao commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "rangtao (GitHub)" <gi...@apache.org>.
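@Kiddinglife I think the framework layer should hide these details: inside `connect()`, simply abandon the old connection and create a new one. There is no need to check `isConnected()` at the start of `connect()`. This change would need to be evaluated.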
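
A rough sketch of what "abandon the old connection and create a new one" could look like. `StubChannel` and `ReplacingClient` are hypothetical names for illustration only and do not reflect Dubbo's actual classes; locking and error handling are omitted.

```java
// Hypothetical channel stub; close() is immediate here for brevity.
class StubChannel {
    private boolean open = true;

    boolean isOpen() { return open; }

    void close() { open = false; }
}

// Hypothetical client whose connect() never consults isConnected().
class ReplacingClient {
    private StubChannel channel;

    StubChannel current() { return channel; }

    // Always opens a fresh channel, then discards the previous one.
    void connect() {
        StubChannel old = channel;
        channel = new StubChannel();
        if (old != null) {
            old.close();
        }
    }
}

class ReplaceDemo {
    public static void main(String[] args) {
        ReplacingClient client = new ReplacingClient();
        client.connect();
        StubChannel first = client.current();
        client.connect(); // reconnect without any state check
        System.out.println(first.isOpen());            // false
        System.out.println(client.current() != first); // true
        System.out.println(client.current().isOpen()); // true
    }
}
```

Because the new channel is installed before the old one is closed, the outcome does not depend on when the old channel's state is updated.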

[ Full content available at: https://github.com/apache/incubator-dubbo/issues/3262 ]
This message was relayed via gitbox.apache.org for notifications@dubbo.apache.org


[GitHub] [incubator-dubbo] zhangyz-hd commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "zhangyz-hd (GitHub)" <gi...@apache.org>.
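Additional note:
Our further code verification has shown that whenever the program enters `AbstractClient.reconnect()`, the `NettyClient` ends up in an abnormal state with a probability of nearly 100%.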

[ Full content available at: https://github.com/apache/incubator-dubbo/issues/3262 ]
This message was relayed via gitbox.apache.org for notifications@dubbo.apache.org

[GitHub] [incubator-dubbo] beiwei30 commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "beiwei30 (GitHub)" <gi...@apache.org>.
I am not persuaded to remove **if isConnected()** from the `connect()` method.

[ Full content available at: https://github.com/apache/incubator-dubbo/issues/3262 ]
This message was relayed via gitbox.apache.org for notifications@dubbo.apache.org


[GitHub] [incubator-dubbo] beiwei30 commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "beiwei30 (GitHub)" <gi...@apache.org>.
As @Kiddinglife suggests, will this workaround solve your problem, @zhangyz-hd?

```java
    @Override
    public void close() {
        try {
            super.close();
        } catch (Exception e) {
            logger.warn(e.getMessage(), e);
        }
        try {
            removeChannelIfDisconnected(channel);
        } catch (Exception e) {
            logger.warn(e.getMessage(), e);
        }
        try {
            attributes.clear();
        } catch (Exception e) {
            logger.warn(e.getMessage(), e);
        }
        try {
            if (logger.isInfoEnabled()) {
                logger.info("Close netty channel " + channel);
            }
            ChannelFuture future = channel.close();
            future.awaitUninterruptibly();
        } catch (Exception e) {
            logger.warn(e.getMessage(), e);
        }
    }
```
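
The `awaitUninterruptibly()` call above has no upper bound on how long it blocks. If a bounded wait is preferred, the pattern can be sketched with the JDK's `CompletableFuture` as a stand-in for Netty's `ChannelFuture`. `BoundedClose` and `awaitClose` are hypothetical names. As far as I know, Netty's future also offers a timed `awaitUninterruptibly(long timeoutMillis)`, but treat that as an assumption to verify against the Netty version in use.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedClose {
    // Waits for the close future for at most timeoutMillis; returns true only
    // if the close completed normally within that time.
    static boolean awaitClose(CompletableFuture<Void> closeFuture, long timeoutMillis) {
        try {
            closeFuture.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;                       // close did not finish in time
        } catch (ExecutionException e) {
            return false;                       // close itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> done = CompletableFuture.completedFuture(null);
        CompletableFuture<Void> stuck = new CompletableFuture<>(); // never completes
        System.out.println(awaitClose(done, 100));  // true
        System.out.println(awaitClose(stuck, 100)); // false
    }
}
```

A caller can then decide what to do when `awaitClose` returns false, for example logging a warning and proceeding with the reconnect anyway.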

[ Full content available at: https://github.com/apache/incubator-dubbo/issues/3262 ]
This message was relayed via gitbox.apache.org for notifications@dubbo.apache.org


[GitHub] [incubator-dubbo] beiwei30 commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "beiwei30 (GitHub)" <gi...@apache.org>.
I revisited this issue today and, in fact, I believe this issue doesn't exist at all. I believe it only happens in the private branch the reporter uses.

I figured out it's been fixed with PR #3262

[ Full content available at: https://github.com/apache/incubator-dubbo/issues/3262 ]
This message was relayed via gitbox.apache.org for notifications@dubbo.apache.org


[GitHub] [incubator-dubbo] rangtao commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "rangtao (GitHub)" <gi...@apache.org>.
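@beiwei30 If `channel.close()` throws an exception, the state of the underlying channel does not change, and the framework layer will not reconnect either. I am not sure in which scenarios this would happen.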

[ Full content available at: https://github.com/apache/incubator-dubbo/issues/3262 ]
This message was relayed via gitbox.apache.org for notifications@dubbo.apache.org

[GitHub] [incubator-dubbo] manzhizhen commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "manzhizhen (GitHub)" <gi...@apache.org>.
I think both changes are needed: the one from @beiwei30 and the one below:

    protected void connect() throws RemotingException {
        connectLock.lock();
        try {
            // Remove this check?
            if (isConnected()) {
                return;
            }
            // (rest omitted)
        }

[ Full content available at: https://github.com/apache/incubator-dubbo/issues/3262 ]
This message was relayed via gitbox.apache.org for notifications@dubbo.apache.org


[GitHub] [incubator-dubbo] Kiddinglife commented on issue #3262: "No provider available in XXXX" RpcException after server IO Busy

Posted by "Kiddinglife (GitHub)" <gi...@apache.org>.
@rangtao We can dive into the implementation of Netty's `close()` method and check whether this is a bug-free fix, carefully considering all the exceptions `close()` can throw. More attention should be paid to the case where the Dubbo thread hangs forever.

[ Full content available at: https://github.com/apache/incubator-dubbo/issues/3262 ]
This message was relayed via gitbox.apache.org for notifications@dubbo.apache.org