Posted to notifications@dubbo.apache.org by "TwiliMango (GitHub)" <gi...@apache.org> on 2018/11/11 09:36:48 UTC

[GitHub] [incubator-dubbo] TwiliMango opened issue #2771: Dubbo invoke timeout, Waiting server-side response timeout by scan timer

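Description: 1. Dubbo invoke timeout, Waiting server-side response timeout by scan timer,
           2. Dubbo invoke timeout, Waiting server-side response timeout,
          (Both exception messages have appeared; they differ only in whether "by scan timer" is present.)

Environment: Dubbo version 2.1.0
          The registry is a ZooKeeper (zk) cluster, registered by domain name
          threadpool: fixed; threads: 600; clients: about 550

Checks: 1. Via dubbo telnet, the current thread pool active count is mostly in the single digits
           2. The application has no full GC, and GC is normal
           3. No thread-exhaustion log was printed (so presumably there are enough threads)
           4. Calls stay inside the intranet, and the network is normal

Problems:
       1. Scenario 1: the provider never receives the request, yet the consumer call times out; the method normally finishes in a few tens of ms;
       2. Scenario 2: the provider receives the request and returns successfully, yet the consumer call still times out.

Attempted fixes: 1. Change the thread pool to cached and the dispatcher to message
                          2. Some say the timeout is caused by dubbo monitor looking up the IP address?

Question: how should this situation be analyzed and finally resolved? Many thanks.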

[ Full content available at: https://github.com/apache/incubator-dubbo/issues/2771 ]
This message was relayed via gitbox.apache.org for notifications@dubbo.apache.org

[GitHub] [incubator-dubbo] carryxyh commented on issue #2771: Dubbo invoke timeout, Waiting server-side response timeout by scan timer

Posted by "carryxyh (GitHub)" <gi...@apache.org>.
Closing this issue for now; if the problem persists, please continue the discussion below.


[GitHub] [incubator-dubbo] carryxyh closed issue #2771: Dubbo invoke timeout, Waiting server-side response timeout by scan timer

Posted by "carryxyh (GitHub)" <gi...@apache.org>.
[ issue closed by carryxyh ]



[GitHub] [incubator-dubbo] TwiliMango commented on issue #2771: Dubbo invoke timeout, Waiting server-side response timeout by scan timer

Posted by "TwiliMango (GitHub)" <gi...@apache.org>.
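> 1. Please post the full exception stack trace so that we can diagnose the problem more accurately. Judging from the `Waiting server-side response timeout` message together with the source code, the server side must have been slow to process the request; check the server-side logs for the cause.
> ![image](https://user-images.githubusercontent.com/1811851/48314961-c9d4df00-e60b-11e8-9bba-56c94874bbf4.png)
> 2. Consider upgrading your `dubbo` version.
> 3. Problems like this are hard to troubleshoot because of the complexity of the network that microservices depend on; a **distributed tracing system** helps pinpoint them quickly. See the Zipkin integration sample [dubbo-samples-zipkin](https://github.com/dubbo/dubbo-samples/tree/master/dubbo-samples-zipkin)
> 
> Hope this helps 😁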





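Hello, and first of all, thank you for the suggestions.

I looked again today, and it is indeed a provider-side timeout (I had not checked carefully enough, sorry).

The method that timed out involves only one SQL statement. In the druid monitor, that SQL's slowest execution time is 18 ms, and its execution-time distribution is mostly in the 0-1 ms range. So I started to suspect that acquiring a connection is slow: the maximum wait time when acquiring a connection is 345. (I checked the GC log, and that period was indeed normal.)

One more point: these timeouts are sporadic, roughly once a day or once every few days.

Finally, how should I continue analyzing this, and how can it be avoided? Many thanks.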


[GitHub] [incubator-dubbo] edwardlee03 commented on issue #2771: Dubbo invoke timeout, Waiting server-side response timeout by scan timer

Posted by "edwardlee03 (GitHub)" <gi...@apache.org>.
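> The method that timed out involves only one SQL statement. In the druid monitor, that SQL's slowest execution time is 18 ms, and its execution-time distribution is mostly in the 0-1 ms range. So I started to suspect that acquiring a connection is slow: the maximum wait time when acquiring a connection is 345. (I checked the GC log, and that period was indeed normal.)
> 
> One more point: these timeouts are sporadic, roughly once a day or once every few days.
> 
> Finally, how should I continue analyzing this, and how can it be avoided? Many thanks.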

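For a sporadic problem, two approaches come to mind:
1. I suspect the data source connection pool filled up. That would certainly print a log like "can not get connection", so `grep Exception` carefully once more to confirm. Also go through Druid's built-in statistics and monitoring data to see which items help locate the problem. **Every monitoring metric is there for a reason; think about what clues it can give us.**
2. Use a diagnostic tool such as [arthas](https://github.com/alibaba/arthas) or [greys-anatomy](https://github.com/oldmanpushcart/greys-anatomy) and see whether you are lucky enough to catch it. (Bi Xuan used to do this a lot with `BTrace`, by his own account.)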
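
To make the pool-exhaustion hypothesis in point 1 concrete, here is a minimal Java sketch of a bounded pool with a maximum wait. It is not Druid's implementation: `BoundedPoolSketch`, `borrow`, `giveBack`, and the 50 ms limit are hypothetical names and values chosen for illustration. It shows the two kinds of evidence an exhausted pool tends to leave: a large recorded wait time, and an exception once the wait limit passes.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a connection pool; NOT Druid's implementation. */
final class BoundedPoolSketch {
    private final Semaphore permits;          // one permit per pooled "connection"
    private final long maxWaitMillis;         // assumed wait limit before giving up
    private final AtomicLong maxObservedWaitNanos = new AtomicLong();

    BoundedPoolSketch(int size, long maxWaitMillis) {
        this.permits = new Semaphore(size, true); // fair: waiters are served in arrival order
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Borrows one "connection"; throws if none is returned within maxWaitMillis. */
    void borrow() {
        long start = System.nanoTime();
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
        // Record the longest wait seen so far, whether or not the borrow succeeded.
        maxObservedWaitNanos.accumulateAndGet(System.nanoTime() - start, Math::max);
        if (!acquired) {
            throw new IllegalStateException(
                    "could not get a connection within " + maxWaitMillis + " ms");
        }
    }

    /** Returns a previously borrowed "connection" to the pool. */
    void giveBack() {
        permits.release();
    }

    long maxObservedWaitMillis() {
        return TimeUnit.NANOSECONDS.toMillis(maxObservedWaitNanos.get());
    }

    public static void main(String[] args) {
        BoundedPoolSketch pool = new BoundedPoolSketch(1, 50);
        pool.borrow();                 // takes the only connection
        try {
            pool.borrow();             // pool is empty: waits about 50 ms, then fails
        } catch (IllegalStateException e) {
            System.out.println("second borrow failed: " + e.getMessage());
        }
        System.out.println("max wait so far (ms): " + pool.maxObservedWaitMillis());
        pool.giveBack();
    }
}
```

In this sketch the exception appears only because a wait limit is set; with an unbounded wait the second borrow would simply block, so whether such a log exists depends on how the real pool is configured.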



[GitHub] [incubator-dubbo] carryxyh commented on issue #2771: Dubbo invoke timeout, Waiting server-side response timeout by scan timer

Posted by "carryxyh (GitHub)" <gi...@apache.org>.
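Check whether the request volume increased. If every connection the pool maintains is in use, callers block until a connection is returned to the pool.
If that is the problem, increase the number of connections in the DB connection pool.
Connections are usually released slowly because of slow SQL, or because there are more requests than the pool can serve. Yours does not look like slow SQL, since you did not see any particularly slow statement, so it may be increased request volume. You can check the gateway data, or add manual instrumentation, to see whether that period had more requests than usual.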
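
The manual instrumentation mentioned above could look roughly like the following sketch: count requests per second at the provider entry point, then compare the second in which a timeout occurred against normal traffic. `RequestRateSketch` and its methods are hypothetical names, not part of Dubbo or Druid, and a real version would also need to prune old buckets.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-second request counter for manual instrumentation. */
final class RequestRateSketch {
    // epoch second -> number of requests seen in that second (kept sorted by time)
    private final ConcurrentSkipListMap<Long, LongAdder> perSecond =
            new ConcurrentSkipListMap<>();

    /** Call at the start of each provider method invocation. */
    void record(long epochMillis) {
        perSecond.computeIfAbsent(epochMillis / 1000, s -> new LongAdder()).increment();
    }

    /** Number of requests recorded in the given epoch second. */
    long countAt(long epochSecond) {
        LongAdder adder = perSecond.get(epochSecond);
        return adder == null ? 0L : adder.sum();
    }

    /** Returns the epoch second with the most requests, or -1 if nothing was recorded. */
    long busiestSecond() {
        long best = -1L;
        long bestCount = 0L;
        for (Map.Entry<Long, LongAdder> entry : perSecond.entrySet()) {
            long count = entry.getValue().sum();
            if (count > bestCount) {
                best = entry.getKey();
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        RequestRateSketch rate = new RequestRateSketch();
        rate.record(1_000L);           // one request in second 1
        rate.record(2_100L);           // three requests in second 2
        rate.record(2_500L);
        rate.record(2_900L);
        long busiest = rate.busiestSecond();
        System.out.println("busiest second: " + busiest
                + ", requests: " + rate.countAt(busiest));
    }
}
```

In production code the timestamp would come from `System.currentTimeMillis()`; fixed values are used here only to keep the example deterministic.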




[GitHub] [incubator-dubbo] edwardlee03 commented on issue #2771: Dubbo invoke timeout, Waiting server-side response timeout by scan timer

Posted by "edwardlee03 (GitHub)" <gi...@apache.org>.
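1. Please post the full exception stack trace so that we can diagnose the problem more accurately. Judging from the `Waiting server-side response timeout` message together with the source code, the server side must have been slow to process the request; check the server-side logs for the cause.
![image](https://user-images.githubusercontent.com/1811851/48314961-c9d4df00-e60b-11e8-9bba-56c94874bbf4.png)
2. Consider upgrading your `dubbo` version.
3. Problems like this are hard to troubleshoot because of the complexity of the network that microservices depend on; a **distributed tracing system** helps pinpoint them quickly. See the Zipkin integration sample [dubbo-samples-zipkin](https://github.com/dubbo/dubbo-samples/tree/master/dubbo-samples-zipkin)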
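
For readers comparing the two messages in the report, here is a rough Java sketch of how a client-side timeout message of this shape might be assembled. It assumes the client works the way later Dubbo releases appear to (this has not been verified against 2.1.0): one part of the text depends on whether the request was already written to the network, and the "by scan timer" suffix depends only on which code path noticed the timeout. `TimeoutMessageSketch`, `markSent`, and `timeoutMessage` are hypothetical names, not Dubbo's API.

```java
/** Hypothetical model of a client-side pending call; NOT Dubbo's actual class. */
final class TimeoutMessageSketch {
    // 0 until the request has been written to the network (assumed bookkeeping)
    private volatile long sentMillis;

    /** Records the moment the request left the client. */
    void markSent(long nowMillis) {
        this.sentMillis = nowMillis;
    }

    /**
     * Builds the timeout text.
     *
     * @param detectedByScanTimer true if a background timer found the expired call,
     *                            false if the waiting caller thread noticed it itself
     */
    String timeoutMessage(boolean detectedByScanTimer) {
        String phase = sentMillis > 0
                ? "Waiting server-side response timeout"
                : "Sending request timeout in client-side";
        return phase + (detectedByScanTimer ? " by scan timer" : "");
    }

    public static void main(String[] args) {
        TimeoutMessageSketch call = new TimeoutMessageSketch();
        System.out.println(call.timeoutMessage(false)); // request never left the client
        call.markSent(System.currentTimeMillis());
        System.out.println(call.timeoutMessage(false)); // caller thread timed out waiting
        System.out.println(call.timeoutMessage(true));  // background scan found it first
    }
}
```

Under that assumption, the two messages in the report describe the same condition (request sent, no response within the timeout) and differ only in who detected it.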

Hope this helps 😁
