Posted to dev@brpc.apache.org by GitBox <gi...@apache.org> on 2022/01/10 03:29:51 UTC

[GitHub] [incubator-brpc] romiguan opened a new issue #1666: When using h2:grpc, the server-side service (grpc) is persistently unavailable and the client (brpc) falls into an infinite loop

romiguan opened a new issue #1666:
URL: https://github.com/apache/incubator-brpc/issues/1666


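   **Describe the bug**
   When using h2:grpc, if the server-side service (grpc) stays unavailable, the client (brpc) falls into an infinite loop. While the grpc service is unavailable, the client logs the following error: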
   
   E0108 07:08:32.286566 111758 xxx_client.cc:166] call xxx server failed, Request to x.x.x.x:52618 failed: [E2001][11.18.42.196:52618][E112]xxx_server response :[E112]Not connected to x.x.x.x:8000 yet, server_id=xxxx [R1][E112]Not connected to x.x.x.x:8000 yet, server_id=x.x.x.x [R2][E112]Not connected to x.x.x.x:8000 yet, server_id=x.x.x.x [R3][E112]Not connected to x.x.x.x:8000 yet, server_id=x.x.x.x
   
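   By then there was no traffic left, yet CPU usage on the client stayed at about 98% and did not recover. The pstack output is below.
   
   A large number of threads are stuck at this point even though there is actually no traffic at all. Several machines show the same problem.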
   Thread 494 (Thread 0x7f6fa67c6700 (LWP 491633)):
   #0  0x0000000009aa5cc0 in load (__m=std::memory_order_acquire, this=0x7f8767b88080) at /opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7/../../../../include/c++/7/bits/atomic_base.h:396
   #1  steal (val=0x7f6fa67c2888, this=0x7f8767b88080) at external/brpc/src/bthread/work_stealing_queue.h:116
   #2  bthread::TaskControl::steal_task (this=0x7f9eef03f000, tid=tid@entry=0x7f6fa67c2888, seed=seed@entry=0x7f88a4714050, offset=<optimized out>) at external/brpc/src/bthread/task_control.cpp:347
   #3  0x0000000009a9db10 in steal_task (tid=0x7f6fa67c2888, this=0x7f88a4714000) at external/brpc/src/bthread/task_group.h:224
   #4  bthread::TaskGroup::wait_task (this=this@entry=0x7f88a4714000, tid=tid@entry=0x7f6fa67c2888) at external/brpc/src/bthread/task_group.cpp:123
   #5  0x0000000009aa3a6f in bthread::TaskGroup::run_main_task (this=this@entry=0x7f88a4714000) at external/brpc/src/bthread/task_group.cpp:150
   #6  0x0000000009aa702d in bthread::TaskControl::worker_thread (arg=0x7f9eef03f000) at external/brpc/src/bthread/task_control.cpp:73
   #7  0x00007f9fed8aadc5 in start_thread () from /lib64/libpthread.so.0
   #8  0x00007f9febe9aced in clone () from /lib64/libc.so.6
   
   Thread 549 (Thread 0x7f6fc1ffd700 (LWP 491578)):
   #0  0x0000000009aa5ca8 in bthread::TaskControl::steal_task (this=0x7f9eef03f000, tid=tid@entry=0x7f994d7f7cc8, seed=seed@entry=0x7f9fb3c0d1d0, offset=<optimized out>) at external/brpc/src/bthread/task_control.cpp:344
   #1  0x0000000009aa4266 in steal_task (tid=0x7f994d7f7cc8, this=0x7f9fb3c0d180) at external/brpc/src/bthread/task_group.h:224
   #2  bthread::TaskGroup::sched (pg=pg@entry=0x7f994d7f7d48) at external/brpc/src/bthread/task_group.cpp:590
   #3  0x0000000009aa43b0 in bthread::TaskGroup::usleep (pg=pg@entry=0x7f994d7f7d48, timeout_us=timeout_us@entry=100000) at external/brpc/src/bthread/task_group.cpp:827
   #4  0x0000000009a98b4c in bthread_usleep (microseconds=microseconds@entry=100000) at external/brpc/src/bthread/bthread.cpp:358
   #5  0x0000000009839bf0 in brpc::policy::XXXNamingService::RunNamingService (this=0x7f86af3f3f50, service_name=0x7f86aad8af18 "service_xxxx", actions=0x7f86ac6251e0) at external/brpc/src/brpc/policy/xxx_naming_service.cpp:111
   #6  0x00000000097cdbca in brpc::NamingServiceThread::Run (this=0x7f86ac625140) at external/brpc/src/brpc/details/naming_service_thread.cpp:365
   #7  0x00000000097cdcf9 in brpc::NamingServiceThread::RunThis (arg=<optimized out>) at external/brpc/src/brpc/details/naming_service_thread.cpp:268
   #8  0x0000000009aa3207 in bthread::TaskGroup::task_runner (skip_remained=<optimized out>) at external/brpc/src/bthread/task_group.cpp:309
   #9  0x0000000009aba771 in bthread_make_fcontext ()
   #10 0x0000000000000000 in ?? ()
   
   
   **To Reproduce**
   
   
   **Expected behavior**
   
   
   **Versions**
   OS:  
   Compiler:
   brpc:
   protobuf:
   
   **Additional context/screenshots**
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@brpc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@brpc.apache.org
For additional commands, e-mail: dev-help@brpc.apache.org


[GitHub] [incubator-brpc] lorinlee commented on issue #1666: With h2:grpc, when the server-side service (grpc) is persistently unavailable, the client (brpc) falls into an infinite loop

Posted by GitBox <gi...@apache.org>.
lorinlee commented on issue #1666:
URL: https://github.com/apache/incubator-brpc/issues/1666#issuecomment-1019311914


   @romiguan After the server recovered, did the client recover as well?




[GitHub] [incubator-brpc] lorinlee edited a comment on issue #1666: With h2:grpc, when the server-side service (grpc) is persistently unavailable, the client (brpc) falls into an infinite loop

Posted by GitBox <gi...@apache.org>.
lorinlee edited a comment on issue #1666:
URL: https://github.com/apache/incubator-brpc/issues/1666#issuecomment-1019311914


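   @romiguan 
   
   After the server recovered, did the client recover as well?
   Are you sure the CPU time is all being spent in steal_task? Could you provide a CPU profile?
   How many CPU cores does the machine have, and how many brpc threads are configured?
   Does the client's business logic retry, so that it keeps calling the downstream server?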
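
To make the last question concrete, here is a minimal sketch of a bounded retry loop with capped exponential backoff, as opposed to a loop that retries immediately and forever while the server is down. It uses only the C++ standard library. `RetryResult` and `call_with_backoff` are hypothetical names chosen for illustration, not brpc APIs, and the `attempt` callback stands in for whatever RPC call the client makes.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical result type (not part of brpc): whether the call eventually
// succeeded and how many attempts were made.
struct RetryResult {
    bool ok;
    int attempts;
};

// Hypothetical helper (not part of brpc): invokes `attempt` until it returns
// true or `max_attempts` is reached. Between failed attempts it sleeps,
// doubling the delay each time up to `max_delay`, so a dead server does not
// turn the caller into a busy loop.
inline RetryResult call_with_backoff(const std::function<bool()>& attempt,
                                     int max_attempts,
                                     std::chrono::milliseconds base_delay,
                                     std::chrono::milliseconds max_delay) {
    assert(max_attempts >= 1);
    std::chrono::milliseconds delay = base_delay;
    for (int i = 1; i <= max_attempts; ++i) {
        if (attempt()) {
            return RetryResult{true, i};  // succeeded on the i-th attempt
        }
        if (i < max_attempts) {
            std::this_thread::sleep_for(delay);      // wait before retrying
            delay = std::min(delay * 2, max_delay);  // double the wait, capped
        }
    }
    return RetryResult{false, max_attempts};  // gave up after the last attempt
}
```

If the business code already looks like this, retries alone are unlikely to explain sustained CPU usage with no traffic. If it retries with no cap and no delay, that would be worth ruling out first.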




[GitHub] [incubator-brpc] romiguan commented on issue #1666: With h2:grpc, when the server-side service (grpc) is persistently unavailable, the client (brpc) falls into an infinite loop

Posted by GitBox <gi...@apache.org>.
romiguan commented on issue #1666:
URL: https://github.com/apache/incubator-brpc/issues/1666#issuecomment-1014087884


   Could someone with more experience take a look at this?

