Posted to dev@brpc.apache.org by GitBox <gi...@apache.org> on 2020/03/18 05:04:23 UTC

[GitHub] [incubator-brpc] dyike opened a new issue #1066: Child process forked by metrics hangs

dyike opened a new issue #1066: Child process forked by metrics hangs
URL: https://github.com/apache/incubator-brpc/issues/1066
 
 
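   **Describe the bug**
   
   On a Debian 10 machine, for any brpc project, the first request to the http://ip:port/brpc_metrics endpoint forks a child process, and that child hangs (apparently deadlocked). If the forked child is killed manually, the main process does not exit, and subsequent requests to the endpoint work normally.
   
   More debug information is attached under **Additional context/screenshots**.
   
   This works fine on Debian 8, so my guess is that it is related to the libc version.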
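
To test the libc-version guess on a given machine, the glibc version can be read at runtime. The following is a minimal sketch that assumes a glibc-based system (it relies on glibc's `gnu_get_libc_version()`). The helper name `glibc_at_least` is hypothetical and is not part of brpc.

```cpp
#include <gnu/libc-version.h>  // glibc-specific header

#include <cassert>
#include <cstdio>

// Hypothetical helper: returns true if the running glibc is at least
// major.minor. Returns false if the version string cannot be parsed.
bool glibc_at_least(int major, int minor) {
    int have_major = 0;
    int have_minor = 0;
    // gnu_get_libc_version() returns a string such as "2.28".
    if (std::sscanf(gnu_get_libc_version(), "%d.%d",
                    &have_major, &have_minor) != 2) {
        return false;
    }
    return have_major > major ||
           (have_major == major && have_minor >= minor);
}
```

Calling `glibc_at_least(2, 28)` on the Debian 8 and Debian 10 machines would show whether they fall on different sides of the 2.28 boundary named later in this thread.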
   
   **To Reproduce**
   
   Build any of the projects under brpc's example directory, start the server, then request the http://ip:port/brpc_metrics endpoint.
   
   **Expected behavior**
   The child process should not hang.
   
   **Versions**
   OS: Debian(10) 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
   Compiler: g++ (Debian 8.3.0-6) 8.3.0
   brpc: latest master branch
   protobuf: .4.0
   
   **Additional context/screenshots**
   1. Start echo_server
   2. At this point there is only one process
   
   ![2](https://user-images.githubusercontent.com/4496772/76926942-a36a3680-6918-11ea-979f-c5b06b091423.png)
   
   
   3. curl "http://127.1:8000/brpc_metrics"
   
   4. A forked child process now appears
   ![2](https://user-images.githubusercontent.com/4496772/76926969-ba108d80-6918-11ea-89ae-3f28904878d5.png)
   
   5. Inspect the stack trace of the child process
   ```
       #0  __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:63
       #1  0x00007f5765ac78de in __GI___register_atfork (prepare=0x7f5765b7f940 <atfork_lock>, prepare@entry=0x0, parent=parent@entry=0x0,
           child=child@entry=0x561082bedef0 <bvar::detail::SamplerCollector::child_callback_atfork()>,
           dso_handle=0x7f5765ac771b <__lll_lock_wait_private+27>) at register-atfork.c:40
       #2  0x00007f5765f0749c in __pthread_atfork (prepare=prepare@entry=0x0, parent=parent@entry=0x0,
           child=child@entry=0x561082bedef0 <bvar::detail::SamplerCollector::child_callback_atfork()>) at pthread_atfork.c:51
       #3  0x0000561082bedf77 in bvar::detail::SamplerCollector::create_sampling_thread (this=0x561083c1c9b0) at src/bvar/detail/sampler.cpp:91
       #4  bvar::detail::SamplerCollector::after_forked_as_child (this=0x561083c1c9b0) at src/bvar/detail/sampler.cpp:97
       #5  bvar::detail::SamplerCollector::child_callback_atfork () at src/bvar/detail/sampler.cpp:82
       #6  0x00007f5765ac7c80 in __run_fork_handlers (who=who@entry=atfork_run_child, do_locking=do_locking@entry=true) at register-atfork.c:135
       #7  0x00007f5765a87868 in __libc_fork () at ../sysdeps/nptl/fork.c:137
       #8  0x00007f5765a32628 in _IO_new_proc_open (fp=fp@entry=0x7f575002dfc0, command=command@entry=0x561082ea3e66 "uname -ap", mode=<optimized out>,
           mode@entry=0x561082d16eab "r") at iopopen.c:122
       #9  0x00007f5765a328c8 in _IO_new_popen (command=0x561082ea3e66 "uname -ap", mode=mode@entry=0x561082d16eab "r") at iopopen.c:203
       #10 0x0000561082bd20b4 in butil::read_command_output_through_popen (os=..., cmd=<optimized out>) at src/butil/popen.cpp:159
       #11 0x0000561082be5957 in bvar::ReadVersion::ReadVersion (this=0x7f575002df60) at src/bvar/default_variables.cpp:608
       #12 0x0000561082be5c0c in butil::GetLeakySingleton<bvar::ReadVersion>::create_leaky_singleton ()
           at ./src/butil/memory/singleton_on_pthread_once.h:41
       #13 0x00007f57664b9997 in __pthread_once_slow (
           once_control=0x5610831cd560 <butil::GetLeakySingleton<bvar::ReadVersion>::g_create_leaky_singleton_once>,
           init_routine=0x561082be5bf0 <butil::GetLeakySingleton<bvar::ReadVersion>::create_leaky_singleton()>) at pthread_once.c:116
       #14 0x00007f57664b9a45 in __GI___pthread_once (
           once_control=once_control@entry=0x5610831cd560 <butil::GetLeakySingleton<bvar::ReadVersion>::g_create_leaky_singleton_once>,
           init_routine=init_routine@entry=0x561082be5bf0 <butil::GetLeakySingleton<bvar::ReadVersion>::create_leaky_singleton()>) at pthread_once.c:143
       #15 0x0000561082be046c in butil::get_leaky_singleton<bvar::ReadVersion> () at ./src/butil/atomicops_internals_x86_gcc.h:211
       #16 bvar::get_kernel_version (os=...) at src/bvar/default_variables.cpp:616
       #17 0x0000561082ade092 in bvar::PassiveStatus<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::describe (
           quote_string=true, os=..., this=0x5610831cc220 <bvar::g_kernel_version[abi:cxx11]>) at ./src/bvar/passive_status.h:226
       #18 bvar::PassiveStatus<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::describe (
           this=0x5610831cc220 <bvar::g_kernel_version[abi:cxx11]>, os=..., quote_string=<optimized out>) at ./src/bvar/passive_status.h:222
       #19 0x0000561082bd3453 in bvar::Variable::describe_exposed (name="kernel_version", os=..., quote_string=<optimized out>,
           display_filter=bvar::DISPLAY_ON_PLAIN_TEXT) at src/bvar/variable.cpp:257
       #20 0x0000561082bd5ce6 in bvar::Variable::dump_exposed (dumper=dumper@entry=0x7f575c6f79f0, poptions=poptions@entry=0x0)
           at src/bvar/variable.cpp:514
       #21 0x0000561082b1d883 in brpc::DumpPrometheusMetricsToIOBuf (output=0x7f575002bbb8) at src/brpc/builtin/prometheus_metrics_service.cpp:198
       #22 0x0000561082b1daa4 in brpc::PrometheusMetricsService::default_method (this=<optimized out>, cntl_base=0x7f575002b9d0, done=0x7f575002c500)
           at ./src/brpc/controller.h:377
       #23 0x0000561082c3ea59 in brpc::brpc_metrics::CallMethod (method=<optimized out>, done=<optimized out>, response=<optimized out>,
           request=<optimized out>, controller=<optimized out>, this=<optimized out>) at src/brpc/builtin_service.pb.cc:10006
       #24 brpc::brpc_metrics::CallMethod (this=<optimized out>, method=<optimized out>, controller=<optimized out>, request=<optimized out>,
           response=<optimized out>, done=<optimized out>) at src/brpc/builtin_service.pb.cc:9998
       #25 0x0000561082b721e1 in brpc::policy::ProcessHttpRequest (msg=<optimized out>) at src/brpc/policy/http_rpc_protocol.cpp:1484
       #26 0x0000561082c9ed97 in brpc::ProcessInputMessage (void_arg=void_arg@entry=0x7f575002a580) at src/brpc/input_messenger.cpp:135
       #27 0x0000561082c9fbf2 in brpc::RunLastMessage::operator() (this=<synthetic pointer>, last_msg=0x7f575002a580)
           at src/brpc/input_messenger.cpp:141
       #28 std::unique_ptr<brpc::InputMessageBase, brpc::RunLastMessage>::~unique_ptr (this=<synthetic pointer>, __in_chrg=<optimized out>)
           at /usr/include/c++/8/bits/unique_ptr.h:274
       
       #29 brpc::InputMessenger::OnNewMessages (m=0x7f574801aee0) at /usr/include/c++/8/bits/unique_ptr.h:270
       #30 0x0000561082c8516d in brpc::Socket::ProcessEvent (arg=0x7f574801aee0) at src/brpc/socket.cpp:1017
       #31 0x0000561082bfe67f in bthread::TaskGroup::task_runner (skip_remained=<optimized out>) at src/bthread/task_group.cpp:296
       #32 0x0000561082c02311 in bthread_make_fcontext () at /usr/include/google/protobuf/repeated_field.h:642
       #33 0x0000000000000000 in ?? ()
   ```
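   The rough flow is visible in the trace: while fetching the kernel version, popen() forks a child process, and in the child the fork handler runs child_callback_atfork() → after_forked_as_child() → create_sampling_thread() in bvar/detail/sampler.cpp.
   
   The problem most likely lies in pthread_atfork(). After studying its implementation, I found that it takes a lock...
   
   6. Attempted debugging
   
   After recompiling, it works normally. This seems to solve the problem, but I still do not fully understand the mechanism involved.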
   
   ![2](https://user-images.githubusercontent.com/4496772/76926996-cd235d80-6918-11ea-8528-07eb5c859f1f.png)
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@brpc.apache.org
For additional commands, e-mail: dev-help@brpc.apache.org


[GitHub] [incubator-brpc] zyearn edited a comment on issue #1066: Child process forked by metrics hangs

Posted by GitBox <gi...@apache.org>.
zyearn edited a comment on issue #1066: Child process forked by metrics hangs
URL: https://github.com/apache/incubator-brpc/issues/1066#issuecomment-600977136
 
 
   Cause: fork() invokes the fork handlers that were previously registered through pthread_atfork. If one of those handlers calls pthread_atfork itself, a deadlock occurs, because since glibc 2.28 fork() and pthread_atfork contend for the same lock.
   FIX: https://github.com/apache/incubator-brpc/commit/2f8fc37d52c2a02ee6f348aaa52c7ded4a4844c3
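
The following C++ sketch illustrates one way to avoid the pattern described above. It is not necessarily what the linked commit does. The idea is to register the child handler only once, so that the handler never reaches pthread_atfork while it is running inside fork(). All names in the `atfork_demo` namespace are hypothetical, and the counter increment stands in for re-creating a background thread.

```cpp
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <atomic>
#include <cassert>

// All names here are hypothetical and only illustrate the pattern.
namespace atfork_demo {

std::atomic<bool> g_atfork_registered{false};
int g_register_calls = 0;  // how often pthread_atfork() was actually called
int g_worker_starts = 0;   // how often the "worker" was (re)started

void start_worker();

// Child-side fork handler, run inside fork(). It restarts per-process
// state but must not end up calling pthread_atfork() again.
void child_after_fork() { start_worker(); }

void start_worker() {
    ++g_worker_starts;  // stands in for creating a background thread
    // Register the handler only the first time. In the child the flag is
    // already true (copied from the parent), so pthread_atfork() is skipped.
    if (!g_atfork_registered.exchange(true)) {
        ++g_register_calls;
        pthread_atfork(nullptr, nullptr, child_after_fork);
    }
}

// Forks once. Returns the child's exit code (0 means the child handler
// ran exactly once and did not re-register), or -1 on error.
int fork_and_check() {
    start_worker();
    const int starts_before = g_worker_starts;
    const int regs_before = g_register_calls;
    const pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        const bool ok = g_worker_starts == starts_before + 1 &&
                        g_register_calls == regs_before;
        _exit(ok ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}

}  // namespace atfork_demo
```

In this sketch the child still restarts its worker after fork(), but the registration step is skipped because the flag was already set in the parent, so the lock mentioned above is never requested from inside a fork handler.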


[GitHub] [incubator-brpc] zyearn closed issue #1066: Child process forked by metrics hangs

Posted by GitBox <gi...@apache.org>.
zyearn closed issue #1066: Child process forked by metrics hangs
URL: https://github.com/apache/incubator-brpc/issues/1066
 
 
   
