Posted to issues@kudu.apache.org by "Paul Brannan (Jira)" <ji...@apache.org> on 2020/04/24 15:29:00 UTC

[jira] [Comment Edited] (KUDU-3102) tabletserver coredump in jsonwriter

    [ https://issues.apache.org/jira/browse/KUDU-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091667#comment-17091667 ] 

Paul Brannan edited comment on KUDU-3102 at 4/24/20, 3:28 PM:
--------------------------------------------------------------

We just saw one of our tablet servers crash for the same reason:

 
*** Aborted at 1587733753 (unix time) try "date -d @1587733753" if you are using GNU date ***
PC: @           0x9edf55 (unknown)
*** SIGSEGV (@0x7fd0cd353308) received by PID 9284 (TID 0x7fd0cd34a700) from PID 18446744072857400072; stack trace: ***
    @     0x7fd1676ae390 (unknown)
    @           0x9edf55 (unknown)
    @           0x9ee6eb GetStackTrace()
    @           0x9e07ec (unknown)
    @           0x9e0b3c (unknown)
    @          0x22f40fb tcmalloc::allocate_full_cpp_throw_oom()
    @     0x7fd16624b59d std::__cxx11::basic_string<>::reserve()
    @     0x7fd1662401b5 std::__cxx11::basic_stringbuf<>::overflow()
    @     0x7fd166249d19 std::basic_streambuf<>::xsputn()
    @     0x7fd16623b303 std::ostream::write()
    @          0x209b8ac kudu::JsonWriterImpl<>::EndObject()
    @          0x209a83c kudu::JsonWriter::Protobuf()
    @          0x20c4ead kudu::Histogram::WriteAsJson()
    @          0x20c2f93 kudu::MetricEntity::WriteAsJson()
    @          0x20c34ca kudu::MetricRegistry::WriteAsJson()
    @           0xa6b6ae (unknown)
    @           0xa5da60 kudu::Webserver::RunPathHandler()
    @           0xa60463 kudu::Webserver::BeginRequestCallback()
    @           0xa96ffe (unknown)
    @           0xa99deb (unknown)
    @     0x7fd1676a46ba start_thread
    @     0x7fd16595141d clone
 

The machine has 128GB of memory, and the tablet server was using 35GB at the time.  Another 80GB was held in the Linux disk cache, which kswapd should presumably have been able to free (though a long time ago, on an older kernel, we hit a bug that prevented kswapd from reclaiming pages from the disk cache).
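
For anyone who wants to check the same numbers on their own host, here is a minimal sketch, not part of Kudu, that parses /proc/meminfo-style text and reports how much memory sits in the page cache versus what the kernel estimates as available. The helper names (parse_meminfo, summarize, read_meminfo) are hypothetical, and it assumes the standard MemTotal, MemFree, MemAvailable and Cached fields are present; MemAvailable is missing on older kernels, in which case it is reported as None.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Convert the fields of interest from kB to GiB (None if a field is absent)."""
    kb_per_gib = 1024 * 1024

    def gib(key):
        value = fields.get(key)
        return None if value is None else value / kb_per_gib

    return {
        "total_gib": gib("MemTotal"),
        "free_gib": gib("MemFree"),
        "available_gib": gib("MemAvailable"),
        "cached_gib": gib("Cached"),
    }


def read_meminfo(path="/proc/meminfo"):
    """Read and summarize the live values on a Linux host."""
    with open(path) as f:
        return summarize(parse_meminfo(f.read()))
```

Calling read_meminfo() on the affected machine shortly before or after a crash would show whether the cached memory is actually counted as available; a large Cached value next to a small MemAvailable value would hint that the cache is not being reclaimed as expected.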


> tabletserver coredump in jsonwriter
> -----------------------------------
>
>                 Key: KUDU-3102
>                 URL: https://issues.apache.org/jira/browse/KUDU-3102
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.10.1
>            Reporter: Yingchun Lai
>            Priority: Major
>
> A tserver coredump happened; the backtrace is as follows:
> {code:java}
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Missing separate debuginfo for /home/work/app/kudu/c3tst-dev/master/package/libcrypto.so.10
> Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/35/93fa778645a59ea272dbbb59d318c60940e792.debug
> Core was generated by `/home/work/app/kudu/c3tst-dev/master/package/kudu_master -default_num_replicas='.
> Program terminated with signal 11, Segmentation fault.
> #0  GetStackTrace_x86 (result=0x7fbf7232fa00, max_depth=31, skip_count=0) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/stacktrace_x86-inl.h:328
> 328    /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/stacktrace_x86-inl.h: No such file or directory.
> Missing separate debuginfos, use: debuginfo-install cyrus-sasl-gssapi-2.1.26-20.el7_2.x86_64 cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 cyrus-sasl-md5-2.1.26-20.el7_2.x86_64 cyrus-sasl-plain-2.1.26-20.el7_2.x86_64 glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 libcom_err-1.42.9-9.el7.x86_64 libdb-5.3.21-19.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libselinux-2.5-6.el7.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64 openssl-libs-1.0.1e-60.el7_3.1.x86_64 pcre-8.32-15.el7_2.1.x86_64 zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0  GetStackTrace_x86 (result=0x7fbf7232fa00, max_depth=31, skip_count=0) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/stacktrace_x86-inl.h:328
> #1  0x0000000000b9992b in GetStackTrace (result=result@entry=0x7fbf7232fa00, max_depth=max_depth@entry=31, skip_count=skip_count@entry=1) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/stacktrace.cc:295
> #2  0x0000000000b8c14d in DoSampledAllocation (size=size@entry=16385) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1169
> #3  0x000000000289f151 in do_malloc (size=16385) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1361
> #4  do_allocate_full<tcmalloc::cpp_throw_oom> (size=16385) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1751
> #5  tcmalloc::allocate_full_cpp_throw_oom (size=16385) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1765
> #6  0x000000000289f2a7 in dispatch_allocate_full<tcmalloc::cpp_throw_oom> (size=<optimized out>) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1774
> #7  malloc_fast_path<tcmalloc::cpp_throw_oom> (size=<optimized out>) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1845
> #8  tc_new (size=<optimized out>) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1969
> #9  0x00007fbf79c785cd in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::reserve (this=this@entry=0x7fbf7232fbb0, __res=<optimized out>)
>     at /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:293
> #10 0x00007fbf79c6be0b in std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::overflow (this=0x7fbf72330668, __c=83) at /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/sstream.tcc:133
> #11 0x00007fbf79c76b89 in std::basic_streambuf<char, std::char_traits<char> >::xsputn (this=0x7fbf72330668,
>     __s=0x6929232 "Service_RequestConsensusVote\",\"total_count\":1,\"min\":104,\"mean\":104.0,\"percentile_75\":104,\"percentile_95\":104,\"percentile_99\":104,\"percentile_99_9\":104,\"percentile_99_99\":104,\"max\":104,\"total_sum\":104}"..., __n=250)
>     at /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/streambuf.tcc:98
> #12 0x00007fbf79c66b62 in sputn (__n=250, __s=<optimized out>, this=<optimized out>) at /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/streambuf:451
> #13 _M_write (__n=250, __s=<optimized out>, this=0x7fbf72330660) at /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/ostream:313
> #14 std::ostream::write (this=0x7fbf72330660,
>     __s=0x6929200 ",{\"name\":\"handler_latency_kudu_consensus_ConsensusService_RequestConsensusVote\",\"total_count\":1,\"min\":104,\"mean\":104.0,\"percentile_75\":104,\"percentile_95\":104,\"percentile_99\":104,\"percentile_99_9\":104"..., __n=250)
>     at /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/ostream.tcc:196
> #15 0x000000000265bb1c in Flush (this=0x58c75f8) at /home/laiyingchun/kudu_xm/src/kudu/util/jsonwriter.cc:307
> #16 kudu::JsonWriterImpl<rapidjson::Writer<kudu::UTF8StringStreamBuffer, rapidjson::UTF8<char>, rapidjson::UTF8<char>, rapidjson::CrtAllocator, 0u> >::EndObject (this=0x58c75f0) at /home/laiyingchun/kudu_xm/src/kudu/util/jsonwriter.cc:345
> #17 0x000000000265a968 in EndObject (this=0x7fbf72330190) at /home/laiyingchun/kudu_xm/src/kudu/util/jsonwriter.cc:151
> #18 kudu::JsonWriter::Protobuf (this=this@entry=0x7fbf72330190, pb=...) at /home/laiyingchun/kudu_xm/src/kudu/util/jsonwriter.cc:203
> #19 0x00000000026856ae in kudu::Histogram::WriteAsJson (this=0x54bc100, writer=0x7fbf72330190, opts=...) at /home/laiyingchun/kudu_xm/src/kudu/util/metrics.cc:885
> #20 0x0000000002685f03 in kudu::WriteMetricsToJson<std::unordered_map<kudu::MetricPrototype const*, scoped_refptr<kudu::Metric>, kudu::MetricPrototypeHash, kudu::MetricPrototypeEqualTo, std::allocator<std::pair<kudu::MetricPrototype const* const, scoped_refptr<kudu::Metric> > > > > (writer=writer@entry=0x7fbf72330190, metrics=..., opts=...) at /home/laiyingchun/kudu_xm/src/kudu/util/metrics.cc:64
> #21 0x0000000002685178 in WriteToJson (opts=..., merged_entity_metrics=..., writer=0x7fbf72330190) at /home/laiyingchun/kudu_xm/src/kudu/util/metrics.cc:86
> #22 kudu::MetricRegistry::WriteAsJson (this=this@entry=0x53f8400, writer=writer@entry=0x7fbf72330190, opts=...) at /home/laiyingchun/kudu_xm/src/kudu/util/metrics.cc:471
> #23 0x0000000000f92442 in kudu::WriteMetricsAsJson (metrics=0x53f8400, req=..., resp=0x7fbf72330510) at /home/laiyingchun/kudu_xm/src/kudu/server/default_path_handlers.cc:374
> #24 0x0000000000f82ba3 in operator() (a1=0x7fbf72330510, a0=..., this=0x5477338) at /home/laiyingchun/kudu_xm/thirdparty/installed/uninstrumented/include/boost/function/function_template.hpp:770
> #25 kudu::Webserver::RunPathHandler (this=this@entry=0x53c2c80, handler=..., connection=connection@entry=0x5f19000, request_info=request_info@entry=0x5f19000) at /home/laiyingchun/kudu_xm/src/kudu/server/webserver.cc:637
> #26 0x0000000000f84c39 in kudu::Webserver::BeginRequestCallback (this=0x53c2c80, connection=0x5f19000, request_info=0x5f19000) at /home/laiyingchun/kudu_xm/src/kudu/server/webserver.cc:570
> #27 0x0000000000fbfee2 in handle_request (conn=conn@entry=0x5f19000) at /home/laiyingchun/kudu_xm/thirdparty/src/squeasel-36dc66b8723980feb32196a94b46734eb5eafbf9/squeasel.c:3901
> #28 0x0000000000fc28c0 in process_new_connection (conn=0x5f19000) at /home/laiyingchun/kudu_xm/thirdparty/src/squeasel-36dc66b8723980feb32196a94b46734eb5eafbf9/squeasel.c:4588
> #29 worker_thread (thread_func_param=0x555e580) at /home/laiyingchun/kudu_xm/thirdparty/src/squeasel-36dc66b8723980feb32196a94b46734eb5eafbf9/squeasel.c:4720
> #30 0x00007fbf7b111dc5 in start_thread () from /lib64/libpthread.so.0
> #31 0x00007fbf7937773d in clone () from /lib64/libc.so.6
> (gdb)
> {code}


