Posted to issues@kudu.apache.org by "Alexey Serbin (JIRA)" <ji...@apache.org> on 2019/05/10 18:34:00 UTC

[jira] [Assigned] (KUDU-2819) SIGSEGV during kudu cluster rebalance

     [ https://issues.apache.org/jira/browse/KUDU-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin reassigned KUDU-2819:
-----------------------------------

    Assignee: Alexey Serbin

> SIGSEGV during kudu cluster rebalance
> -------------------------------------
>
>                 Key: KUDU-2819
>                 URL: https://issues.apache.org/jira/browse/KUDU-2819
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.9.0, 1.9.1
>            Reporter: Mitch Barnett
>            Assignee: Alexey Serbin
>            Priority: Major
>             Fix For: 1.9.1
>
>
> The Kudu rebalancer tool consistently crashes with a segmentation fault while it is running.
> The following output appears on the client running the rebalance command:
> {noformat}
> *** Aborted at 1556920300 (unix time) try "date -d @1556920300" if you are using GNU date ***
> PC: @ 0x2972aec tc_new
> *** SIGSEGV (@0x0) received by PID 62640 (TID 0x7f5f7191b980) from PID 0; stack trace: ***
>     @ 0x369b00f7e0 (unknown)
>     @ 0x2972aec tc_new
>     @ 0xc6a077 kudu::client::KuduClient::Data::GetTableSchema()
>     @ 0xc56e0d kudu::client::KuduClient::OpenTable()
>     @ 0xc38228 kudu::tools::RemoteKsckCluster::RetrieveTablesList()
>     @ 0xc2953a kudu::tools::KsckCluster::FetchTableAndTabletInfo()
>     @ 0xc217c4 kudu::tools::Ksck::FetchTableAndTabletInfo()
>     @ 0xdad2c1 kudu::tools::DoKsckForTablet()
>     @ 0xdaf244 kudu::tools::CheckCompleteMove()
>     @ 0xd84c18 kudu::tools::Rebalancer::AlgoBasedRunner::UpdateMovesInProgressStatus()
>     @ 0xd816f4 kudu::tools::Rebalancer::RunWith()
>     @ 0xd8dac6 kudu::tools::Rebalancer::Run()
>     @ 0xb34011 (unknown)
>     @ 0xb353a4 std::_Function_handler<>::_M_invoke()
>     @ 0x10b7eda kudu::tools::Action::Run()
>     @ 0xbb4f04 kudu::tools::DispatchCommand()
>     @ 0xbb56d3 kudu::tools::RunTool()
>     @ 0xad6778 main
>     @ 0x369ac1ed1d __libc_start_main
>     @ 0xb2ed7d (unknown)
> Segmentation fault (core dumped){noformat}
>  
> The backtrace generated from the core dump shows the crash occurring within gperftools:
> {noformat}
> #0 SLL_Next (t=0x59c18bbfeed6371)
> at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/thirdparty/src/gperftools-2.6.90/src/linked_list.h:45
> #1 SLL_TryPop (rv=<synthetic pointer>, list=0x58d4d60)
> at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/thirdparty/src/gperftools-2.6.90/src/linked_list.h:69
> #2 TryPop (rv=<synthetic pointer>, this=0x58d4d60)
> at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/thirdparty/src/gperftools-2.6.90/src/thread_cache.h:220
> #3 Allocate (oom_handler=0x29711c0 <tcmalloc::cpp_throw_oom(unsigned long)>, cl=9, size=128, this=<optimized out>)
> at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/thirdparty/src/gperftools-2.6.90/src/thread_cache.h:379
> #4 malloc_fast_path<tcmalloc::cpp_throw_oom> (size=<optimized out>)
> at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1848
> #5 tc_new (size=<optimized out>) at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1969
> #6 0x0000000000c6a077 in allocate (__n=1, this=<synthetic pointer>) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/ext/new_allocator.h:104
> #7 allocate (__a=<synthetic pointer>, __n=1) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/alloc_traits.h:357
> #8 __shared_count<kudu::Synchronizer::Data, std::allocator<kudu::Synchronizer::Data> > (__a=..., this=0x7fff13bcbde8)
> at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/shared_ptr_base.h:616
> #9 __shared_ptr<std::allocator<kudu::Synchronizer::Data> > (__a=..., __tag=..., this=0x7fff13bcbde0)
> at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/shared_ptr_base.h:1090
> #10 shared_ptr<std::allocator<kudu::Synchronizer::Data> > (__a=..., __tag=..., this=0x7fff13bcbde0)
> at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/shared_ptr.h:316
> #11 allocate_shared<kudu::Synchronizer::Data, std::allocator<kudu::Synchronizer::Data> > (__a=...)
> at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/shared_ptr.h:588
> #12 make_shared<kudu::Synchronizer::Data> () at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/shared_ptr.h:604
> #13 Synchronizer (this=0x7fff13bcbde0) at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/src/kudu/util/async_util.h:47
> #14 kudu::client::KuduClient::Data::GetTableSchema (this=<optimized out>, client=client@entry=0x11fe5440, table_name="impala::database.some_table",
> deadline=..., schema=schema@entry=0x7fff13bcc070, partition_schema=0x7fff13bcc0c0, table_id=0x7fff13bcc080, num_replicas=0x7fff13bcc068)
> at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/src/kudu/client/client-internal.cc:441
> #15 0x0000000000c56e0d in kudu::client::KuduClient::OpenTable (this=0x11fe5440, table_name="impala::database.some_table", table=table@entry=0x7fff13bcc180)
> at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/src/kudu/client/client.cc:513
> #16 0x0000000000c38228 in kudu::tools::RemoteKsckCluster::RetrieveTablesList (this=0x607d680)
> at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/src/kudu/tools/ksck_remote.cc:502
> #17 0x0000000000c2953a in kudu::tools::KsckCluster::FetchTableAndTabletInfo (this=0x607d680)
> at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/src/kudu/tools/ksck.h:408
> #18 0x0000000000c217c4 in kudu::tools::Ksck::FetchTableAndTabletInfo (this=this@entry=0x7fff13bcc510)
> at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/src/kudu/tools/ksck.cc:302
> #19 0x0000000000dad2c1 in kudu::tools::DoKsckForTablet (master_addresses=std::vector of length 3, capacity 3 = {...}, tablet_id="00229fcb55dc4a348e8caae7f7a3fc41")
> at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/src/kudu/tools/tool_replica_util.cc:624
> #20 0x0000000000daf244 in kudu::tools::CheckCompleteMove (master_addresses=std::vector of length 3, capacity 3 = {...},
> client=std::tr1::shared_ptr (count 1) 0x103345a0, tablet_id="00229fcb55dc4a348e8caae7f7a3fc41", from_ts_uuid="05d76878409e448fba542fade206dd15",
> to_ts_uuid="26d44b84ff3645d18f03b05a816e21eb", is_complete=0x7fff13bccb4f, completion_status=0x7fff13bccb50)
> at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/src/kudu/tools/tool_replica_util.cc:319
> #21 0x0000000000d84c18 in kudu::tools::Rebalancer::AlgoBasedRunner::UpdateMovesInProgressStatus (this=0x7fff13bcd090, has_errors=0x7fff13bccd40,
> timed_out=0x7fff13bcccdf) at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/src/kudu/tools/rebalancer.cc:1173
> #22 0x0000000000d816f4 in kudu::tools::Rebalancer::RunWith (this=this@entry=0x7fff13bd2390, runner=runner@entry=0x7fff13bcd090,
> result_status=result_status@entry=0x7fff13bd20ec) at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/src/kudu/tools/rebalancer.cc:912
> #23 0x0000000000d8dac6 in kudu::tools::Rebalancer::Run (this=this@entry=0x7fff13bd2390, result_status=result_status@entry=0x7fff13bd20ec,
> moves_count=moves_count@entry=0x7fff13bd21c8) at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/src/kudu/tools/rebalancer.cc:203
> #24 0x0000000000b34011 in kudu::tools::(anonymous namespace)::RunRebalance (context=...)
> at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/src/kudu/tools/tool_action_cluster.cc:319
> #25 0x0000000000b353a4 in std::_Function_handler<kudu::Status (kudu::tools::RunnerContext const&), kudu::Status (*)(kudu::tools::RunnerContext const&)>::_M_invoke(std::_Any_data const&, kudu::tools::RunnerContext const&) (__functor=..., __args#0=...) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2025
> #26 0x00000000010b7eda in operator() (__args#0=..., this=0x613a650) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2439
> Python Exception <class 'gdb.error'> There is no member or method named _M_element_count.:
> #27 kudu::tools::Action::Run (this=this@entry=0x613a630, chain=std::vector of length 2, capacity 2 = {...}, required_args=,
> variadic_args=std::vector of length 0, capacity 0)
> at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/src/kudu/tools/tool_action.cc:258
> #28 0x0000000000bb4f04 in kudu::tools::DispatchCommand (chain=std::vector of length 2, capacity 2 = {...}, action=action@entry=0x613a630,
> remaining_args=std::deque with 1 elements = {...}) at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/src/kudu/tools/tool_main.cc:132
> #29 0x0000000000bb56d3 in kudu::tools::RunTool (argc=4, argv=0x7fff13bd2960, show_help=show_help@entry=false)
> at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/src/kudu/tools/tool_main.cc:204
> #30 0x0000000000ad6778 in main (argc=4, argv=0x7fff13bd2960)
> at /container.redhat6/build/cdh/kudu/1.9.0-cdh6.2.0/rpm/BUILD/kudu-1.9.0-cdh6.2.0/src/kudu/tools/tool_main.cc:265{noformat}
>  
> I don't see an obvious memory mismanagement scenario, such as a double free or a use-after-free.
> I suspect that either memory was corrupted at some point before the crash, or there is a bug in tcmalloc itself.
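
To illustrate why earlier corruption is a plausible explanation, here is a minimal sketch of an intrusive singly linked free list. It is a simplified model, not gperftools code: ToyFreeList, SimulateStaleWrite, the block sizes and the 0x5A fill pattern are all hypothetical names and values chosen for illustration. The sketch assumes, as in such allocators, that each free block stores the address of the next free block in its own first bytes. A stale write to an already freed block then goes unnoticed until that block is handed out again, and the list head only becomes garbage at that point. That would be consistent with frame #0 above, where SLL_Next is called with t=0x59c18bbfeed6371, a value that does not look like a valid heap address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified model of an intrusive singly linked free list.
// It is NOT gperftools code: each free block keeps the address of the next
// free block in its own first bytes, which is the property that matters here.
class ToyFreeList {
 public:
  static constexpr std::size_t kBlockSize = 128;
  static constexpr std::size_t kNumBlocks = 4;

  ToyFreeList() {
    // Thread every block of the pool onto the list.
    for (std::size_t i = 0; i < kNumBlocks; ++i) Push(pool_ + i * kBlockSize);
  }

  // Frees a block: store the old head inside the block, then make it the head.
  void Push(void* block) {
    std::memcpy(block, &head_, sizeof(head_));
    head_ = block;
  }

  // Allocates a block WITHOUT validating the next pointer stored inside it.
  void* Pop() {
    assert(head_ != nullptr);
    void* result = head_;
    std::memcpy(&head_, result, sizeof(head_));  // head = result->next
    return result;
  }

  // True if the head is null or the start of a block inside the pool.
  bool HeadIsValid() const {
    if (head_ == nullptr) return true;
    const auto addr = reinterpret_cast<std::uintptr_t>(head_);
    const auto base = reinterpret_cast<std::uintptr_t>(pool_);
    return addr >= base && addr < base + sizeof(pool_) &&
           (addr - base) % kBlockSize == 0;
  }

 private:
  alignas(std::max_align_t) unsigned char pool_[kBlockSize * kNumBlocks];
  void* head_ = nullptr;
};

struct Outcome {
  bool valid_before_pop;   // head still looks fine right after the stale write
  bool same_block_reused;  // the corrupted block is handed out normally
  bool valid_after_pop;    // head is garbage only after that allocation
};

Outcome SimulateStaleWrite() {
  ToyFreeList list;
  void* block = list.Pop();      // allocate
  list.Push(block);              // free; the caller wrongly keeps using `block`
  std::memset(block, 0x5A, 16);  // stale write clobbers the stored next pointer
  Outcome out;
  out.valid_before_pop = list.HeadIsValid();
  out.same_block_reused = (list.Pop() == block);
  out.valid_after_pop = list.HeadIsValid();
  return out;
}
```

In this model the faulting access would be the next Pop() after the one that reused the corrupted block, since only then is the garbage head dereferenced. If the real crash follows the same pattern, the code at the top of the stack (here KuduClient::Data::GetTableSchema() allocating a Synchronizer) may simply be the first allocation to trip over damage done elsewhere.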


