Posted to issues@kudu.apache.org by "Alexey Serbin (Jira)" <ji...@apache.org> on 2020/10/06 17:57:00 UTC

[jira] [Updated] (KUDU-2987) Intra location rebalance will crash in special case

     [ https://issues.apache.org/jira/browse/KUDU-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin updated KUDU-2987:
--------------------------------
    Affects Version/s: 1.9.0
                       1.10.0
                       1.10.1

> Intra location rebalance will crash in special case
> ---------------------------------------------------
>
>                 Key: KUDU-2987
>                 URL: https://issues.apache.org/jira/browse/KUDU-2987
>             Project: Kudu
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 1.9.0, 1.10.0, 1.10.1, 1.11.0
>            Reporter: ZhangYao
>            Assignee: ZhangYao
>            Priority: Major
>             Fix For: 1.12.0, 1.11.1
>
>
> While running a proof of concept (POC) of the rebalancer, I got a core dump during intra-location rebalancing.
> Here is the log:
> {code:java}
> I2019-10-30 20:02:17.843044 40915 rebalancer_tool.cc:225] running rebalancer within location '/location/2044'
> F2019-10-30 20:02:17.884591 40915 map-util.h:109] Check failed: it != collection.end() Map key not found: a9119004b2d24f42a1acf09d142565fb
> *** Check failure stack trace: ***
>     @          0x111a75d  google::LogMessage::Fail()
>     @          0x111c6d3  google::LogMessage::SendToLog()
>     @          0x111a2b9  google::LogMessage::Flush()
>     @          0x111d0ef  google::LogMessageFatal::~LogMessageFatal()
>     @           0xe26da7  FindOrDie<>()
>     @           0xe1f204  kudu::tools::RebalancerTool::AlgoBasedRunner::GetNextMovesImpl()
>     @           0xe162e0  kudu::tools::RebalancerTool::BaseRunner::GetNextMoves()
>     @           0xe15bf5  kudu::tools::RebalancerTool::RunWith()
>     @           0xe1db0e  kudu::tools::RebalancerTool::Run()
>     @           0xb6fea1  kudu::tools::(anonymous namespace)::RunRebalance()
>     @           0xb70e14  std::_Function_handler<>::_M_invoke()
>     @          0x11714a2  kudu::tools::Action::Run()
>     @           0xc00587  kudu::tools::DispatchCommand()
>     @           0xc00f4b  kudu::tools::RunTool()
>     @           0xb0fd6d  main
>     @     0x7f37086a4b15  __libc_start_main
>     @           0xb6b399  (unknown)
> {code}
> The problem appears to be in {{RebalancerTool::AlgoBasedRunner::GetNextMovesImpl}}: when building {{extra_info_by_tablet_id}}, it checks that the table ID of every tablet is present in the table info. But when {{ClusterRawInfo}} is built in {{RebalancerTool::KsckResultsToClusterRawInfo}}, only the tables that occur in the location are collected, whereas the tablets of the whole cluster are collected.
>  This problem occurs when the location does not host a replica of every table. It is likely to happen when the number of locations is much greater than a table's replication factor.
>  
>  
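The mismatch described above can be illustrated with a minimal, self-contained C++ sketch. It is not the Kudu source and not necessarily the fix that was committed: TableInfo, TabletInfo, TableMap and BuildExtraInfoByTabletId are hypothetical, simplified stand-ins. The sketch assumes the table map holds only the tables with a replica in the location while the tablet list covers the whole cluster, and it shows one possible guard: skipping a tablet whose table is missing instead of aborting the way a FindOrDie-style lookup does.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-ins for the rebalancer's raw cluster info.
struct TableInfo {
  std::string name;
  int replication_factor;
};

struct TabletInfo {
  std::string tablet_id;
  std::string table_id;
};

// Assumption: keyed by table ID, containing only tables present in the location.
using TableMap = std::unordered_map<std::string, TableInfo>;

// Builds a map of tablet ID -> replication factor of the owning table.
// Tablets whose table is absent from `tables` are skipped and, if requested,
// reported through `skipped_tablet_ids`. A FindOrDie-style lookup would
// terminate the process on the first such tablet instead.
std::unordered_map<std::string, int> BuildExtraInfoByTabletId(
    const TableMap& tables,
    const std::vector<TabletInfo>& tablets,
    std::vector<std::string>* skipped_tablet_ids) {
  std::unordered_map<std::string, int> result;
  for (const auto& tablet : tablets) {
    const auto it = tables.find(tablet.table_id);
    if (it == tables.end()) {
      // The table has no replica in this location: nothing to rebalance here.
      if (skipped_tablet_ids != nullptr) {
        skipped_tablet_ids->push_back(tablet.tablet_id);
      }
      continue;
    }
    result.emplace(tablet.tablet_id, it->second.replication_factor);
  }
  return result;
}
```

Called with a table map that contains a single table and a tablet list that references two tables, the function returns an entry only for the tablet of the known table and reports the other tablet as skipped. An alternative design, equally hypothetical here, would be to filter the tablet list by location when the raw cluster info is built, so that the lookup never sees a foreign tablet.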


