Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2016/05/04 01:12:12 UTC

[jira] [Resolved] (KUDU-1433) MaintenanceManager::GetMaintenanceManagerStatusDump can crash a server

     [ https://issues.apache.org/jira/browse/KUDU-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved KUDU-1433.
-------------------------------
    Resolution: Fixed

Fixed in 2b86e94d992468e6ee92733662af5fc959da57e3

> MaintenanceManager::GetMaintenanceManagerStatusDump can crash a server
> ----------------------------------------------------------------------
>
>                 Key: KUDU-1433
>                 URL: https://issues.apache.org/jira/browse/KUDU-1433
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 0.5.0
>            Reporter: Adar Dembo
>            Assignee: Adar Dembo
>            Priority: Blocker
>             Fix For: 0.9.0
>
>
> The tserver that Andrew and I have been using for the hackathon crashed when we hit the /maintenance-manager URL. The crash:
> {noformat}
> F0429 19:18:42.514312 35122 maintenance_manager.h:54] Check failed: valid_
> *** Check failure stack trace: ***
>     @     0x7fab69c2cf4d  google::LogMessage::Fail()
>     @     0x7fab69c2ee4d  google::LogMessage::SendToLog()
>     @     0x7fab69c2ca89  google::LogMessage::Flush()
>     @     0x7fab69c2f8ef  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7fab6f8b16e6  kudu::MaintenanceManager::GetMaintenanceManagerStatusDump()
>     @     0x7fab70d56f68  kudu::tserver::TabletServerPathHandlers::HandleMaintenanceManagerPage()
>     @     0x7fab70d57d34  boost::detail::function::void_function_obj_invoker2<>::invoke()
>     @     0x7fab6ffc1cfc  kudu::Webserver::RunPathHandler()
>     @     0x7fab6ffc2716  kudu::Webserver::BeginRequestCallback()
>     @     0x7fab6ffc28dc  kudu::Webserver::BeginRequestCallbackStatic()
>     @     0x7fab6ffce32e  handle_request
>     @     0x7fab6ffd0c2e  process_new_connection
>     @     0x7fab6ffd12cc  worker_thread
>     @     0x7fab6be98aa1  start_thread
>     @     0x7fab67e1593d  clone
>     @              (nil)  (unknown)
> {noformat}
> I suspect that at least one op's UpdateStats() method either calls none of the setters on the MaintenanceMgrStats object passed into it or doesn't copy cached previous stats into that object. LogGC, FlushDeltaMemStores, and FlushMRS all behave this way. There's nothing necessarily wrong with that (though it would be worth remembering why we don't cache stats in these ops), so we need to fix GetMaintenanceManagerStatusDump so that it skips stats objects whose valid_ flag is false.
> I think this was introduced about a year ago by commit 5e1f45e.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)