You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kudu.apache.org by "Adar Dembo (JIRA)" <ji...@apache.org> on 2018/01/17 01:48:00 UTC

[jira] [Updated] (KUDU-2148) 'kudu [master|tserver] status' may crash starting Kudu server

     [ https://issues.apache.org/jira/browse/KUDU-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adar Dembo updated KUDU-2148:
-----------------------------
    Code Review: http://gerrit.cloudera.org:8080/9041

> 'kudu [master|tserver] status' may crash starting Kudu server
> -------------------------------------------------------------
>
>                 Key: KUDU-2148
>                 URL: https://issues.apache.org/jira/browse/KUDU-2148
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.5.0
>            Reporter: Alexey Serbin
>            Assignee: Adar Dembo
>            Priority: Critical
>              Labels: newbie
>
> When running the {{kudu}} CLI tool (particularly, {{kudu master status}} or {{kudu tserver status}}) against a master or a tablet server which is starting up, the starting process crashes when the RPC request arrives after the RPC server starts but before the embedded web server starts.
> Below is an example of tserver's log captured for the scenario described above:
> {noformat}
> I0919 05:20:12.843554 19694 tablet_server_main.cc:72] Tablet server non-default flags:
> --raft_heartbeat_interval_ms=50
> --fs_data_dirs=/var/lib/kudu/tserver
> --fs_wal_dir=/var/lib/kudu/tserver
> --tserver_master_addrs=127.0.0.1:7051
> --heap_profile_path=/tmp/kudu-tserver.19694
> --unlock_experimental_flags=true
> --unlock_unsafe_flags=true
> --flagfile=/etc/kudu/conf/tserver.gflagfile
> --log_dir=/var/log/kudu
> --logtostderr=true
> Tablet server version:
> kudu 1.6.0-SNAPSHOT
> revision 052d1def8c39a4e50f4957a95ae19764deb30938
> build type DEBUG
> built by jenkins at 19 Sep 2017 03:59:34 PST on my-node
> build id 2017-09-19_03-59-27
> I0919 05:20:12.844458 19694 minidump.cc:237] Setting minidump size limit to 20M
> I0919 05:20:12.916566 19694 tablet_server_main.cc:79] Initializing tablet server...
> I0919 05:20:12.969810 19694 system_ntp.cc:104] NTP initialized. Resolution in nanos?: 1 Skew PPM: 500 Current error: 47022
> I0919 05:20:12.969980 19694 server_base.cc:228] Could not load existing FS layout: Not found: /var/lib/kudu/tserver/instance: No such file or directory (error 2)
> I0919 05:20:12.969992 19694 server_base.cc:229] Creating new FS layout
> I0919 05:20:12.985256 19694 fs_manager.cc:474] Generated new instance metadata in path /var/lib/kudu/tserver/instance:
> uuid: "89b8ccc42dfb4d39a83d801ceb1ff0a1"
> format_stamp: "Formatted at 2017-09-19 12:20:12 on my-node"
> I0919 05:20:12.993276 19694 fs_manager.cc:431] Time spent creating directory manager: real 0.006s	user 0.000s	sys 0.000s
> I0919 05:20:12.993454 19694 env_posix.cc:1589] Not raising process file limit of 65536; it is already as high as it can go
> I0919 05:20:12.993530 19694 file_cache.cc:469] Constructed file cache lbm with capacity 26214
> I0919 05:20:12.995923 19694 fs_manager.cc:335] Time spent opening block manager: real 0.000s	user 0.000s	sys 0.000s
> I0919 05:20:12.995976 19694 fs_manager.cc:348] Opened local filesystem: /var/lib/kudu/tserver
> uuid: "89b8ccc42dfb4d39a83d801ceb1ff0a1"
> format_stamp: "Formatted at 2017-09-19 12:20:12 on my-node"
> I0919 05:20:12.996026 19694 fs_report.cc:347] Block manager report
> --------------------
> 1 data directories: /var/lib/kudu/tserver/data
> Total live blocks: 0
> Total live bytes: 0
> Total live bytes (after alignment): 0
> Total number of LBM containers: 0 (0 full)
> Did not check for missing blocks
> Did not check for orphaned blocks
> Total full LBM containers with extra space: 0 (0 repaired)
> Total full LBM container extra space in bytes: 0 (0 repaired)
> Total incomplete LBM containers: 0 (0 repaired)
> Total LBM partial records: 0 (0 repaired)
> I0919 05:20:13.107158 19694 ts_tablet_manager.cc:216] Loaded tablet metadata (0 live tablets)
> I0919 05:20:13.107344 19694 tablet_server_main.cc:84] Starting tablet server...
> I0919 05:20:13.149742 19694 rpc_server.cc:192] RPC server started. Bound to: 0.0.0.0:7050
> I0919 05:20:13.569461 19694 webserver.cc:149] Starting webserver on 0.0.0.0:8050
> I0919 05:20:13.569495 19694 webserver.cc:160] Document root disabled
> I0919 05:20:13.569829 19694 webserver.cc:282] Webserver started. Bound to: http://0.0.0.0:8050/
> F0919 05:20:13.158834 19831 server_base.cc:316] Check failed: _s.ok() Bad status: Illegal state: Not started
> *** Check failure stack trace: ***
> I0919 05:20:13.589279 19694 tablet_server_main.cc:87] Tablet server successfully started.
> I0919 05:20:13.570534 19842 process_memory.cc:181] Process hard memory limit is 0.464404 GB
> I0919 05:20:13.589357 19842 process_memory.cc:183] Process soft memory limit is 0.371523 GB
> I0919 05:20:13.589368 19842 process_memory.cc:186] Process memory pressure threshold is 0.278643 GB
> Wrote minidump to /var/log/kudu/minidumps/kudu-tserver/1b23130b-f2d7-4724-7eb86aa5-72cfcc16.dmp
> Wrote minidump to /var/log/kudu/minidumps/kudu-tserver/1b23130b-f2d7-4724-7eb86aa5-72cfcc16.dmp
> *** Aborted at 1505823613 (unix time) try "date -d @1505823613" if you are using GNU date ***
> PC: @     0x7ff5ea9b4067 gsignal
> *** SIGABRT (@0x3e700004cee) received by PID 19694 (TID 0x7ff5adfe0700) from PID 19694; stack trace: ***
>     @     0x7ff5ec6ce8d0 (unknown)
>     @     0x7ff5ea9b4067 gsignal
>     @     0x7ff5ea9b5448 abort
>     @          0x2626d1b kudu::AbortFailureFunction()
>     @          0x112f26d google::LogMessage::Fail()
> W0919 05:20:13.640193 19841 heartbeater.cc:524] Failed to heartbeat to 127.0.0.1:7051: Network error: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: connect: Connection refused (error 111)
>     @          0x113113d google::LogMessage::SendToLog()
>     @          0x112eda9 google::LogMessage::Flush()
>     @          0x1131bdf google::LogMessageFatal::~LogMessageFatal()
>     @          0x11d88ba kudu::server::ServerBase::GetStatusPB()
>     @          0x12034fc kudu::server::GenericServiceImpl::GetStatus()
>     @          0x121f062 _ZZN4kudu6server16GenericServiceIfC4ERK13scoped_refptrINS_12MetricEntityEERKS2_INS_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE10_clESG_SH_SJ_
>     @          0x1221cf9 _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_6server16GenericServiceIfC4ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E10_E9_M_invokeERKSt9_Any_dataS4_S5_S9_
>     @          0x24a902a std::function<>::operator()()
>     @          0x24a888b kudu::rpc::GeneratedServiceIf::Handle()
>     @          0x24aa9b4 kudu::rpc::ServicePool::RunThread()
>     @          0x24aba2d boost::_mfi::mf0<>::operator()()
>     @          0x24ab85c boost::_bi::list1<>::operator()<>()
>     @          0x24ab751 boost::_bi::bind_t<>::operator()()
>     @          0x24ab687 boost::detail::function::void_function_obj_invoker0<>::invoke()
>     @          0x24cb570 boost::function0<>::operator()()
>     @          0x265cb52 kudu::Thread::SuperviseThread()
>     @     0x7ff5ec6c70a4 start_thread
>     @     0x7ff5eaa6787d clone
>     @                0x0 (unknown)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)