You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kudu.apache.org by "Alexey Serbin (JIRA)" <ji...@apache.org> on 2017/09/19 20:42:00 UTC

[jira] [Created] (KUDU-2148) 'kudu [master|tserver] status' may crash starting Kudu server

Alexey Serbin created KUDU-2148:
-----------------------------------

             Summary: 'kudu [master|tserver] status' may crash starting Kudu server
                 Key: KUDU-2148
                 URL: https://issues.apache.org/jira/browse/KUDU-2148
             Project: Kudu
          Issue Type: Bug
          Components: tserver
    Affects Versions: 1.5.0
            Reporter: Alexey Serbin


When running the {{kudu}} CLI tool (particularly, {{kudu master status}} or {{kudu tserver status}}) against a master or a tablet server which is starting up, the staring process crashes when the RPC request arrives after the RPC server starts but before the embedded web server starts.

Below is an example of tserver's log from captured for the scenario described above:

{noformat}
I0919 05:20:12.843554 19694 tablet_server_main.cc:72] Tablet server non-default flags:
--raft_heartbeat_interval_ms=50
--fs_data_dirs=/var/lib/kudu/tserver
--fs_wal_dir=/var/lib/kudu/tserver
--tserver_master_addrs=127.0.0.1:7051
--heap_profile_path=/tmp/kudu-tserver.19694
--unlock_experimental_flags=true
--unlock_unsafe_flags=true
--flagfile=/etc/kudu/conf/tserver.gflagfile
--log_dir=/var/log/kudu
--logtostderr=true
Tablet server version:
kudu 1.6.0-SNAPSHOT
revision 052d1def8c39a4e50f4957a95ae19764deb30938
build type DEBUG
built by jenkins at 19 Sep 2017 03:59:34 PST on my-node
build id 2017-09-19_03-59-27
I0919 05:20:12.844458 19694 minidump.cc:237] Setting minidump size limit to 20M
I0919 05:20:12.916566 19694 tablet_server_main.cc:79] Initializing tablet server...
I0919 05:20:12.969810 19694 system_ntp.cc:104] NTP initialized. Resolution in nanos?: 1 Skew PPM: 500 Current error: 47022
I0919 05:20:12.969980 19694 server_base.cc:228] Could not load existing FS layout: Not found: /var/lib/kudu/tserver/instance: No such file or directory (error 2)
I0919 05:20:12.969992 19694 server_base.cc:229] Creating new FS layout
I0919 05:20:12.985256 19694 fs_manager.cc:474] Generated new instance metadata in path /var/lib/kudu/tserver/instance:
uuid: "89b8ccc42dfb4d39a83d801ceb1ff0a1"
format_stamp: "Formatted at 2017-09-19 12:20:12 on my-node"
I0919 05:20:12.993276 19694 fs_manager.cc:431] Time spent creating directory manager: real 0.006s	user 0.000s	sys 0.000s
I0919 05:20:12.993454 19694 env_posix.cc:1589] Not raising process file limit of 65536; it is already as high as it can go
I0919 05:20:12.993530 19694 file_cache.cc:469] Constructed file cache lbm with capacity 26214
I0919 05:20:12.995923 19694 fs_manager.cc:335] Time spent opening block manager: real 0.000s	user 0.000s	sys 0.000s
I0919 05:20:12.995976 19694 fs_manager.cc:348] Opened local filesystem: /var/lib/kudu/tserver
uuid: "89b8ccc42dfb4d39a83d801ceb1ff0a1"
format_stamp: "Formatted at 2017-09-19 12:20:12 on my-node"
I0919 05:20:12.996026 19694 fs_report.cc:347] Block manager report
--------------------
1 data directories: /var/lib/kudu/tserver/data
Total live blocks: 0
Total live bytes: 0
Total live bytes (after alignment): 0
Total number of LBM containers: 0 (0 full)
Did not check for missing blocks
Did not check for orphaned blocks
Total full LBM containers with extra space: 0 (0 repaired)
Total full LBM container extra space in bytes: 0 (0 repaired)
Total incomplete LBM containers: 0 (0 repaired)
Total LBM partial records: 0 (0 repaired)
I0919 05:20:13.107158 19694 ts_tablet_manager.cc:216] Loaded tablet metadata (0 live tablets)
I0919 05:20:13.107344 19694 tablet_server_main.cc:84] Starting tablet server...
I0919 05:20:13.149742 19694 rpc_server.cc:192] RPC server started. Bound to: 0.0.0.0:7050
I0919 05:20:13.569461 19694 webserver.cc:149] Starting webserver on 0.0.0.0:8050
I0919 05:20:13.569495 19694 webserver.cc:160] Document root disabled
I0919 05:20:13.569829 19694 webserver.cc:282] Webserver started. Bound to: http://0.0.0.0:8050/
F0919 05:20:13.158834 19831 server_base.cc:316] Check failed: _s.ok() Bad status: Illegal state: Not started
*** Check failure stack trace: ***
I0919 05:20:13.589279 19694 tablet_server_main.cc:87] Tablet server successfully started.
I0919 05:20:13.570534 19842 process_memory.cc:181] Process hard memory limit is 0.464404 GB
I0919 05:20:13.589357 19842 process_memory.cc:183] Process soft memory limit is 0.371523 GB
I0919 05:20:13.589368 19842 process_memory.cc:186] Process memory pressure threshold is 0.278643 GB
Wrote minidump to /var/log/kudu/minidumps/kudu-tserver/1b23130b-f2d7-4724-7eb86aa5-72cfcc16.dmp
Wrote minidump to /var/log/kudu/minidumps/kudu-tserver/1b23130b-f2d7-4724-7eb86aa5-72cfcc16.dmp
*** Aborted at 1505823613 (unix time) try "date -d @1505823613" if you are using GNU date ***
PC: @     0x7ff5ea9b4067 gsignal
*** SIGABRT (@0x3e700004cee) received by PID 19694 (TID 0x7ff5adfe0700) from PID 19694; stack trace: ***
    @     0x7ff5ec6ce8d0 (unknown)
    @     0x7ff5ea9b4067 gsignal
    @     0x7ff5ea9b5448 abort
    @          0x2626d1b kudu::AbortFailureFunction()
    @          0x112f26d google::LogMessage::Fail()
W0919 05:20:13.640193 19841 heartbeater.cc:524] Failed to heartbeat to 127.0.0.1:7051: Network error: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: connect: Connection refused (error 111)
    @          0x113113d google::LogMessage::SendToLog()
    @          0x112eda9 google::LogMessage::Flush()
    @          0x1131bdf google::LogMessageFatal::~LogMessageFatal()
    @          0x11d88ba kudu::server::ServerBase::GetStatusPB()
    @          0x12034fc kudu::server::GenericServiceImpl::GetStatus()
    @          0x121f062 _ZZN4kudu6server16GenericServiceIfC4ERK13scoped_refptrINS_12MetricEntityEERKS2_INS_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE10_clESG_SH_SJ_
    @          0x1221cf9 _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_6server16GenericServiceIfC4ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E10_E9_M_invokeERKSt9_Any_dataS4_S5_S9_
    @          0x24a902a std::function<>::operator()()
    @          0x24a888b kudu::rpc::GeneratedServiceIf::Handle()
    @          0x24aa9b4 kudu::rpc::ServicePool::RunThread()
    @          0x24aba2d boost::_mfi::mf0<>::operator()()
    @          0x24ab85c boost::_bi::list1<>::operator()<>()
    @          0x24ab751 boost::_bi::bind_t<>::operator()()
    @          0x24ab687 boost::detail::function::void_function_obj_invoker0<>::invoke()
    @          0x24cb570 boost::function0<>::operator()()
    @          0x265cb52 kudu::Thread::SuperviseThread()
    @     0x7ff5ec6c70a4 start_thread
    @     0x7ff5eaa6787d clone
    @                0x0 (unknown)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)