You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kudu.apache.org by "ZhangZhen (JIRA)" <ji...@apache.org> on 2017/10/31 15:48:00 UTC

[jira] [Comment Edited] (KUDU-2206) Kudu client create table timeout

    [ https://issues.apache.org/jira/browse/KUDU-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227009#comment-16227009 ] 

ZhangZhen edited comment on KUDU-2206 at 10/31/17 3:47 PM:
-----------------------------------------------------------

For the tablet fcde65c4e4cf4df29b9ef9884ce292b2, the 07 tserver spent more than 5 minutes before it started bootstraping it.

I1031 23:03:52.450762 32805 tablet_service.cc:650] Processing CreateTablet for tablet fcde65c4e4cf4df29b9ef9884ce292b2 (table=default.timeout_tst [id=cdcb3e60c1314776843d856365535d49]), partition=HASH (id) PARTITION 2, RANGE (date) PARTITION 20171031 <= VALUES < 20171101
I1031 23:03:53.198792 32805 ts_tablet_manager.cc:862] T fcde65c4e4cf4df29b9ef9884ce292b2 P 23f962e4a1744381ad5fa0d2d8b10241: Registered tablet (data state: TABLET_DATA_READY)
I1031 23:09:04.713188 27558 ts_tablet_manager.cc:723] T fcde65c4e4cf4df29b9ef9884ce292b2 P 23f962e4a1744381ad5fa0d2d8b10241: Bootstrapping tablet
I1031 23:09:04.713435 27558 tablet_bootstrap.cc:382] T fcde65c4e4cf4df29b9ef9884ce292b2 P 23f962e4a1744381ad5fa0d2d8b10241: Bootstrap starting.
I1031 23:09:04.713558 27558 tablet_bootstrap.cc:541] T fcde65c4e4cf4df29b9ef9884ce292b2 P 23f962e4a1744381ad5fa0d2d8b10241: Time spent opening tablet: real 0.000s	user 0.000s	sys 0.000s
I1031 23:09:04.713654 27558 tablet_bootstrap.cc:484] T fcde65c4e4cf4df29b9ef9884ce292b2 P 23f962e4a1744381ad5fa0d2d8b10241: No blocks or log segments found. Creating new log.
I1031 23:09:04.715847 27558 tablet_bootstrap.cc:382] T fcde65c4e4cf4df29b9ef9884ce292b2 P 23f962e4a1744381ad5fa0d2d8b10241: No bootstrap required, opened a new log
I1031 23:09:04.715870 27558 ts_tablet_manager.cc:728] T fcde65c4e4cf4df29b9ef9884ce292b2 P 23f962e4a1744381ad5fa0d2d8b10241: Time spent bootstrapping tablet: real 0.003s	user 0.001s	sys 0.001s


was (Author: zhquake@gmail.com):
For the tablet fcde65c4e4cf4df29b9ef9884ce292b2, the 07 tserver spent almost 5 minutes before it started bootstraping it.

I1031 23:03:52.450762 32805 tablet_service.cc:650] Processing CreateTablet for tablet fcde65c4e4cf4df29b9ef9884ce292b2 (table=default.timeout_tst [id=cdcb3e60c1314776843d856365535d49]), partition=HASH (id) PARTITION 2, RANGE (date) PARTITION 20171031 <= VALUES < 20171101
I1031 23:03:53.198792 32805 ts_tablet_manager.cc:862] T fcde65c4e4cf4df29b9ef9884ce292b2 P 23f962e4a1744381ad5fa0d2d8b10241: Registered tablet (data state: TABLET_DATA_READY)
I1031 23:09:04.713188 27558 ts_tablet_manager.cc:723] T fcde65c4e4cf4df29b9ef9884ce292b2 P 23f962e4a1744381ad5fa0d2d8b10241: Bootstrapping tablet
I1031 23:09:04.713435 27558 tablet_bootstrap.cc:382] T fcde65c4e4cf4df29b9ef9884ce292b2 P 23f962e4a1744381ad5fa0d2d8b10241: Bootstrap starting.
I1031 23:09:04.713558 27558 tablet_bootstrap.cc:541] T fcde65c4e4cf4df29b9ef9884ce292b2 P 23f962e4a1744381ad5fa0d2d8b10241: Time spent opening tablet: real 0.000s	user 0.000s	sys 0.000s
I1031 23:09:04.713654 27558 tablet_bootstrap.cc:484] T fcde65c4e4cf4df29b9ef9884ce292b2 P 23f962e4a1744381ad5fa0d2d8b10241: No blocks or log segments found. Creating new log.
I1031 23:09:04.715847 27558 tablet_bootstrap.cc:382] T fcde65c4e4cf4df29b9ef9884ce292b2 P 23f962e4a1744381ad5fa0d2d8b10241: No bootstrap required, opened a new log
I1031 23:09:04.715870 27558 ts_tablet_manager.cc:728] T fcde65c4e4cf4df29b9ef9884ce292b2 P 23f962e4a1744381ad5fa0d2d8b10241: Time spent bootstrapping tablet: real 0.003s	user 0.001s	sys 0.001s

> Kudu client create table timeout
> --------------------------------
>
>                 Key: KUDU-2206
>                 URL: https://issues.apache.org/jira/browse/KUDU-2206
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.3.0
>            Reporter: ZhangZhen
>         Attachments: kudu_master.log, tserver_01_0f53a0d3.log, tserver_07_23f962e4a1.log, tsever_02_0a8bbcbb.log
>
>
> We encountered rpc timeout exception when we use sparksql, which use Java kudu client innerly, to create table on kudu cluster. The cluster has 10 tserver and 1 master on 10 machines, the target table has 10 range partitions and 5 hash partitions. 
> From the web UI, I found it spent about 3 minutes before all the tablets vote a leader, and I can see a lot delete tablet records in the UI like:
> Delete Tablet	Running	2.13 min	719f0f496bc34a469e4069b2861b4be8 Delete Tablet RPC for TS=044f1da9a27c46acb82b1386f829f4dc
> Also I find many retry records in tserver logs, like:
> W1031 23:04:40.088256  5816 consensus_peers.cc:357] T fcde65c4e4cf4df29b9ef9884ce292b2 P 0f53a0d3ef7e44ebb0365c800752d5bd -> Peer 23f962e4a1744381ad5fa0d2d8b10241 (c3-kudu-tst-st07.bj:18700): Couldn't send request to peer 23f962e4a1744381ad5fa0d2d8b10241 for tablet fcde65c4e4cf4df29b9ef9884ce292b2. Error code: TABLET_NOT_RUNNING (12). Status: Illegal state: Tablet not RUNNING: NOT_STARTED. Retrying in the next heartbeat period. Already tried 94 times.
> You can find the logs of master and tserver since master receive the create table request in the attachment.
> The kudu version is 1.3.0, the nearest commit is 00813f96b9cb0c9ec57a17e5c85242f7679db0e0
> The exception that client received is like:
> Error: org.apache.kudu.client.NonRecoverableException: RPC can not complete before timeout: KuduRpc(method=IsCreateTableDone, tablet=null, attempt=25, DeadlineTracker(timeout=30000, elapsed=28499), Traces: [0ms] sending RPC to server , [0ms] received from server  response OK, [20ms] sending RPC to server , [20ms] received from server  response OK, [40ms] sending RPC to server , [40ms] received from server  response OK, [59ms] sending RPC to server , [60ms] received from server  response OK, [80ms] sending RPC to server , [80ms] received from server  response OK, [100ms] sending RPC to server , [100ms] received from server  response OK, [140ms] sending RPC to server , [141ms] received from server  response OK, [200ms] sending RPC to server , [200ms] received from server  response OK, [319ms] sending RPC to server , [320ms] received from server  response OK, [780ms] sending RPC to server , [780ms] received from server  response OK, [2740ms] sending RPC to server , [2741ms] received from server  response OK, [3580ms] sending RPC to server , [3580ms] received from server  response OK, [4840ms] sending RPC to server , [4840ms] received from server  response OK, [7080ms] sending RPC to server , [7081ms] received from server  response OK, [8320ms] sending RPC to server , [8321ms] received from server  response OK, [11620ms] sending RPC to server , [11621ms] received from server  response OK, [13540ms] sending RPC to server , [13540ms] received from server  response OK, [16819ms] sending RPC to server , [16820ms] received from server  response OK, [19020ms] sending RPC to server , [19020ms] received from server  response OK, [21340ms] sending RPC to server , [21341ms] received from server  response OK, [24660ms] sending RPC to server , [24661ms] received from server  response OK, [26800ms] sending RPC to server , [26800ms] received from server  response OK, [27660ms] sending RPC to server , [27660ms] received from server  response OK, [28480ms] sending RPC to server , [28481ms] received from server



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)