You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kudu.apache.org by "ZhangZhen (JIRA)" <ji...@apache.org> on 2017/10/31 15:39:01 UTC

[jira] [Created] (KUDU-2206) Kudu client create table timeout

ZhangZhen created KUDU-2206:
-------------------------------

             Summary: Kudu client create table timeout
                 Key: KUDU-2206
                 URL: https://issues.apache.org/jira/browse/KUDU-2206
             Project: Kudu
          Issue Type: Bug
    Affects Versions: 1.3.0
            Reporter: ZhangZhen
         Attachments: kudu_master.log, tserver_01_0f53a0d3.log, tserver_07_23f962e4a1.log, tsever_02_0a8bbcbb.log

We encountered rpc timeout exception when we use sparksql, which use Java kudu client innerly, to create table on kudu cluster. The cluster has 10 tserver and 1 master on 10 machines, the target table has 10 range partitions and 5 hash partitions. 

From the web UI, I found it spent about 3 minutes before all the tablets vote a leader, and I can see a lot delete tablet records in the UI like:
Delete Tablet	Running	2.13 min	719f0f496bc34a469e4069b2861b4be8 Delete Tablet RPC for TS=044f1da9a27c46acb82b1386f829f4dc

Also I find many retry records in tserver logs, like:
W1031 23:04:40.088256  5816 consensus_peers.cc:357] T fcde65c4e4cf4df29b9ef9884ce292b2 P 0f53a0d3ef7e44ebb0365c800752d5bd -> Peer 23f962e4a1744381ad5fa0d2d8b10241 (c3-kudu-tst-st07.bj:18700): Couldn't send request to peer 23f962e4a1744381ad5fa0d2d8b10241 for tablet fcde65c4e4cf4df29b9ef9884ce292b2. Error code: TABLET_NOT_RUNNING (12). Status: Illegal state: Tablet not RUNNING: NOT_STARTED. Retrying in the next heartbeat period. Already tried 94 times.

You can find the logs of master and tserver since master receive the create table request in the attachment.

The kudu version is 1.3.0, the nearest commit is 00813f96b9cb0c9ec57a17e5c85242f7679db0e0

The exception that client received is like:
Error: org.apache.kudu.client.NonRecoverableException: RPC can not complete before timeout: KuduRpc(method=IsCreateTableDone, tablet=null, attempt=25, DeadlineTracker(timeout=30000, elapsed=28499), Traces: [0ms] sending RPC to server , [0ms] received from server  response OK, [20ms] sending RPC to server , [20ms] received from server  response OK, [40ms] sending RPC to server , [40ms] received from server  response OK, [59ms] sending RPC to server , [60ms] received from server  response OK, [80ms] sending RPC to server , [80ms] received from server  response OK, [100ms] sending RPC to server , [100ms] received from server  response OK, [140ms] sending RPC to server , [141ms] received from server  response OK, [200ms] sending RPC to server , [200ms] received from server  response OK, [319ms] sending RPC to server , [320ms] received from server  response OK, [780ms] sending RPC to server , [780ms] received from server  response OK, [2740ms] sending RPC to server , [2741ms] received from server  response OK, [3580ms] sending RPC to server , [3580ms] received from server  response OK, [4840ms] sending RPC to server , [4840ms] received from server  response OK, [7080ms] sending RPC to server , [7081ms] received from server  response OK, [8320ms] sending RPC to server , [8321ms] received from server  response OK, [11620ms] sending RPC to server , [11621ms] received from server  response OK, [13540ms] sending RPC to server , [13540ms] received from server  response OK, [16819ms] sending RPC to server , [16820ms] received from server  response OK, [19020ms] sending RPC to server , [19020ms] received from server  response OK, [21340ms] sending RPC to server , [21341ms] received from server  response OK, [24660ms] sending RPC to server , [24661ms] received from server  response OK, [26800ms] sending RPC to server , [26800ms] received from server  response OK, [27660ms] sending RPC to server , [27660ms] received from server  response OK, [28480ms] sending RPC to server , [28481ms] received from server



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)