Posted to issues@kudu.apache.org by "Adar Dembo (JIRA)" <ji...@apache.org> on 2018/10/31 16:18:00 UTC

[jira] [Commented] (KUDU-2453) kudu should stop creating tablet infinitely

    [ https://issues.apache.org/jira/browse/KUDU-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670330#comment-16670330 ] 

Adar Dembo commented on KUDU-2453:
----------------------------------

Creating a table with many thousands of tablets can destroy a cluster in various interesting ways. Tablet servers can crash after trying to create too many threads. The master can crash too. KUDU-2611 describes some of these problems; in that bug, specifying a replication factor of 1 bypasses the max_create_tablets_per_ts guard rail altogether. But as you described in this JIRA, you can also set it to a much higher value, opening the door to abuse.
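
To make the guard rail concrete, here is a minimal sketch of the kind of check involved. It is not the master's actual code: the function name, the cluster size, and the exact comparison are assumptions for illustration. Only the flag name and the replication-factor-1 bypass come from the discussion above.

```python
def exceeds_guard_rail(num_tablets, num_replicas, num_live_tservers,
                       max_create_tablets_per_ts):
    """Return True if this (assumed) guard rail would reject the request."""
    # Bypass described in KUDU-2611: with a replication factor of 1 the
    # check is skipped altogether.
    if num_replicas <= 1:
        return False
    # Assumption: the limit scales with the number of live tablet servers
    # and is compared against the total number of replicas to create.
    max_replicas = max_create_tablets_per_ts * num_live_tservers
    return num_tablets * num_replicas > max_replicas


# Hypothetical 5-tserver cluster, 1000 tablets, 3 replicas each.
print(exceeds_guard_rail(1000, 3, 5, 20))     # small limit: rejected
print(exceeds_guard_rail(1000, 3, 5, 2000))   # limit raised to 2000: allowed
print(exceeds_guard_rail(100000, 1, 5, 20))   # replication factor 1: bypassed
```

Under these assumptions, raising the flag to 2000 lets through a request that a small limit would have rejected, and a replication factor of 1 lets through any request at all.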

I don't really understand the premise of the JIRA though. Can you explain how you concluded that tablet creation is happening 'infinitely'? Won't all of the tablets eventually be created? It's true that the master will "replace" a tablet that isn't created within 30s with a new one; if this leads to a table that's perpetually being created, can you show how?
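
One way to frame the question is with a toy model. The sketch below is purely hypothetical and is not how Kudu schedules work: it assumes a single tablet server that handles a fixed number of creations per 30s round in FIFO order, and that a replaced tablet stays in the queue as stale work. Only the 30s replacement rule comes from the paragraph above; `simulate` and its parameters are invented for illustration.

```python
from collections import deque
from itertools import count


def simulate(num_tablets, creates_per_round, rounds):
    """Toy model: each round stands for one 30s replacement timeout."""
    ids = count()
    # Tablets the master still wants created.
    pending = {next(ids) for _ in range(num_tablets)}
    # Assumed FIFO work queue on the tablet server.
    queue = deque(sorted(pending))
    created = 0
    backlog = []
    for _ in range(rounds):
        # The tablet server works through the head of its queue.
        for _ in range(min(creates_per_round, len(queue))):
            tablet = queue.popleft()
            if tablet in pending:
                pending.remove(tablet)
                created += 1
            # Otherwise the tablet was already replaced: wasted work.
        # Anything still pending timed out and is replaced by a new tablet;
        # the old entry stays in the queue as stale work (an assumption).
        replacements = {next(ids) for _ in pending}
        queue.extend(sorted(replacements))
        pending = replacements
        backlog.append(len(queue))
    return created, backlog


print(simulate(10, 3, 4))    # -> (3, [14, 18, 22, 26])
print(simulate(10, 20, 1))   # -> (10, [0])
```

In this model, once the number of timed-out tablets per round exceeds what the server can process, the queue grows every round and no replacement is ever reached in time. Whether anything like this happens in a real cluster is exactly what the question above asks the reporter to demonstrate.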


> kudu should stop creating tablet infinitely
> -------------------------------------------
>
>                 Key: KUDU-2453
>                 URL: https://issues.apache.org/jira/browse/KUDU-2453
>             Project: Kudu
>          Issue Type: Bug
>          Components: master, tserver
>    Affects Versions: 1.4.0, 1.7.2
>            Reporter: HeLifu
>            Priority: Major
>
> I hit this problem again on 2018/10/26, this time on Kudu 1.7.2.
> The kudu-master log is below:
> {code:java}
> I1031 16:21:21.644222 180146 catalog_manager.cc:2922] Sending DeleteTablet(TABLET_DATA_DELETED) for tablet d1fd56be8eef44e782d509a0eeae9c15 on 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050) (Replaced by ff4fd0a538944d69b8a6beea81e5bb01 at 2018-10-24 12:39:17 CST)
> W1031 16:21:21.644421 180146 catalog_manager.cc:2892] TS 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050): delete failed for tablet d1fd56be8eef44e782d509a0eeae9c15 with error code TABLET_NOT_RUNNING: Already present: State transition of tablet d1fd56be8eef44e782d509a0eeae9c15 already in progress: creating tablet
> I1031 16:21:21.644436 180146 catalog_manager.cc:2700] Scheduling retry of d1fd56be8eef44e782d509a0eeae9c15 Delete Tablet RPC for TS=39f15fcf42ef45bba0c95a3223dc25ee with a delay of 553 ms (attempt = 6)
> {code}
> The kudu-tserver log is below:
>  
> {code:java}
> I1031 16:21:22.197888 137341 tablet_service.cc:799] Processing DeleteTablet for tablet d1fd56be8eef44e782d509a0eeae9c15 with delete_type TABLET_DATA_DELETED (Replaced by ff4fd0a538944d69b8a6beea81e5bb01 at 2018-10-24 12:39:17 CST) from {username='kudu'} at 10.120.219.118:50247
> I1031 16:21:22.230309 137131 maintenance_manager.cc:492] P 39f15fcf42ef45bba0c95a3223dc25ee: FlushDeltaMemStoresOp(70499bc0f9ac4d8196ae5a0be6ef0b8b) complete. Timing: real 0.416s	user 0.404s	sys 0.008s Metrics: {"fdatasync":3,"fdatasync_us":2583,"lbm_write_time_us":29,"lbm_writes_lt_1ms":4}
> I1031 16:21:22.321700 137341 tablet_service.cc:799] Processing DeleteTablet for tablet 74a30181dea9400a9bcfaeb56f83f379 with delete_type TABLET_DATA_DELETED (Replaced by 31e350fddea443048946f5a20d3171bd at 2018-10-31 16:21:13 CST) from {username='kudu'} at 10.120.219.118:50247
> I1031 16:21:22.350440 137341 tablet_service.cc:799] Processing DeleteTablet for tablet 7c864af01309432c9a2a4d1c88bbe52b with delete_type TABLET_DATA_DELETED (Replaced by ec4b733818d940e0af32c51bda3c7^C
> {code}
>  
> -----------------------------------------------------------------------
> We had raised the master flag '{color:#FF0000}max_create_tablets_per_ts{color}' to 2000 in master.conf, and the Kudu cluster was under some load. Then someone else mistakenly created a big table with tens of thousands of tablets from impala-shell.
> The creation took a long time, so he pressed Ctrl+C. But we found that the number of tablets in 'INITIALIZED' status kept growing rapidly; half an hour later it was 350,000 :(
> We deleted the table with the kudu client tool, and found that the number of 'INITIALIZED' tablets went down only slowly. By a rough estimate, it will take 10+ days to get back to normal. Luckily, the application systems are not affected.
>  
>  


