Posted to user@kudu.apache.org by Gary Gao <ga...@gmail.com> on 2018/08/12 05:17:03 UTC

How to decrease kudu server restart time

I have a Kudu cluster of 40 nodes. When I realized that
maintenance_manager_num_threads=1 was too small, I updated the config file and
restarted a Kudu tablet server, but it took too long to start (longer than
--follower_unavailable_considered_failed_sec=600), which caused tablet
redistribution.
Even after the Kudu server started, it also spent too much time copying
tablets, as shown in the following tablet block copying log:


Tablet 1ecbe230e14a4d9f9125dbc49c32860e of table 'impala::venus.ods_xk_pay_fee_order' is under-replicated: 1 replica(s) not RUNNING
  41e4489d38924c85a4810bd33ef60d80 (bj-yz-hadoop01-1-12:7050): bad state
    State:       INITIALIZED
    Data state:  TABLET_DATA_COPYING
    Last status: Tablet Copy: Downloading block 0000000084111077 (299837/1177225)
  52a9ede038a04566860ecd2e54388738 (bj-yz-hadoop01-1-51:7050): RUNNING
  b133f6fd0c274b93b21ffcbdcbbde830 (bj-yz-hadoop01-1-14:7050): RUNNING [LEADER]


My questions are:

1. It seems the Kudu server spent a long time opening log block containers.
How can I speed up restarting the Kudu server?

2. I think the number of blocks has an influence on Kudu server restart time
and on query time for a specific tablet: the more blocks, the longer the
restart and query times. Is this right?

3. Why are there more than 1 million blocks in a tablet, as shown in the
Tablet Copy log above, while there are fewer than 500 thousand records in the
tablet?

4. How can I reduce the number of blocks in a tablet?

Re: How to decrease kudu server restart time

Posted by Attila Bukor <ab...@apache.org>.

It's available in 1.7.0 and above only.
On Mon, Aug 13, 2018 at 07:16:00PM +0800, Gary Gao wrote:
> I'm using Kudu 1.6.0, does this version have the feature you mentioned :
> 
> The recent versions are using 3-4-3 replica
> replacement, meaning the tablet copy should be automatically canceled
> when the third replica comes online and the copy hasn't finished yet.
> 

Re: How to decrease kudu server restart time

Posted by Gary Gao <ga...@gmail.com>.

I'm using Kudu 1.6.0. Does this version have the feature you mentioned?

> The recent versions are using 3-4-3 replica
> replacement, meaning the tablet copy should be automatically canceled
> when the third replica comes online and the copy hasn't finished yet.

On Mon, Aug 13, 2018 at 5:16 PM Attila Bukor <ab...@apache.org> wrote:

> Which version are you using? The recent versions are using 3-4-3 replica
> replacement, meaning the tablet copy should be automatically canceled
> when the third replica comes online and the copy hasn't finished yet.

Re: How to decrease kudu server restart time

Posted by Attila Bukor <ab...@apache.org>.

Hi Gary,

Please find my answers inline.

On Sun, Aug 12, 2018 at 01:17:03PM +0800, Gary Gao wrote:
> I have a kudu cluster of 40 nodes, when I realized that
> maintenance_manager_num_threads=1 is too small, I updated config file and
> restarted a kudu tablet server, but it took too long to start, longer than
> --follower_unavailable_considered_failed_sec=600, causing tablet
> redistribution.
> Even if the kudu server started, it also spent too much copying tablet, as
> the following tablet block copying log:
> 
> 
> Tablet 1ecbe230e14a4d9f9125dbc49c32860e of table 'impala::venus.ods_xk_pay_fee_order' is under-replicated: 1 replica(s) not RUNNING
>   41e4489d38924c85a4810bd33ef60d80 (bj-yz-hadoop01-1-12:7050): bad state
>     State:       INITIALIZED
>     Data state:  TABLET_DATA_COPYING
>     Last status: Tablet Copy: Downloading block 0000000084111077 (299837/1177225)
>   52a9ede038a04566860ecd2e54388738 (bj-yz-hadoop01-1-51:7050): RUNNING
>   b133f6fd0c274b93b21ffcbdcbbde830 (bj-yz-hadoop01-1-14:7050): RUNNING [LEADER]
> 

Which version are you using? The recent versions are using 3-4-3 replica
replacement, meaning the tablet copy should be automatically canceled
when the third replica comes online and the copy hasn't finished yet.
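A toy Python model of the difference, written for this thread and not taken from Kudu's code; the function `copy_is_canceled` and its timing parameters are hypothetical. It encodes the 3-4-3 behavior described above and contrasts it with the older scheme as we understand it (often called 3-2-3, where the failed replica is evicted before a replacement is copied, so its return cannot cancel the copy).

```python
from typing import Optional


def copy_is_canceled(scheme: str, copy_finish_s: float,
                     replica_back_s: Optional[float]) -> bool:
    """Toy model: is an in-flight replacement tablet copy canceled?

    scheme:         "3-4-3" (add the replacement first, evict afterwards) or
                    "3-2-3" (evict the failed replica first; our assumption
                    about the older behavior).
    copy_finish_s:  seconds after the copy starts at which it would finish.
    replica_back_s: seconds after the copy starts at which the original
                    replica is back online, or None if it never returns.
    """
    if scheme == "3-2-3":
        # The failed replica is no longer in the Raft config, so its
        # return does not affect the copy: it runs to completion.
        return False
    if scheme == "3-4-3":
        # The original replica is still a member; if it is back before the
        # extra replica finishes copying, the copy is no longer needed.
        return replica_back_s is not None and replica_back_s < copy_finish_s
    raise ValueError(f"unknown scheme: {scheme!r}")


# A server that comes back 10 minutes into a 1-hour copy:
for scheme in ("3-2-3", "3-4-3"):
    print(scheme, copy_is_canceled(scheme, copy_finish_s=3600, replica_back_s=600))
```

The model deliberately ignores everything else (leader changes, catch-up from the WAL); it only shows why a restart that outlasts the failure timeout is cheaper under 3-4-3.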

> 
> My Question are:
> 
> 1. It seems kudu server spent a long time to open log block container, how
> to speed up restarting kudu server ?

The startup time of the tablet servers mostly depends on the number of
tablets hosted on the server. I'm not sure if there's any way to tune
it, aside from reducing the number of tablets. How many tablets do you
have per tablet server?
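If it helps to answer that question, the per-server count can be estimated from each table's layout. A minimal Python sketch, assuming replicas are spread evenly across tablet servers (real placement may differ); `TableLayout` and `avg_replicas_per_tserver` are hypothetical names, not a Kudu API:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TableLayout:
    hash_buckets: int        # e.g. "partitions 3" in a PARTITION BY HASH clause
    range_partitions: int    # number of RANGE partitions
    replication_factor: int  # replicas per tablet


def avg_replicas_per_tserver(tables: Iterable[TableLayout], num_tservers: int) -> float:
    """Average tablet replicas per tablet server, assuming even placement."""
    if num_tservers < 1:
        raise ValueError("need at least one tablet server")
    total = sum(t.hash_buckets * t.range_partitions * t.replication_factor
                for t in tables)
    return total / num_tservers
```

For example, a table with 3 hash buckets, a hypothetical 10 range partitions and 3 replicas on a 40-node cluster gives `avg_replicas_per_tserver([TableLayout(3, 10, 3)], 40)`, i.e. 2.25 replicas per server; every table on the cluster has to be included to get the full count.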

> 
> 2. I think the number of blocks have an influence on kudu server restarting
> time and query time on specific tablet, more number of blocks, more
> restarting time and query time. Is this right ?

I'm not sure how much the number of blocks influences the restart time;
maybe someone else can shed some light on this one. I'd focus on the
number of tablets, though.

Query latency depends on how many blocks the server needs to read
from, but that is a matter of how well the data is compacted (either because
it was written sequentially rather than randomly, or because the maintenance
managers have compacted it), rather than of the total number of blocks.

> 
> 3. Why there are more than 1 million blocks in a tablet, as shown in above
> Tablet Copy log, while there are less than 500 thousands of records in the
> tablet ?
> 

Each rowset will have multiple blocks (one per column, plus blocks for the
UNDO and REDO deltas and the bloom filters). The number of rowsets depends on
the number of rows.
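A rough Python sketch of that accounting. The per-rowset counts are assumptions for illustration: one block per column, one per UNDO or REDO delta file, one bloom filter block, and (our assumption, not stated above) one primary-key index block per rowset. `estimate_tablet_blocks` is a hypothetical helper:

```python
def estimate_tablet_blocks(num_rowsets: int, num_columns: int,
                           undo_files_per_rowset: int, redo_files_per_rowset: int,
                           bloom_blocks_per_rowset: int = 1,
                           key_index_blocks_per_rowset: int = 1) -> int:
    """Estimated number of blocks in a tablet, treating every rowset alike."""
    per_rowset = (num_columns                      # one block per column
                  + undo_files_per_rowset          # UNDO delta files
                  + redo_files_per_rowset          # REDO delta files
                  + bloom_blocks_per_rowset        # bloom filter for key lookups
                  + key_index_blocks_per_rowset)   # assumed key index block
    return num_rowsets * per_rowset
```

With a 35-column schema like the one posted later in this thread, `estimate_tablet_blocks(1, 35, 1, 1)` gives 39 blocks for a single rowset under these assumptions, and each additional delta file adds one more block, so many small delta flushes can inflate the total quickly.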

> 4. How to reduce the number of block in tablet ?

The maintenance managers perform compactions that reduce the number of
blocks per tablet. Other than that, fewer columns or fewer rows will of
course also result in fewer blocks.

- Attila

Re: How to decrease kudu server restart time

Posted by Adar Lieber-Dembo <ad...@cloudera.com>.

The information you provided is the FsReport from the log file of one
node, and it represents all of the data on that node. Is this the only
table in your cluster? Or do you have others? I didn't see the output
of `kudu local_replica data_size`; did you forget to include that?

It seems that your average block size is quite small (about 60K),
which is part of the reason you're seeing so many blocks. You
mentioned having a high number of updates; Kudu isn't optimized for
that. One of the things that may be happening here is that the table
is fully compacted and yet updates are still streaming in. AFAIK,
ancient history is only cleaned up during compactions, but if there
are none, ancient history will persist and your 3.3 million records
will actually be represented by many more blocks (and bytes) on disk.

I also wonder whether, by virtue of being fully compacted and with
only 50k records being ingested per day, Kudu is aggressively flushing
your DeltaMemStores (the in-memory stores that accumulate updates) and
thus producing tiny blocks. In a workload with more writes, Kudu will be
busy flushing the tablets' MemRowSets at the expense of flushing
DeltaMemStores, so by the time those are flushed, they'll be much
beefier. But an idle Kudu should be compacting those blocks via minor
and major delta compactions, so eventually those tiny blocks will be
coalesced into larger ones.
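The "about 60K" figure can be reproduced from the report quoted in this thread. A small Python sketch of the arithmetic; the helper name is ours, and the inputs are the "Total live blocks", "Total live bytes", "Total live bytes (after alignment)" and "Total number of LBM containers" values from that report:

```python
def fs_report_summary(live_blocks: int, live_bytes: int,
                      live_bytes_aligned: int, containers: int) -> dict:
    """Derived figures from a block manager report (hypothetical helper)."""
    return {
        # Mean size of a live block, in bytes.
        "avg_block_bytes": live_bytes / live_blocks,
        # Extra on-disk space from alignment, as a percentage of live bytes.
        "alignment_overhead_pct": 100.0 * (live_bytes_aligned - live_bytes) / live_bytes,
        # Mean number of live blocks per log block manager container.
        "blocks_per_container": live_blocks / containers,
    }


summary = fs_report_summary(live_blocks=22515001,
                            live_bytes=1362248371390,
                            live_bytes_aligned=1446784176128,
                            containers=22403)
for name, value in summary.items():
    print(f"{name}: {value:,.1f}")
```

This works out to roughly 60,500 bytes per block, about 6% alignment overhead, and roughly 1,000 blocks per container.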

Your partitioning schema, by virtue of being a hash of the entire
primary key, appears to be optimized for reads at the expense of
writes. That makes sense given how little you're ingesting.

I wouldn't recommend changing any of those parameters; the default
values are usually fine.

How many data directories do you have? We recommend setting
--maintenance_manager_num_threads to be equal to the number of data
directories divided by 3.
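As a sketch, that rule of thumb in Python. The guidance above does not say how to round, so rounding down with a minimum of one thread is our assumption, and `recommended_mm_threads` is a hypothetical helper:

```python
def recommended_mm_threads(num_data_dirs: int) -> int:
    """Rule of thumb from this thread: data directories divided by 3."""
    if num_data_dirs < 1:
        raise ValueError("a tablet server needs at least one data directory")
    # Round down, but never go below one maintenance thread (our assumption).
    return max(1, num_data_dirs // 3)


# e.g. a tablet server with 12 data directories
print(f"--maintenance_manager_num_threads={recommended_mm_threads(12)}")
```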

On Wed, Aug 15, 2018 at 3:21 AM Gary Gao <ga...@gmail.com> wrote:
>
> The output of command [kudu local_replica data_size] are shown below, but it seems that the **Total live blocks** are the total block number of the table, not specific tablet:
>
> Total live blocks: 22515001
> Total live bytes: 1362248371390
> Total live bytes (after alignment): 1446784176128
> Total number of LBM containers: 22403 (17366 full)
> .....
> .....

Re: How to decrease kudu server restart time

Posted by Gary Gao <ga...@gmail.com>.

The output of the `kudu local_replica data_size` command is shown below, but
it seems that **Total live blocks** is the total number of blocks for the
table, not for a specific tablet:

Total live blocks: 22515001
Total live bytes: 1362248371390
Total live bytes (after alignment): 1446784176128
Total number of LBM containers: 22403 (17366 full)
.....
.....


table schema:

create table venus.ods_xk_pay_fee_order(
time_day bigint,
CREATETIME BIGINT,
BUYERID BIGINT,
SELLERID BIGINT,
ORDERID String,
BIZID BIGINT,
ID BIGINT,
SELLERFAMILYID BIGINT,
PRODUCTID BIGINT,
PRODUCTTYPE BIGINT,
PRICE BIGINT,
REALPRICE BIGINT,
DISCOUNT BIGINT,
SHARERATE BIGINT,
DEVICETYPE BIGINT,
DEVICEID String,
APPID BIGINT,
PKNAME String,
APPVERSION String,
CREATEIP BIGINT,
SERIALID String,
SCID String,
COMPLETESTATUS BIGINT,
COMPLETETIME BIGINT,
TRYCOUNT BIGINT,
APPCHANNEL String,
SDKID BIGINT,
LIVESTATUS BIGINT,
PAYSTATUS BIGINT,
THRIDORDERID String,
LIVESOURCE BIGINT,
LIVEPRODUCTTYPE BIGINT,
PAYMODE BIGINT,
SUBPRODUCTTYPE BIGINT,
SALETYPE BIGINT,
primary key(time_day, createtime, buyerid, sellerid, orderid, bizid, id))
partition by hash (time_day, createtime, buyerid, sellerid, orderid, bizid, id) partitions 3,
range(time_day)(PARTITION 1483200000 <= values < 1514736000, ...) stored as kudu



There are only 3.3 million records (in 3 tablets) in this table, and fewer
than 50 thousand records are ingested into it every day, with many
updates.


I dug into the Kudu flag configuration and found the following flags
related to **BLOCK_SIZE**. What are the recommended values for these flags?

--cfile_default_block_size=262144

--deltafile_default_block_size=32768

--default_composite_key_index_block_size_bytes=4096

--tablet_bloom_block_size=4096



On Tue, Aug 14, 2018 at 5:41 AM Adar Lieber-Dembo <ad...@cloudera.com> wrote:

> That's an excellent question. What kind of write workload do you have?
> What's your table schema and partitioning? Do you have any
> non-standard flags defined that may affect how Kudu flushes or
> compacts its data?
>
> I'd also suggest running the CLI tool 'kudu local_replica data_size'
> on that large replica you described above. It will help identify
> whether this is a case of very large tablets, or just high numbers of
> blocks.

Re: How to decrease kudu server restart time

Posted by Adar Lieber-Dembo <ad...@cloudera.com>.

> Even if the kudu server started, it also spent too much copying tablet, as the following tablet block copying log:
>
>
> Tablet 1ecbe230e14a4d9f9125dbc49c32860e of table 'impala::venus.ods_xk_pay_fee_order' is under-replicated: 1 replica(s) not RUNNING
>   41e4489d38924c85a4810bd33ef60d80 (bj-yz-hadoop01-1-12:7050): bad state
>     State:       INITIALIZED
>     Data state:  TABLET_DATA_COPYING
>     Last status: Tablet Copy: Downloading block 0000000084111077 (299837/1177225)
>   52a9ede038a04566860ecd2e54388738 (bj-yz-hadoop01-1-51:7050): RUNNING
>   b133f6fd0c274b93b21ffcbdcbbde830 (bj-yz-hadoop01-1-14:7050): RUNNING [LEADER]

I see that this tablet has over a million blocks, but how are you
measuring that it's spending too much time copying? How much time did
it take to fully copy this tablet?
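One rough way to measure it, sketched in Python: sample the "Last status" line of the status output above twice and extrapolate from the rate at which blocks are downloaded. The regular expression assumes the status keeps the "Downloading block <id> (<done>/<total>)" shape shown above, and both helpers are hypothetical; block sizes vary, so the estimate is approximate.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "Downloading block 0000000084111077 (299837/1177225)".
_PROGRESS_RE = re.compile(r"Downloading block \S+ \((\d+)/(\d+)\)")


def parse_copy_progress(status: str) -> Optional[Tuple[int, int]]:
    """Return (blocks_done, blocks_total) from a tablet copy status line."""
    match = _PROGRESS_RE.search(status)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def estimate_remaining_seconds(done_before: int, done_after: int, total: int,
                               elapsed_seconds: float) -> Optional[float]:
    """Extrapolate the time left from two samples taken elapsed_seconds apart."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rate = (done_after - done_before) / elapsed_seconds  # blocks per second
    if rate <= 0:
        return None  # no progress between the two samples
    return (total - done_after) / rate


done, total = parse_copy_progress(
    "Last status: Tablet Copy: Downloading block 0000000084111077 (299837/1177225)")
print(f"{done / total:.1%} of blocks downloaded")
```

For the status quoted above, this reports that roughly a quarter of the blocks had been downloaded at that point.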

> 1. It seems kudu server spent a long time to open log block container, how to speed up restarting kudu server ?

Your Kudu server log should contain some log messages that'll help us
understand what's going on. Look for a message like "Time spent
opening block manager" and paste that. Also, can you find and paste
the "FS layout report"?
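A small Python sketch for pulling those lines out of a tablet server log. Only the two phrases quoted above come from this thread; the exact log wording may differ between versions, the report may span several lines (hence `context_after`), and `find_startup_lines` is a hypothetical helper:

```python
from typing import Iterable, List, Sequence

MARKERS = ("Time spent opening block manager", "FS layout report")


def find_startup_lines(lines: Iterable[str], markers: Sequence[str] = MARKERS,
                       context_after: int = 0) -> List[str]:
    """Return lines containing any marker, plus context_after following lines."""
    text = [line.rstrip("\n") for line in lines]
    keep = set()
    for i, line in enumerate(text):
        if any(marker in line for marker in markers):
            keep.update(range(i, min(i + 1 + context_after, len(text))))
    return [text[i] for i in sorted(keep)]
```

For example, `with open(log_path) as f: print("\n".join(find_startup_lines(f, context_after=20)))`, where `log_path` is a placeholder for the tablet server's INFO log (its location depends on your deployment).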

In general, the more blocks (and thus block containers) you have, the
longer it'll take Kudu to restart. KUDU-2014 has some ideas on how we
might improve this.

Once a tserver is deemed dead and its data is rereplicated elsewhere,
you can just reformat the node (i.e. delete the contents of the WAL,
metadata, and data directories). Its contents are no longer necessary,
and this will reset the number of log block containers to 0, which
will speed up subsequent restarts.

> 2. I think the number of blocks have an influence on kudu server restarting time and query time on specific tablet, more number of blocks, more restarting time and query time. Is this right ?

Yes to restarting time, but not necessarily to query time. It really
depends on the kinds of queries you're issuing, how many predicates
they have, etc.

> 3. Why there are more than 1 million blocks in a tablet, as shown in above Tablet Copy log, while there are less than 500 thousands of records in the tablet ?

That's an excellent question. What kind of write workload do you have?
What's your table schema and partitioning? Do you have any
non-standard flags defined that may affect how Kudu flushes or
compacts its data?

I'd also suggest running the CLI tool 'kudu local_replica data_size'
on that large replica you described above. It will help identify
whether this is a case of very large tablets, or just high numbers of
blocks.

> 4. How to reduce the number of block in tablet ?

Once you answer the questions I posed just above, I might be able to
offer some recommendations for how to reduce the overall number of
blocks.