Posted to user@kudu.apache.org by wei ximing <wx...@outlook.com> on 2019/11/12 06:46:45 UTC

Tuning kudu write performance

Hi!

I have some questions about kudu performance tuning.


Kudu version: kudu 1.7.0-cdh5.16.2

System memory per node: 256GB

4 SSDs per machine: 512GB

Three Master nodes and three Tserver nodes.

// Master config
--fs_wal_dir=/mnt/disk1/kudu/var/wal
--fs_data_dirs=/mnt/disk1/kudu/var/data,/mnt/disk2/kudu/var/data,/mnt/disk3/kudu/var/data,/mnt/disk4/kudu/var/data
--fs_metadata_dir=/mnt/disk1/kudu/var/metadata
--log_dir=/mnt/disk1/kudu/var/logs
--master_addresses=xxxx
--maintenance_manager_num_threads=2
--block_cache_capacity_mb=6144
--memory_limit_hard_bytes=34359738368
--max_log_size=40

// Tserver config
--fs_wal_dir=/mnt/disk1/kudu/var/wal
--fs_data_dirs=/mnt/disk1/kudu/var/data,/mnt/disk2/kudu/var/data,/mnt/disk3/kudu/var/data,/mnt/disk4/kudu/var/data
--fs_metadata_dir=/mnt/disk1/kudu/var/metadata
--log_dir=/mnt/disk1/kudu/var/logs
--tserver_master_addrs=xxxx
--block_cache_capacity_mb=6144
--memory_limit_hard_bytes=34359738368
--max_log_size=40

// Table schema
// _key is a UUID for each msg
// event_time is the data timestamp
// Schema has only 15 columns
// A single message does not exceed 100 bytes

HASH (_key) PARTITIONS 3,
RANGE (event_time) (
    PARTITION 2019-10-31T16:00:00.000000Z <= VALUES < 2019-11-30T16:00:00.000000Z
)


I wrote a program to write data to Kudu.

In both manual and automatic flush modes, the write speed is only 6MB/s.

I think the SSDs should sustain more than this, and neither the network nor the memory has reached a bottleneck.

Is this a normal write speed for Kudu? How can I tune it?


Thanks.

Re: Tuning kudu write performance

Posted by Alexey Serbin <as...@cloudera.com>.
Since I'm not sure which flush mode is used at the client side, I
suggest making sure that your application is using the
AUTO_FLUSH_BACKGROUND flush mode (Java API link):

https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html#AUTO_FLUSH_BACKGROUND

Another point is the size of the write buffer at the client side.
If you are using the Kudu Java client, with only 100 bytes per row, a
KuduSession using AUTO_FLUSH_BACKGROUND mode buffers only 1000 rows, which
is about 100KB per write batch.  Consider increasing the size of the buffer
using the KuduSession.setMutationBufferSpace() method by at least 10x:

https://kudu.apache.org/apidocs/org/apache/kudu/client/KuduSession.html#setMutationBufferSpace-int-
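
For illustration, here is a minimal Java sketch of such a session. The
master address, table name, and loop are placeholders (not taken from your
setup), and only the _key column is shown:

import org.apache.kudu.client.*;

public class KuduWriteSketch {
  public static void main(String[] args) throws KuduException {
    KuduClient client = new KuduClient.KuduClientBuilder("master1:7051").build();
    KuduSession session = client.newSession();
    // Batch writes in the background instead of flushing on every apply().
    session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND);
    // Default is 1000 buffered operations (~100KB at ~100 bytes/row);
    // raise it at least 10x, per the suggestion above.
    session.setMutationBufferSpace(10000);

    KuduTable table = client.openTable("my_table");
    for (int i = 0; i < 100000; i++) {
      Insert insert = table.newInsert();
      insert.getRow().addString("_key", java.util.UUID.randomUUID().toString());
      // ... set event_time and the remaining columns here ...
      session.apply(insert);
    }
    session.flush();  // push out any rows still sitting in the buffer
    client.close();
  }
}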


Thanks,

Alexey

Re: Tuning kudu write performance

Posted by Adar Lieber-Dembo <ad...@cloudera.com>.
Oh whoops, I didn't scroll down and missed that. Thanks!

Mauricio's suggestion is a good one. To that I would add: consider
increasing the number of hash buckets.

Additionally, what does the rest of the primary key look like? _key and
event_time are in there, but in what order? UUIDs in particular are usually
a poor choice for primary keys because of their random distribution, all
but guaranteeing lots of compaction during ingest, which slows down
throughput considerably. How bad it is depends on the arrangement of
columns in the primary key, and how that order reflects (or does not
reflect) the key order of incoming data.
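
To make that concrete, here is a hypothetical Java sketch of a table whose
primary key leads with event_time, so data arriving in time order stays
mostly sorted w.r.t. the key, and which uses more hash buckets than 3. The
table name, bucket count, and column types are assumptions, not the actual
schema:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.kudu.ColumnSchema;
import org.apache.kudu.Schema;
import org.apache.kudu.Type;
import org.apache.kudu.client.CreateTableOptions;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduException;

public class CreateEventsTableSketch {
  public static void main(String[] args) throws KuduException {
    KuduClient client = new KuduClient.KuduClientBuilder("master1:7051").build();

    List<ColumnSchema> cols = new ArrayList<>();
    // event_time first in the key: rows arriving in time order are then
    // mostly sorted w.r.t. the primary key, cutting compaction during ingest.
    cols.add(new ColumnSchema.ColumnSchemaBuilder("event_time", Type.UNIXTIME_MICROS)
        .key(true).build());
    cols.add(new ColumnSchema.ColumnSchemaBuilder("_key", Type.STRING)
        .key(true).build());
    // ... the remaining 13 non-key columns ...
    Schema schema = new Schema(cols);

    CreateTableOptions opts = new CreateTableOptions()
        .addHashPartitions(Arrays.asList("_key"), 12)  // more buckets than 3
        .setRangePartitionColumns(Arrays.asList("event_time"));
    client.createTable("events", schema, opts);
    client.close();
  }
}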

Re: Tuning kudu write performance

Posted by Mauricio Aristizabal <ma...@impact.com>.
You should start by making sure each of your 3 hash partition tablets'
leaders is on a different one of your 3 nodes.  It could very well be that
all 3 were on the same tablet server and you were ingesting into a single
node.  If needed, use leader_step_down
<https://kudu.apache.org/docs/command_line_tools_reference.html#tablet-leader_step_down>
to move leaders around.
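
The invocation looks roughly like this (a sketch; the master addresses and
tablet id are placeholders, and tablet ids are visible in the master web UI):

kudu tablet leader_step_down <master-addresses> <tablet-id>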

FYI Adar, table schema was at bottom inside that iframe

-- 
Mauricio Aristizabal
Architect - Data Pipeline
mauricio@impact.com | 323 309 4260
https://impact.com

Re: Tuning kudu write performance

Posted by Adar Lieber-Dembo <ad...@cloudera.com>.
Some thoughts on how you might increase your write speed:
- Don't use the same disk for both WAL and data directories. If you
have enough disks, dedicate one to the WAL and the rest to data
directories.
- Since each disk is an SSD, experiment with a higher ratio of MM
threads to data directories. We typically recommend 1:3, but that's
for spinning disks. I see you've configured 2 MM threads for the
masters but are still using just 1 for the tservers? Consider using
2-4. (A flag sketch for these first two points follows below.)
- How is your schema structured? Are you using hash partitioning?
Range partitioning? Both? What does your primary key look like, and
does incoming data arrive in sorted order (or mostly sorted order)
w.r.t. that key? Or in random order?
https://kudu.apache.org/docs/schema_design.html is an excellent
resource for understanding how schema can impact writes and reads.
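
For the first two points, the tserver flags might look something like this
(a sketch only, reusing the illustrative paths from the config above):

// Tserver config sketch: disk1 dedicated to the WAL, disks 2-4 for data,
// and one MM thread per data directory since the disks are SSDs
--fs_wal_dir=/mnt/disk1/kudu/var/wal
--fs_data_dirs=/mnt/disk2/kudu/var/data,/mnt/disk3/kudu/var/data,/mnt/disk4/kudu/var/data
--maintenance_manager_num_threads=3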
