Posted to user@kudu.apache.org by Nicolas Fouché <nf...@onfocus.io> on 2017/02/28 11:29:43 UTC

Recommended bulk size?

Hi. Is there any recommendation for the number of operations per batch when using AUTO_FLUSH_BACKGROUND? I guess it depends heavily on the cluster size, the number of partitions hit by the operations, etc., but are there any guidelines out there?

Looking at the code of the Kudu client, it seems that the default buffer size is 1000 operations: `private int mutationBufferSpace = 1000;`.
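The trade-off behind that default can be illustrated with a simplified model. The sketch below is an assumption, not Kudu's implementation: a hypothetical `BoundedMutationBuffer` holds at most `capacity` pending operations (standing in for `mutationBufferSpace`) and rejects anything beyond that until the next flush. How the real client reacts to a full buffer depends on its flush mode and is not modeled here; in the Kudu Java client the limit itself is set through the session's `setMutationBufferSpace(int)` method.

```java
// Hypothetical model, NOT Kudu code: a session-side buffer that holds at most
// `capacity` pending operations, analogous to the client's mutationBufferSpace.
class BoundedMutationBuffer {
    private final int capacity;
    private int pending = 0;

    BoundedMutationBuffer(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    /** Buffers one operation; returns false (rejects it) if the buffer is already full. */
    boolean apply() {
        if (pending >= capacity) {
            return false;
        }
        pending++;
        return true;
    }

    /** "Sends" everything buffered (here it only resets the count) and returns how many ops were flushed. */
    int flush() {
        int flushed = pending;
        pending = 0;
        return flushed;
    }

    /** Applies totalOps operations, flushing after every flushInterval of them; returns how many were rejected. */
    static int countRejected(int capacity, int flushInterval, int totalOps) {
        BoundedMutationBuffer buf = new BoundedMutationBuffer(capacity);
        int rejected = 0;
        for (int i = 1; i <= totalOps; i++) {
            if (!buf.apply()) {
                rejected++;
            }
            if (i % flushInterval == 0) {
                buf.flush();
            }
        }
        buf.flush(); // flush any final partial batch
        return rejected;
    }

    public static void main(String[] args) {
        // Default capacity of 1000, flushing every 1000 ops: the buffer never overflows.
        System.out.println(countRejected(1000, 1000, 1_000_000));   // 0
        // Same capacity, flushing only every 10,000 ops: 9,000 of every 10,000 are rejected.
        System.out.println(countRejected(1000, 10_000, 1_000_000)); // 900000
    }
}
```

In this model, any flush interval at or below the buffer capacity is safe, while a larger interval overflows the buffer in every window.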

- Nicolas

Re: Recommended bulk size?

Posted by Paul Brannan <pa...@thesystech.com>.
As you said, I expect it depends on many variables. I ran a quick and dirty
experiment when first evaluating Kudu 1.0 to see how the flush interval
affected insert rates. I had one master and one tserver, each in its default
configuration, on an ext4 filesystem on a spinning disk. The table had two
string columns, "key" and "value", both part of the primary key and each
less than 30 bytes long. Here are the results:

Manual flush after every insert: 100K inserts in 14.5s (~7K/s)
Manual flush every 100K inserts: 1M inserts in 4.7s (~215K/s; warnings about "blocked reactor thread")
Manual flush every 10K inserts: 1M inserts in 4.2s (~240K/s)
Auto flush background, no explicit flush: 1M inserts in 4.8s (~210K/s; warnings about "blocked reactor thread" and "thread stuck")
Auto flush background, explicit flush every 10K inserts: 1M inserts in 4.2s (~240K/s)
Async flush every 10K inserts: 1M inserts in 2.8s (~350K/s)
Async flush every 1K inserts: 1M inserts in 2.7s (~370K/s)
Async flush every 100 inserts: 1M inserts in 3.3s (~300K/s)
Async flush every 10 inserts: 1M inserts in 10.6s (~95K/s)
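As a quick arithmetic check, each approximate rate is simply the insert count divided by the elapsed seconds. The helper below (a hypothetical name written for this note) rounds to the nearest thousand inserts per second; for example, 1M inserts in 4.8s works out to about 208K/s, and 1M inserts in 2.8s to about 357K/s, which the reported results round down to ~350K/s.

```java
// Recomputes the approximate insert rates reported above from the counts and times.
class InsertRates {
    /** Inserts per second, expressed in thousands and rounded to the nearest whole number. */
    static long kPerSecond(long inserts, double seconds) {
        return Math.round(inserts / seconds / 1000.0);
    }

    public static void main(String[] args) {
        System.out.println(kPerSecond(100_000, 14.5));   // 7   (manual flush after every insert)
        System.out.println(kPerSecond(1_000_000, 4.8));  // 208 (auto flush background, no explicit flush)
        System.out.println(kPerSecond(1_000_000, 2.7));  // 370 (async flush every 1K inserts)
        System.out.println(kPerSecond(1_000_000, 10.6)); // 94  (async flush every 10 inserts)
    }
}
```

Comparing the extremes, async flushing every 1K inserts is roughly 50 times faster than flushing after every single insert in this setup.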

Based on this experiment, I chose async flush with a 1K interval, because
beyond that there are diminishing returns, and I don't want to run out of
mutation buffer space.
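The approach chosen here, starting an asynchronous flush every 1,000 inserts without waiting for it to finish, can be sketched as follows. `AsyncSink`, `IntervalFlusher`, and `RecordingSink` are hypothetical names invented for this illustration and are not part of the Kudu client; a real client session would take the sink's place, and the asynchronous flush is modeled here with `java.util.concurrent.CompletableFuture`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sink interface standing in for a client session; this is NOT the Kudu API.
interface AsyncSink<T> {
    void apply(T op);                     // buffer one operation
    CompletableFuture<Void> flushAsync(); // send buffered ops without blocking the caller
}

class IntervalFlusher {
    /** Applies every op, starting an async flush after each `interval` ops, then waits for all flushes. */
    static <T> int writeAll(AsyncSink<T> sink, Iterable<T> ops, int interval) {
        if (interval <= 0) throw new IllegalArgumentException("interval must be positive");
        List<CompletableFuture<Void>> inFlight = new ArrayList<>();
        int sinceFlush = 0;
        for (T op : ops) {
            sink.apply(op);
            if (++sinceFlush == interval) {
                inFlight.add(sink.flushAsync()); // do not wait; keep applying ops
                sinceFlush = 0;
            }
        }
        if (sinceFlush > 0) {
            inFlight.add(sink.flushAsync()); // flush the final partial batch
        }
        // allOf completes exceptionally if any flush failed, so join() surfaces errors.
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture<?>[0])).join();
        return inFlight.size(); // number of flushes issued
    }
}

// In-memory fake used only for demonstration: records the size of each flushed batch.
class RecordingSink<T> implements AsyncSink<T> {
    final List<Integer> batchSizes = new ArrayList<>();
    private int buffered = 0;

    public void apply(T op) { buffered++; }

    public CompletableFuture<Void> flushAsync() {
        batchSizes.add(buffered);
        buffered = 0;
        return CompletableFuture.completedFuture(null);
    }
}

class IntervalFlusherDemo {
    public static void main(String[] args) {
        List<Integer> ops = new ArrayList<>();
        for (int i = 0; i < 2_500; i++) ops.add(i);
        RecordingSink<Integer> sink = new RecordingSink<>();
        int flushes = IntervalFlusher.writeAll(sink, ops, 1_000);
        System.out.println(flushes + " " + sink.batchSizes); // 3 [1000, 1000, 500]
    }
}
```

Keeping every in-flight future and joining them at the end means a failed flush is reported rather than silently dropped; with the fake sink the futures complete immediately, so the demo only shows how operations are grouped into batches.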