Posted to user@kudu.apache.org by "fengbaoli@uce.cn" <fe...@uce.cn> on 2018/04/04 02:38:00 UTC

question about kudu performance!

(1) Version

The component versions are:
   - CDH 5.14
   - Kudu 1.6

(2) Framework

The size of the Kudu cluster (10 machines in total, 256 GB memory, 16 x 1.2 TB SAS disks each):
   - 3 master nodes
   - 7 tablet server nodes

The masters and tablet servers are deployed on separate machines, but the YARN NodeManagers and tablet servers are deployed together.

(3) Kudu parameters

maintenance_manager_num_threads=24 (each machine has 8 data directories)
memory_limit_hard_bytes=150G

I have a performance problem: every 2-3 weeks the cluster starts running MajorDeltaCompactionOp. While it runs, insert and update performance decreases, and once data is being written the update operations almost stop.

Is it possible to raise the --memory_limit_hard_bytes parameter to 80% of 256 GB (my YARN NodeManager and tablet server are deployed together)?

Can we adjust the --tablet_history_max_age_sec parameter to shorten the MajorDeltaCompactionOp interval?

Can you give me some suggestions to optimize this performance problem?

Thanks!




优速物流有限公司 (UCE Logistics Co., Ltd.)
Big Data Center, Feng Baoli
Mobile: 15050552430
Email: fengbaoli@uce.cn

Re: question about kudu performance!

Posted by Todd Lipcon <to...@cloudera.com>.
> On Tue, Apr 3, 2018 at 7:38 PM, fengbaoli@uce.cn <fe...@uce.cn> wrote:
> >
> > (1) Version
> > The component versions are:
> >
> >    - CDH 5.14
> >    - Kudu 1.6
> >
> > (2) Framework
> >
> > The size of the Kudu cluster (10 machines in total, 256 GB memory, 16 x 1.2 TB SAS disks each):
> >
> > - 3 master nodes
> >
> > - 7 tablet server nodes
> >
> > The masters and tablet servers are deployed on separate machines, but the YARN NodeManagers and tablet servers are deployed together.
> >
> > (3) Kudu parameters:
> >
> > maintenance_manager_num_threads=24 (each machine has 8 data directories)

If you have 16 disks, why only 8 directories?

I would recommend reducing this significantly. We usually recommend
one thread for every 3 disks.
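
A minimal flagfile sketch of that guideline, assuming 16 data disks per node as described above (16 / 3 is roughly 5 threads; the value illustrates the rule of thumb and is not a prescribed setting):

    # Tablet server flagfile sketch: one maintenance thread per ~3 data disks.
    # With 16 disks per node, 16 / 3 rounds to about 5 threads.
    --maintenance_manager_num_threads=5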

> >
> > memory_limit_hard_bytes=150G
> >
> > I have a performance problem: every 2-3 weeks the cluster starts running MajorDeltaCompactionOp. While it runs, insert and update performance decreases, and once data is being written the update operations almost stop.
>

The major delta compactions should actually improve update
performance, not decrease it. Do you have any more detailed metrics to
explain the performance drop?

If you upgrade to Kudu 1.7 the tservers will start to produce a
diagnostics log. If you can send a diagnostics log segment from the
point in time when the performance problem is occurring we can try to
understand this behavior better.

>
> > Is it possible to raise the --memory_limit_hard_bytes parameter to 80% of 256 GB (my YARN NodeManager and tablet server are deployed together)?

If YARN is also scheduling work on these nodes, then you may end up
swapping and that would really kill performance. I usually don't see
improvements in Kudu performance by providing such huge amounts of
memory. The one exception would be that you might get some improvement
using a large block cache if your existing cache is showing a low hit
rate. The metrics would help determine that.
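
Purely as an illustration of where those knobs live, a tablet server flagfile sketch follows; the sizes are placeholders to be derived from whatever memory is actually left after the YARN NodeManager allocation and from the observed cache hit rate, not recommended values:

    # Illustrative flagfile sketch; the values below are placeholders, not recommendations.
    --memory_limit_hard_bytes=137438953472   # hard process memory limit (128 GiB expressed in bytes)
    --block_cache_capacity_mb=8192           # block cache size; worth raising only if the hit rate is low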

> >
> > Can we adjust the parameter --tablet_history_max_age_sec to shorten the MajorDeltaCompactionOp interval?

Nope, that won't affect the major delta compaction frequency. The one
undocumented tunable that is relevant is
--tablet_delta_store_major_compact_min_ratio (default 0.1). Raising
this would decrease the frequency of major delta compaction, but I
think there is likely something else going on here.
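
For completeness, if you did want to experiment with that tunable anyway, it is set like any other tablet server gflag; the value below simply illustrates raising it above the 0.1 default and is not a recommendation:

    # Undocumented tunable (default 0.1); a higher ratio makes major delta
    # compaction run less often.
    --tablet_delta_store_major_compact_min_ratio=0.2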

-Todd

> >
> > Can you give me some suggestions to optimize this performance problem?

Usually the best way to improve performance is by thinking carefully
about schema design, partitioning, and workload, rather than tuning
configuration. Maybe you can share more about your workload, schema,
and partitioning.
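
As a generic sketch of the kind of schema and partitioning decisions involved, here is a hypothetical table created with the kudu-python client; the table name, columns, and bucket count are invented for illustration and are not taken from this thread:

    # Hypothetical example using the kudu-python client; names and the bucket
    # count are placeholders chosen only to illustrate hash partitioning.
    import kudu
    from kudu.client import Partitioning

    client = kudu.connect(host='kudu-master.example.com', port=7051)

    builder = kudu.schema_builder()
    builder.add_column('id').type(kudu.int64).nullable(False)
    builder.add_column('ts').type(kudu.unixtime_micros).nullable(False)
    builder.add_column('value').type(kudu.string)
    builder.set_primary_keys(['id', 'ts'])
    schema = builder.build()

    # Hash partitioning on the leading key column spreads writes across tablets.
    partitioning = Partitioning().add_hash_partitions(column_names=['id'], num_buckets=8)

    client.create_table('example_table', schema, partitioning)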

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera