Posted to user@kudu.apache.org by Ksenya Leonova <ks...@gmail.com> on 2018/04/10 10:11:34 UTC

Kudu deployment best practice

Hello!

We are planning to use Kudu in our research platform, and we have several
open questions. Could you share your experience with the problems
described below?

1) Best practices for Kudu deployment:
we plan to use Kudu in conjunction with HDFS, so how do you usually
solve the problem of sharing and flexibly managing resources between
Kudu and HDFS?

2) Mixed workload processing (OLAP & OLTP):
has anybody run into performance degradation (e.g., slower disk I/O)
with this kind of mixed workload?

We have 20 nodes in the cluster. Each node has 10 x 8 TB 7,200 RPM
high-capacity SAS drives for data storage (currently HDFS) and 2 x SSDs
for the OS.

Thanks in advance!

Best regards,
Ksenia Leonova

Re: Kudu deployment best practice

Posted by Adar Lieber-Dembo <ad...@cloudera.com>.
On Tue, Apr 10, 2018 at 3:11 AM, Ksenya Leonova <ks...@gmail.com> wrote:
>
> 1) Best practices for Kudu deployment:
> we plan to use Kudu in conjunction with HDFS, so how do you usually
> solve the problem of sharing and flexibly managing resources between
> Kudu and HDFS?

At least for now, resource management is outside the purview of Kudu,
and probably HDFS too, though I don't know for certain. I know of some
deployments that use Linux cgroups for RM; maybe that will work for
you? Or if you've already got a resource management system such as
Kubernetes or YARN deployed, you could use that.
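
Just as a rough illustration (this is not Kudu- or HDFS-provided
tooling), here's a minimal sketch of what the cgroup approach could
look like, assuming a cgroup v1 layout under /sys/fs/cgroup and
made-up limits; you'd adapt the paths and numbers to your distro (or
to cgroup v2):

#!/usr/bin/env python3
# Hypothetical sketch: cap memory and CPU for a Kudu tablet server with
# Linux cgroups (v1 layout). Group name, limits, and the PID lookup are
# illustrative assumptions only. Must run as root.
import os
import subprocess

CGROUP_ROOT = "/sys/fs/cgroup"      # typical cgroup v1 mount point
GROUP = "kudu-tserver"              # arbitrary group name for this sketch
MEM_LIMIT_BYTES = 32 * 1024**3      # e.g. 32 GiB for Kudu, rest for HDFS
CPU_QUOTA_US = 400000               # ~4 CPUs with the default 100000us period

def write(path, value):
    with open(path, "w") as f:
        f.write(str(value))

# Create the memory and cpu groups and set the limits.
for controller in ("memory", "cpu"):
    os.makedirs(os.path.join(CGROUP_ROOT, controller, GROUP), exist_ok=True)

write(f"{CGROUP_ROOT}/memory/{GROUP}/memory.limit_in_bytes", MEM_LIMIT_BYTES)
write(f"{CGROUP_ROOT}/cpu/{GROUP}/cpu.cfs_quota_us", CPU_QUOTA_US)

# Move the running kudu-tserver process into both groups.
pid = subprocess.check_output(["pidof", "kudu-tserver"]).split()[0].decode()
write(f"{CGROUP_ROOT}/memory/{GROUP}/cgroup.procs", pid)
write(f"{CGROUP_ROOT}/cpu/{GROUP}/cgroup.procs", pid)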

> 2) Mixed workload processing (OLAP & OLTP):
> has anybody run into performance degradation (e.g., slower disk I/O)
> with this kind of mixed workload?
>
> We have 20 nodes in the cluster. Each node has 10 x 8 TB 7,200 RPM
> high-capacity SAS drives for data storage (currently HDFS) and 2 x SSDs
> for the OS.

Can you be more specific? What exactly do you mean by a "mixed workload"?

In any case, Kudu write performance does degrade as tablets increase
in size. To apply a row operation, Kudu must check whether that row's
primary key already exists, and once your working set exceeds the
combined size of the Kudu block cache and the host's page cache, the
bloom filter and index lookups for a primary key are likely to miss
the caches and require additional disk access.
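
To make that concrete, here is a minimal kudu-python sketch of the
write path I'm describing; the master address, table name, schema, and
row counts are made-up examples, not anything from your cluster:

# Minimal kudu-python sketch: every insert triggers a primary-key
# uniqueness check, which is where the bloom/index lookups (and, on
# cache misses, extra disk reads) come from. All names here are
# illustrative assumptions.
import random

import kudu
from kudu.client import Partitioning

client = kudu.connect(host='kudu-master-1', port=7051)

builder = kudu.schema_builder()
builder.add_column('id').type(kudu.int64).nullable(False).primary_key()
builder.add_column('payload').type(kudu.string)
schema = builder.build()

if not client.table_exists('mixed_workload_demo'):
    client.create_table(
        'mixed_workload_demo', schema,
        Partitioning().add_hash_partitions(column_names=['id'],
                                           num_buckets=8))

table = client.table('mixed_workload_demo')
session = client.new_session()

# Random (non-sequential) keys make the working set of bloom filters and
# index blocks grow with table size; once it no longer fits in the block
# cache (sized by the tserver's --block_cache_capacity_mb flag) plus the
# OS page cache, each insert's key check may pay for a disk seek.
for i in range(10000):
    session.apply(table.new_insert({'id': random.getrandbits(60),
                                    'payload': 'x' * 100}))
    if i % 500 == 0:
        session.flush()  # flush in batches to avoid overflowing the buffer
session.flush()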