Posted to user@kudu.apache.org by Mauricio Aristizabal <ma...@impact.com> on 2019/10/26 19:28:24 UTC

Balancing tablets per range partition

Hey guys,

We're having to do quite a bit of manual replica moving, and looking at the
docs I don't see how we can configure things better, or what tool to use, to
avoid this, even in the latest version (we're a bit behind on 1.7.0+cdh5.15.1+0
but about to upgrade to 1.10.0+cdh6.3.0):

Most of our tables are range-partitioned on an event date (usually one
partition per month or day) AND hash-partitioned on a UUID or some other
high-cardinality column.
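
For example, a typical table of ours looks roughly like this in Impala DDL
(database, table, column names, and partition counts are all made up):

    impala-shell -q "
      CREATE TABLE db.events (
        uuid STRING,
        event_date STRING,
        payload STRING,
        PRIMARY KEY (uuid, event_date)
      )
      PARTITION BY
        HASH (uuid) PARTITIONS 8,
        RANGE (event_date) (
          PARTITION '2019-10-01' <= VALUES < '2019-11-01',
          PARTITION '2019-11-01' <= VALUES < '2019-12-01'
        )
      STORED AS KUDU;"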

Rebalancing a table ensures replicas are well spread around overall, but
not so for a given range partition.

Since about 99% of our ingestion, and 80% of our queries/scanning, is for
the current date range partition, what really matters is that the hash
partitions be well balanced within each date range partition.  How can we
ensure this?

Further, we mostly ingest via the Spark KuduContext and Impala
insert/upsert, so AFAIU ingest goes to leader tablets...  how do we ensure
that LEADERS are balanced for a given range partition, without having to
manually run a bunch of leader_step_down commands after the replicas have
been balanced?
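
For concreteness, here's roughly what we run today (master addresses,
tablet id, and table name are placeholders; the rebalancer tool assumes at
least a 1.8 CLI):

    # Balance replica counts for one table across the whole cluster
    # (as of 1.10 this does not consider per-range-partition skew).
    kudu cluster rebalance <master-addresses> --tables=impala::db.events

    # Manually move leadership off a hot server, one tablet at a time.
    kudu tablet leader_step_down <master-addresses> <tablet-id>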

Thanks in advance!


PS: to illustrate this, here's a heatmap generated in Cloudera Manager,
showing 2 hotspotting nodes:

[image: image.png]

which was generated with this chart query:

    select SUM(kudu_rows_inserted_rate + kudu_rows_updated_rate
             + kudu_rows_deleted_rate + kudu_rows_upserted_rate)
    where serviceName=kudu and kuduTableName="impala::<kudutablename>"



-- 
Mauricio Aristizabal
Architect - Data Pipeline
mauricio@impact.com | 323 309 4260
https://impact.com

Re: Balancing tablets per range partition

Posted by Mauricio Aristizabal <ma...@impact.com>.
Thanks so much Alexey!

I think the dimension label in KUDU-2823 is an adequate solution for us
for the foreseeable future.  We'll just start adding new monthly ranges
every couple of months, rather than a whole year in advance, to minimize
the effect of future TS commissionings/decommissionings.  Just having the
replicas well balanced for a range gets us 90% of the way there... doing a
couple of manual leader stepdowns right after adding them is not too
onerous, as it's a quick process and transparent to apps.

We'll just have to upgrade again as soon as 1.11 gets into CDH (hopefully
soon after it's released).  I doubt Impala will integrate this at first, so
we'll have to stop adding partitions via SQL ALTER TABLE.  However, I'm
unclear on just how we're supposed to do it instead: Xu Yao's last comment
is that Java client and/or CLI support still needs to be added.
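
For reference, this is the kind of statement we'd be giving up, since a
plain ALTER TABLE has nowhere to carry a dimension label (table name made
up):

    impala-shell -q "
      ALTER TABLE db.events
        ADD RANGE PARTITION '2019-12-01' <= VALUES < '2020-01-01';"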

-m

On Sat, Oct 26, 2019 at 8:33 PM Alexey Serbin <as...@cloudera.com> wrote:

> [quoted message trimmed; see Alexey's reply below]

-- 
Mauricio Aristizabal
Architect - Data Pipeline
mauricio@impact.com | 323 309 4260
https://impact.com

Re: Balancing tablets per range partition

Posted by Alexey Serbin <as...@cloudera.com>.
Hi Mauricio,

I think you are right: Kudu 1.10 and earlier don't support automatic
balancing of replicas for a given range partition.  With
https://issues.apache.org/jira/browse/KUDU-2823 implemented in Kudu 1.11,
there will be a way to specify a so-called dimension for a newly added
partition and have the replicas of the corresponding tablets evenly
distributed.  It could probably help in your case, but it works only at
the time of placing new tablet replicas.  There isn't a way yet to
re-distribute already-placed ones based on dimension: see
https://issues.apache.org/jira/browse/KUDU-2974 for details.

With safe/graceful leader stepdown implemented in Kudu 1.9
(https://issues.apache.org/jira/browse/KUDU-2245), things improved compared
with prior versions, and maybe it's possible to implement a script that
performs leader re-distribution in a reliable way.  However, automatic
leader balancing is not there yet:
https://issues.apache.org/jira/browse/KUDU-886.  Maybe somebody else will
chime in with useful recipes for leader replica re-distribution.
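
A very rough outline of such a script, assuming a 1.9+ CLI (the exact
shape of ksck's JSON output varies by version, so treat the extraction
step as a placeholder rather than something tested):

    # 1. Dump the replica layout for the table as JSON.
    kudu cluster ksck <master-addresses> -tables=impala::db.events \
        -ksck_format=json_compact > layout.json

    # 2. From layout.json, count leaders per tablet server for the tablets
    #    in the range of interest and pick the over-loaded ones (jq or a
    #    small Python script; omitted here).

    # 3. Gracefully transfer each excess leadership, one tablet at a time.
    kudu tablet leader_step_down <master-addresses> <tablet-id>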


Kind regards,

Alexey

On Sat, Oct 26, 2019 at 12:28 PM Mauricio Aristizabal <ma...@impact.com>
wrote:

> [quoted message trimmed; see the original post above]