Posted to dev@druid.apache.org by Lucas Capistrant <ca...@gmail.com> on 2020/11/24 15:17:39 UTC

Proposal Discussion Plug: #9816 Guild Replication

Hi all,

I have created a proposal, https://github.com/apache/druid/issues/9816,
for adding new functionality to the Druid segment replication
infrastructure. I wanted to share it on the dev list to get some more
eyes on it and drive discussion. I won't repeat too much of what is
already described in the proposal, but the general idea is to add a new
logical grouping config to the Historical servers that is specified at
runtime. If the operator chooses to use this new functionality, the
Coordinator will do its best to load replicants across 2+ of these new
Historical groups. The motivation is improved quality of life
for cluster operators. Having best-effort replication across groups will
allow opportunities for increased data availability (perhaps replicating
across physical racks in a datacenter to avoid unavailability due to switch
failure) as well as easier cluster operations (being able to restart
a group of Historicals knowing that the cluster has made a best effort not
to have all replicants for a segment live within that group).
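
To make the best-effort idea concrete, here is a minimal sketch in Python.
It is not the POC code and not how the Coordinator is implemented; the
function name, the (server, group) pairs, and the two-pass strategy are
all assumptions made purely for illustration. It prefers one replicant per
group and only doubles up within a group when there are not enough groups.

```python
def pick_servers(servers, replicas):
    """Choose up to `replicas` servers, spreading across groups where possible.

    `servers` is a list of (server_name, group) pairs in preference order.
    Hypothetical helper for illustration; not part of Druid.
    """
    chosen = []
    used_groups = set()

    # First pass: take at most one server from each group.
    for name, group in servers:
        if len(chosen) == replicas:
            break
        if group not in used_groups:
            chosen.append((name, group))
            used_groups.add(group)

    # Second pass (best-effort fallback): if there were too few groups,
    # fill the remaining slots from any server not already chosen.
    for name, group in servers:
        if len(chosen) == replicas:
            break
        if (name, group) not in chosen:
            chosen.append((name, group))

    return chosen


servers = [("h1", "rack-a"), ("h2", "rack-a"), ("h3", "rack-b")]
two_replicas = pick_servers(servers, 2)      # spread over rack-a and rack-b
single_group = pick_servers(servers[:2], 2)  # only rack-a exists, so both land there
```

With two groups available, the two replicants land in different groups, so
restarting all of rack-a still leaves a copy on rack-b. With a single group
the sketch still places both replicants rather than refusing to replicate,
which is the "best effort" part.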

My proposal is based on POC code that I have been working on. That POC
is linked in the proposal for people who want to look at the potential
implementation.

There was some discussion of folding this into Druid tiering, but after
some analysis I came away thinking this would not be a wise choice. I think
the patterns and motivating factors behind tiers are too disconnected from
those of my proposal, and trying to rig up tiering to meet all of the
requirements that exist today, plus the ones I propose, would result in a
cumbersome and confusing product.

I appreciate any and all feedback!

Thanks,
Lucas