Posted to dev@hudi.apache.org by leesf <le...@gmail.com> on 2021/12/12 11:12:16 UTC

Re: [DISCUSS] Propose Consistent Hashing Indexing for Dynamic Bucket Number

+1 for the improvement to make the bucket index more comprehensive; looking
forward to the RFC for more details.

Yuwei Xiao <yw...@gmail.com> wrote on Fri, Dec 10, 2021 at 16:22:

> Dear Hudi Community,
>
> I would like to propose Consistent Hashing Indexing to enable a dynamic
> bucket number, sparing Hudi users the hyper-parameter tuning.
>
> Currently, we have the Bucket Index landing [1]. It is an effective index
> approach to address the performance issue during upserts: I observed ~3x
> upsert throughput improvement in my local setup compared to the Bloom
> Filter approach. However, it requires pre-configuring a bucket number when
> creating the table. As described in [1], this imposes two limitations:
>
> - Due to the one-to-one mapping between buckets and file groups, the size
> of a single file group may grow without bound. Services like compaction
> will take longer because of the larger read/write amplification.
>
> - Data skew may arise from an imbalanced data distribution, resulting in
> long-tail reads/writes.
>
> Based on the above observations, supporting a dynamic bucket number is
> necessary, especially for rapidly changing Hudi tables. Looking at the
> market, Consistent Hashing has been adopted in DB systems [2][3]. Its main
> idea is to turn the "key->bucket" mapping into
> "key->hash_value->(range mapping)->bucket", constraining the re-hashing
> process to touch only a few local buckets (e.g., only large file groups)
> rather than shuffling the whole hash table.
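>
> To make the range mapping concrete, here is a minimal lookup sketch in
> Java (class and method names are illustrative, not Hudi's actual code):
>
>     import java.util.Map;
>     import java.util.TreeMap;
>
>     class ConsistentHashRing {
>         // Upper bound of each hash range -> owning bucket (file group) id.
>         private final TreeMap<Integer, String> ring = new TreeMap<>();
>
>         void addBucket(int rangeUpperBound, String bucketId) {
>             ring.put(rangeUpperBound, bucketId);
>         }
>
>         String bucketFor(String recordKey) {
>             int hash = recordKey.hashCode() & Integer.MAX_VALUE; // non-negative
>             // First range whose upper bound covers the hash; wrap around
>             // to the smallest bound if the hash falls past the last one.
>             Map.Entry<Integer, String> e = ring.ceilingEntry(hash);
>             return (e != null ? e : ring.firstEntry()).getValue();
>         }
>     }
>
> Splitting a bucket only inserts one new bound into the map, so keys in
> every other range keep their old mapping; that is exactly the local
> re-hashing property described above.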
>
> In order to introduce Consistent Hashing to Hudi, we need to consider the
> following issues:
>
> - Storing hashing metadata, such as the range mapping info. Metadata size
> and concurrent updates to the metadata should also be considered.
>
> - Splitting & merging criteria. We need to design one (or several)
> policies governing when and how to split & merge buckets. A simple policy
> would be to split a bucket in the middle of its hash range once its file
> group reaches a size threshold (see the sketch after this list).
>
> - Supporting concurrent writes & reads. The splitting or merging must not
> block concurrent writers & readers, and the whole process should be fast
> enough (e.g., one bucket at a time) to minimize the impact on other
> operations.
>
> - Integrating the splitting & merging process into the existing Hudi table
> service pipelines.
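>
> For the splitting criteria above, a minimal sketch of the size-threshold
> policy (illustrative names, assuming the scheduler can read each file
> group's total size):
>
>     class SizeBasedSplitPolicy {
>         private final long splitThresholdBytes;
>
>         SizeBasedSplitPolicy(long splitThresholdBytes) {
>             this.splitThresholdBytes = splitThresholdBytes;
>         }
>
>         boolean shouldSplit(long fileGroupSizeBytes) {
>             return fileGroupSizeBytes >= splitThresholdBytes;
>         }
>
>         // Split the bucket's hash range (lower, upper] in the middle; the
>         // two children then own (lower, mid] and (mid, upper].
>         int midBound(int lower, int upper) {
>             return lower + (upper - lower) / 2;
>         }
>     }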
>
> I have sketched a prototype design to address the above problems:
>
> - Maintain hashing metadata for each partition (persisted as files), and
> use instants to manage its multi-versioning and concurrent updates.
>
> - Implement a flexible framework that accepts pluggable policies. The
> splitting (or merging) plan, specifying which buckets to split or merge
> and how, will be generated during scheduling (just as compaction plans
> are).
>
> - Activate dual-write once a writer observes an ongoing splitting (or
> merging) process, upserting records as log files into both the old and the
> new buckets (file groups); a minimal routing sketch follows this list.
> Readers can see records as soon as the writer completes, regardless of the
> splitting process.
>
> - Integrate the splitting & merging as a sub-task of the Clustering
> service, since we can view them as a special case of Clustering's goal
> (i.e., managing file groups based on file size). Though we need to modify
> Clustering to handle log files, the bucket index in turn enhances
> Clustering by allowing concurrent updates.
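>
> For the dual-write step above, a minimal routing sketch (illustrative,
> not Hudi's actual writer API; it reuses the ConsistentHashRing sketch
> from earlier):
>
>     import java.util.List;
>
>     class DualWriteRouter {
>         // While a split or merge is in flight there are two ring
>         // versions: the committed one and the pending one. A record whose
>         // bucket is unaffected maps identically in both; otherwise we
>         // write log files to both the old and the new file group.
>         List<String> targetFileGroups(String recordKey,
>                                       ConsistentHashRing committed,
>                                       ConsistentHashRing pending) {
>             String oldBucket = committed.bucketFor(recordKey);
>             String newBucket = pending.bucketFor(recordKey);
>             return oldBucket.equals(newBucket)
>                     ? List.of(oldBucket)
>                     : List.of(oldBucket, newBucket);
>         }
>     }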
>
>
> I would love to hear your thoughts and any feedback on the proposal. I can
> draft an RFC with a detailed design once we reach an agreement.
>
> [1]
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+29%3A+Hash+Index
>
> [2] YugabyteDB
>
> https://docs.yugabyte.com/latest/architecture/docdb-sharding/sharding/#example
>
> [3] PolarDB-X
> https://help.aliyun.com/document_detail/316603.html#title-y5n-2i1-5ws
>
>
>
> Best,
>
> Yuwei Xiao
>

Re: [DISCUSS] Propose Consistent Hashing Indexing for Dynamic Bucket Number

Posted by Yuwei Xiao <yw...@gmail.com>.
The RFC PR link :)
https://github.com/apache/hudi/pull/4326

I am personally inclined to add a new index option (DYNAMIC_BUCKET_INDEX)
and keep a clean & performant hash index option (BUCKET_INDEX) for
experienced users. We would also avoid potential migration trouble by
keeping the existing hash index's behavior unchanged.
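
For illustration only, selecting the option might look like the snippet
below; "hoodie.index.type" is Hudi's existing index switch, but both
values here are only the names proposed in this thread, not final:

    import java.util.Properties;

    class IndexOptionExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Existing static bucket index option:
            props.setProperty("hoodie.index.type", "BUCKET_INDEX");
            // Proposed dynamic variant (hypothetical until the RFC lands):
            // props.setProperty("hoodie.index.type", "DYNAMIC_BUCKET_INDEX");
        }
    }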

The impact of resizing (split/merge) is described in the RFC. In short,
split/merge runs asynchronously as part of the clustering process and does
not block concurrent readers & writers. By controlling its processing
granularity, we can further limit the performance impact on concurrent
operations.

I agree that merge (shrinking the table) may be a very infrequent
operation, but I guess we still need to implement it for completeness :)


Re: [DISCUSS] Propose Consistent Hashing Indexing for Dynamic Bucket Number

Posted by Vinoth Chandar <vi...@apache.org>.
+1 on the overall idea.

I am wondering if we can layer this on top of the Hash Index as a way to
just expand the number of buckets.

While split/merge sounds great, IMO there is significant operational
overhead to it. Most practical scenarios can be met with the ability to
expand with zero impact, as you describe? In fact, back when I worked on
Voldemort (LinkedIn's Dynamo implementation), we never shrank the tables,
for this same reason.

In any case, I look forward to the RFC. Please grab an RFC number!


Re: [DISCUSS] Propose Consistent Hashing Indexing for Dynamic Bucket Number

Posted by Gary Li <ga...@apache.org>.
+1, looking forward to the RFC.

Best,
Gary
