Posted to dev@hbase.apache.org by "张铎 (Duo Zhang)" <pa...@gmail.com> on 2021/06/08 07:01:58 UTC

Re: [DISCUSS] Breakout discussion on storefile tracking storage solutions

I cannot comment on the Google doc, so I'll just reply here.

Recently I read a bit about Hudi and Iceberg. Both use files on the
filesystem for committing changes by default, so you can write to the data
directory directly without worrying about messing up the data files.

If there are no big concerns, I could start implementing a POC using the
plain file approach.
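
To make the idea concrete, here is a minimal sketch of what "committing through a plain file" could look like. It is only an illustration under stated assumptions, not the HBase implementation and not how Hudi or Iceberg actually do it: the class name PlainFileTracker, the "tracker.<seq>" file naming, and the use of java.nio on a local filesystem with atomic rename are all hypothetical choices made for brevity. Data files are assumed to be written straight into the store directory; a file only counts as part of the store once its name appears in the newest committed tracker file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch: not HBase code and not the design from the linked docs.
class PlainFileTracker {
  private static final String PREFIX = "tracker.";
  private final Path storeDir;

  PlainFileTracker(Path storeDir) {
    this.storeDir = storeDir;
  }

  /** Returns the highest committed sequence number, or 0 if nothing is committed. */
  private long latestSeq() throws IOException {
    try (Stream<Path> files = Files.list(storeDir)) {
      return files.map(p -> p.getFileName().toString())
          // Ignore data files and tracker files that are still being written.
          .filter(n -> n.startsWith(PREFIX) && !n.endsWith(".tmp"))
          .mapToLong(n -> Long.parseLong(n.substring(PREFIX.length())))
          .max()
          .orElse(0L);
    }
  }

  /** Publishes a new, complete list of store file names as the next tracker file. */
  void commit(List<String> storeFiles) throws IOException {
    long seq = latestSeq() + 1;
    Path tmp = storeDir.resolve(PREFIX + seq + ".tmp");
    Path committed = storeDir.resolve(PREFIX + seq);
    Files.write(tmp, storeFiles, StandardCharsets.UTF_8);
    // Assumption: the filesystem supports atomic rename for this small file.
    Files.move(tmp, committed, StandardCopyOption.ATOMIC_MOVE);
  }

  /** Loads the store file names from the newest committed tracker file. */
  List<String> load() throws IOException {
    long seq = latestSeq();
    if (seq == 0) {
      return List.of();
    }
    return Files.readAllLines(storeDir.resolve(PREFIX + seq), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("store");
    PlainFileTracker tracker = new PlainFileTracker(dir);
    tracker.commit(List.of("hfile-a", "hfile-b")); // e.g. after two flushes
    tracker.commit(List.of("hfile-c"));            // e.g. after a compaction
    System.out.println(tracker.load());
  }
}
```

A reader that only trusts the newest tracker file never sees a half-written data file, which is the property that makes writing directly to the data directory safe. The sketch assumes a single writer per store and leaves out cleanup of old tracker files and of data files that are no longer listed.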

Thanks.

Josh Elser <el...@apache.org> wrote on Thu, May 27, 2021 at 3:21 AM:

> Thanks Stack! (access given, as google probably told you already).
>
> Please keep me honest.
>
> On 5/26/21 12:29 PM, Stack wrote:
> > And, what is there currently is a nice write-up....
> > S
> >
> > On Wed, May 26, 2021 at 9:26 AM Stack <st...@duboce.net> wrote:
> >
> >> Can I have comment access please Josh?
> >> S
> >>
> >> On Tue, May 25, 2021 at 8:24 PM Josh Elser <el...@apache.org> wrote:
> >>
> >>> Hi folks,
> >>>
> >>> This is a follow-on for the HBASE-24749 discussion on storefile
> >>> tracking, specifically focusing on where/how do we store the list of
> >>> files for each Store.
> >>>
> >>> I tried to capture my thoughts and the suggestions by Duo and Wellington
> >>> in this google doc [1].
> >>>
> >>> Please feel free to ask for edit permission (and send me a note if your
> >>> email address isn't one that I would otherwise recognize :) ) to
> >>> correct, improve, or expand on any other sections.
> >>>
> >>> FWIW, I was initially not super excited about a per-Store file, but, the
> >>> more I think about it, the more I'm coming around to that idea. I think
> >>> it will be more "exception-handling", but avoid the long-term
> >>> operational burden of yet-another-important-system-table.
> >>>
> >>> - Josh
> >>>
> >>> [1] https://docs.google.com/document/d/1yzjvQvQfnT-M8ZgKdcQNedF8HssTnQR2loPkZtlJGVg/edit?usp=sharing
> >>>
> >>
> >
>

Re: [DISCUSS] Breakout discussion on storefile tracking storage solutions

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
I've filed HBASE-26064 for implementing a general framework that makes the
way store files are tracked pluggable.

The design doc is here:

https://docs.google.com/document/d/16Nr1Fn3VaXuz1g1FTiME-bnGR3qVK5B-raXshOkDLcY/edit#heading=h.l9m8dadaxzee

Re: [DISCUSS] Breakout discussion on storefile tracking storage solutions

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
Filed HBASE-25988.

I also posted a simple design doc there:

https://docs.google.com/document/d/1Ao_jwa3ekXghAKXVpO7fbX_BXrTv3-RgxsneNULRXx0/edit?usp=sharing

Re: [DISCUSS] Breakout discussion on storefile tracking storage solutions

Posted by Wellington Chevreuil <we...@gmail.com>.
I say go for it. Can you create it as a subtask under the umbrella
HBASE-24749 and open a PR against that feature branch? I have been working
on the tasks related to avoiding "creating files in temp then rename" for
flushes and compactions (HBASE-25391 and HBASE-25392), and have managed to
run them successfully with the PersistedStoreEngine from HBASE-25395.
