Posted to dev@hudi.apache.org by 冯健 <fe...@gmail.com> on 2022/08/21 17:58:37 UTC

[DISCUSS] New RFC to support 'Snapshot view management'

Hi team,
[image: image.png]
For the snapshot view scenario, Hudi already provides two key features to
support it:

   - Time travel: the user provides a timestamp to query a specific snapshot
   view of a Hudi table.
   - Savepoint/restore: a "savepoint" saves the table as of a commit time
   so that you can restore the table to that savepoint at a later point in
   time if need be. In the snapshot view case, however, users usually use it
   to keep the cleaner from removing the snapshot view at a specific
   timestamp, so that only unused files are cleaned.
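
To make the two features concrete, here is a minimal toy model in Python. It
is only a sketch, not Hudi code: it assumes commit instants are sortable
timestamp strings (yyyyMMddHHmmss), that time travel resolves to the latest
commit at or before the requested timestamp, and that the cleaner keeps the
newest N commits plus anything savepointed. The function names
resolve_as_of and cleanable are hypothetical.

```python
from bisect import bisect_right


def resolve_as_of(commits, as_of):
    """Return the latest commit instant at or before `as_of`, or None.

    Instants are fixed-width timestamp strings, so string order
    matches chronological order.
    """
    ordered = sorted(commits)
    idx = bisect_right(ordered, as_of)
    return ordered[idx - 1] if idx else None


def cleanable(commits, savepoints, retained):
    """Return the commits a toy cleaner may drop.

    Everything older than the newest `retained` commits is a candidate,
    except commits protected by a savepoint.
    """
    ordered = sorted(commits)
    old = ordered[:-retained] if retained > 0 else ordered
    return [c for c in old if c not in savepoints]


commits = ["20220819000000", "20220820000000",
           "20220821000000", "20220822000000"]

# Time travel: a timestamp late on Aug 20 resolves to the Aug 20 commit.
snapshot = resolve_as_of(commits, "20220820235959")

# Savepoint on Aug 19 protects it from cleaning; Aug 20 is not protected.
droppable = cleanable(commits, {"20220819000000"}, retained=2)
```

In this model the savepoint does double duty: it is both the restore point and
the thing that pins a snapshot view against cleaning.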

However, using these features directly is inconvenient for users:

   - Users usually prefer a meaningful name over querying a Hudi table by
   timestamp, since using a timestamp in SQL may lead to the wrong snapshot
   view being used. For example, we could announce that a new tag of a Hudi
   table, named table_nameYYYYMMDD, has been released, and users could then
   query it by that new table name.
   - Savepoint was not originally designed for this "snapshot view" scenario;
   it was designed for disaster recovery. Say a new snapshot view is created
   every day and kept for 7 days: we should support lifecycle management on
   top of it.
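
The naming and retention example above can be sketched in a few lines of
Python. This is an illustration under stated assumptions, not the proposed
implementation: the helper names tag_name and expired_tags are hypothetical,
tags are tracked in a plain dict, and a tag is treated as expired once it is
more than retention_days old.

```python
from datetime import date, timedelta


def tag_name(table, day):
    """Build a tag name of the form table_nameYYYYMMDD."""
    return f"{table}{day:%Y%m%d}"


def expired_tags(tags, today, retention_days=7):
    """Return the names of tags created more than `retention_days` ago.

    `tags` maps tag name -> creation date.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(name for name, created in tags.items() if created < cutoff)


# One snapshot view per day from Aug 10 to Aug 21, 2022.
tags = {}
for d in range(10, 22):
    day = date(2022, 8, d)
    tags[tag_name("orders", day)] = day

# With 7 days of retention on Aug 21, tags from before Aug 14 expire.
to_drop = expired_tags(tags, today=date(2022, 8, 21))
```

A lifecycle job along these lines would run once a day, create the new tag,
and release whatever expired_tags returns.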

What I plan to do is to let Hudi support releasing snapshot views and
managing their lifecycle out of the box. We have already done some work on
this while supporting customers' snapshot view requirements at my company,
and we hope to land this feature in the community too.

Please feel free to let me know if you have any ideas about this.

Thanks,

Jian Feng

Re: [DISCUSS] New RFC to support 'Snapshot view management'

Posted by 冯健 <fe...@gmail.com>.
Hi Sagar,
HMS (Hive Metastore) shouldn't be the core part; the external table location
will depend on which metastore the user is using.
I'm still working on it and will add more detail in this RFC PR:
https://github.com/apache/hudi/pull/6576


On Fri, 16 Sept 2022 at 11:28, sagar sumit <co...@apache.org> wrote:

> Automatic lifecycle management based on a few configurations
> would be very useful for the community.
>
> I read the description in
> https://issues.apache.org/jira/browse/HUDI-4677
> May I ask the rationale for choosing
> Hive Metastore to manage the snapshots?
>
> Perhaps the RFC will have more details. Looking forward to it!
>
> Regards,
> Sagar
>

Re: [DISCUSS] New RFC to support 'Snapshot view management'

Posted by sagar sumit <co...@apache.org>.
Automatic lifecycle management based on a few configurations
would be very useful for the community.

I read the description in
https://issues.apache.org/jira/browse/HUDI-4677
May I ask the rationale for choosing
Hive Metastore to manage the snapshots?

Perhaps the RFC will have more details. Looking forward to it!

Regards,
Sagar


On Wed, Sep 14, 2022 at 8:13 AM 冯健 <fe...@gmail.com> wrote:

> Hi Ethan,
>
> Yes, based on the current situation, we still need to do a lot of extra
> work to provide the snapshot view feature for users (or users have to do
> this themselves). I plan to merge at least the COW part of this feature
> into 0.13.0, and will consider your suggestion if time is tight.
> Thanks

Re: [DISCUSS] New RFC to support 'Snapshot view management'

Posted by 冯健 <fe...@gmail.com>.
Hi Ethan,

Yes, based on the current situation, we still need to do a lot of extra
work to provide the snapshot view feature for users (or users have to do
this themselves). I plan to merge at least the COW part of this feature
into 0.13.0, and will consider your suggestion if time is tight.
Thanks



On Wed, 14 Sept 2022 at 03:02, Y Ethan Guo <yi...@apache.org> wrote:

> Hi Feng Jian,
>
> Looking forward to the RFC!  Is the snapshot view management more like
> managing commits / savepoints in the Hudi timeline and hiding Hudi
> internals from the users?
>
> Do you plan to merge the implementation of snapshot view and lifecycle
> management for the next major release (0.13.0)?  Timeline-wise, if time is
> tight, you may also consider scoping out a subset of features to target
> 0.13.0.
>
> Best,
> - Ethan

Re: [DISCUSS] New RFC to support 'Snapshot view management'

Posted by Y Ethan Guo <yi...@apache.org>.
Hi Feng Jian,

Looking forward to the RFC!  Is the snapshot view management more like
managing commits / savepoints in the Hudi timeline and hiding Hudi
internals from the users?

Do you plan to merge the implementation of snapshot view and lifecycle
management for the next major release (0.13.0)?  Timeline-wise, if time is
tight, you may also consider scoping out a subset of features to target
0.13.0.

Best,
- Ethan

On Mon, Sep 12, 2022 at 10:43 PM Sivabalan <n....@gmail.com> wrote:

> Sounds like a nice feature to have. Eagerly looking forward to the RFC.

Re: [DISCUSS] New RFC to support 'Snapshot view management'

Posted by Sivabalan <n....@gmail.com>.
Sounds like a nice feature to have. Eagerly looking forward to the RFC.

On Sat, 27 Aug 2022 at 20:51, 冯健 <fe...@gmail.com> wrote:

> I attached the image to this Jira epic,
> https://issues.apache.org/jira/browse/HUDI-4677. The RFC is WIP; I will
> create a PR in the next few days.
> Yeah, the basic idea is to implement lifecycle management based on the
> savepoint and time travel features, providing new ways for the user to
> operate and coordinate them. It won't propose any new concepts.


-- 
Regards,
-Sivabalan

Re: [DISCUSS] New RFC to support 'Snapshot view management'

Posted by 冯健 <fe...@gmail.com>.
I attached the image to this Jira epic,
https://issues.apache.org/jira/browse/HUDI-4677. The RFC is WIP; I will
create a PR in the next few days.
Yeah, the basic idea is to implement lifecycle management based on the
savepoint and time travel features, providing new ways for the user to
operate and coordinate them. It won't propose any new concepts.

On Sun, 28 Aug 2022 at 02:06, Shiyan Xu <xu...@gmail.com> wrote:

> The dev email list does not support showing images, unfortunately. You may
> want to put it behind a link.
>
> As for the idea itself,
>
> > What I plan to do is to let Hudi support releasing snapshot views and
> > managing their lifecycle out of the box.
>
> Are you planning to extend the savepoint feature to have lifecycle
> management capabilities? We should consolidate overlapping features properly.

Re: [DISCUSS] New RFC to support 'Snapshot view management'

Posted by Shiyan Xu <xu...@gmail.com>.
The dev email list does not support showing images, unfortunately. You may
want to put it behind a link.

As for the idea itself,

> What I plan to do is to let Hudi support releasing snapshot views and
> managing their lifecycle out of the box.

Are you planning to extend the savepoint feature to have lifecycle
management capabilities? We should consolidate overlapping features properly.


-- 
Best,
Shiyan