You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@flink.apache.org by chandan prakash <ch...@gmail.com> on 2018/10/24 12:47:07 UTC

Question over Incremental Snapshot vs Full Snapshot in rocksDb state backend

Hi,
I am new to Flink.
Was looking into the code to understand how Flink does FullSnapshot and
Incremental Snapshot using RocksDB

What I understood:
1. *For full snapshot, we call RocksDb snapshot api* which basically an
iterator handle to the entries in RocksDB instance. We iterate over every
entry one by one and serialize that to some distributed file system.
Similarly in restore for fullSnapshot, we read the file to get every entry
and apply that to the rocksDb instance one by one to fully construct the db
instance.

2. On the other hand in *for Incremental Snapshot, we rely on RocksDB
Checkpoint api* to copy the sst files to HDFS/S3 incrementally.
Similarly on restore, we copy the sst files to local directory and
instantiate rocksDB instance with the path of the directory.

*My Question is:*
1. Why did we took 2 different approaches using different RocksDB apis ?
We could have used Checkpoint api of RocksDB for fullSnapshot as well .
2. Is there any specific reason to use *Snapshot API of rocksDB*  over
*Checkpoint
api of RocksDB* for *fullSnapshot*?

I am sure, I am missing some important point, really curious to know that.
Any explanation will be really great. Thanks in advance.


Regards,
Chandan





-- 
Chandan Prakash

Re: Question over Incremental Snapshot vs Full Snapshot in rocksDb state backend

Posted by chandan prakash <ch...@gmail.com>.

Thanks a lot Andrey .
Your answer to 1st question clarifies well for both the questions.
Appreciate your help in clearing doubt


Regards,
Chandan

On Thu, Oct 25, 2018 at 6:10 PM Andrey Zagrebin <an...@data-artisans.com>
wrote:

> Hi Chandan,
>
> > 1. Why did we took 2 different approaches using different RocksDB apis ?
> > We could have used Checkpoint api of RocksDB for fullSnapshot as well .
>
> The reason here is partially historical. Full snapshot in RocksDB backend
> was implemented before incremental and rescaling for incremental snapshot
> but after heap backend. Full snapshot in RocksDB uses approach close to
> heap backend because Flink community plans to support the unified format
> for savepoints. The unified format would make it possible to switch
> backends and restore from savepoint. The formats still differ due to
> backend specifics to optimise snapshotting and restore but it is
> technically possible to unify them in future.
>
> > 2. Is there any specific reason to use Snapshot API of rocksDB
> over Checkpoint api of RocksDB for fullSnapshot?
>
> I think Checkpoint API produces separate SST file list to copy them to
> HDFS in case of incremental snapshot.
>
> Full snapshot does not need the file list, it just needs an iterator over
> snapshotted (frozen) data. Internally RocksDB just hard-links immutable
> already existing SST files and iterates their data for Snapshot API.
>
> Best,
> Andrey
>
>
> On 24 Oct 2018, at 18:40, chandan prakash <ch...@gmail.com>
> wrote:
>
> Thanks Tzu-Li for redirecting.
> Would also like to be corrected if my any inference from the code is
> incorrect or incomplete.
> I am sure it will help to clear doubts of more developers like me  :)
> Thanks in advance.
>
> Regards,
> Chandan
>
>
> On Wed, Oct 24, 2018 at 9:19 PM Tzu-Li (Gordon) Tai <tz...@apache.org>
> wrote:
>
>> Hi,
>>
>> I’m forwarding this question to Stefan (cc’ed).
>> He would most likely be able to answer your question, as he has done
>> substantial work in the RocksDB state backends.
>>
>> Cheers,
>> Gordon
>>
>>
>> On 24 October 2018 at 8:47:24 PM, chandan prakash (
>> chandanbaranwal@gmail.com) wrote:
>>
>> Hi,
>> I am new to Flink.
>> Was looking into the code to understand how Flink does FullSnapshot and
>> Incremental Snapshot using RocksDB
>>
>> What I understood:
>> 1. *For full snapshot, we call RocksDb snapshot api* which basically an
>> iterator handle to the entries in RocksDB instance. We iterate over every
>> entry one by one and serialize that to some distributed file system.
>> Similarly in restore for fullSnapshot, we read the file to get every
>> entry and apply that to the rocksDb instance one by one to fully construct
>> the db instance.
>>
>> 2. On the other hand in *for Incremental Snapshot, we rely on RocksDB
>> Checkpoint api* to copy the sst files to HDFS/S3 incrementally.
>> Similarly on restore, we copy the sst files to local directory and
>> instantiate rocksDB instance with the path of the directory.
>>
>> *My Question is:*
>> 1. Why did we took 2 different approaches using different RocksDB apis ?
>> We could have used Checkpoint api of RocksDB for fullSnapshot as well .
>> 2. Is there any specific reason to use *Snapshot API of rocksDB*  over *Checkpoint
>> api of RocksDB* for *fullSnapshot*?
>>
>> I am sure, I am missing some important point, really curious to know that.
>> Any explanation will be really great. Thanks in advance.
>>
>>
>> Regards,
>> Chandan
>>
>>
>>
>>
>>
>> --
>> Chandan Prakash
>>
>>
>
> --
> Chandan Prakash
>
>
>

-- 
Chandan Prakash

Re: Question over Incremental Snapshot vs Full Snapshot in rocksDb state backend

Posted by chandan prakash <ch...@gmail.com>.

Thanks a lot Andrey .
Your answer to 1st question clarifies well for both the questions.
Appreciate your help in clearing doubt


Regards,
Chandan

On Thu, Oct 25, 2018 at 6:10 PM Andrey Zagrebin <an...@data-artisans.com>
wrote:

> Hi Chandan,
>
> > 1. Why did we took 2 different approaches using different RocksDB apis ?
> > We could have used Checkpoint api of RocksDB for fullSnapshot as well .
>
> The reason here is partially historical. Full snapshot in RocksDB backend
> was implemented before incremental and rescaling for incremental snapshot
> but after heap backend. Full snapshot in RocksDB uses approach close to
> heap backend because Flink community plans to support the unified format
> for savepoints. The unified format would make it possible to switch
> backends and restore from savepoint. The formats still differ due to
> backend specifics to optimise snapshotting and restore but it is
> technically possible to unify them in future.
>
> > 2. Is there any specific reason to use Snapshot API of rocksDB
> over Checkpoint api of RocksDB for fullSnapshot?
>
> I think Checkpoint API produces separate SST file list to copy them to
> HDFS in case of incremental snapshot.
>
> Full snapshot does not need the file list, it just needs an iterator over
> snapshotted (frozen) data. Internally RocksDB just hard-links immutable
> already existing SST files and iterates their data for Snapshot API.
>
> Best,
> Andrey
>
>
> On 24 Oct 2018, at 18:40, chandan prakash <ch...@gmail.com>
> wrote:
>
> Thanks Tzu-Li for redirecting.
> Would also like to be corrected if my any inference from the code is
> incorrect or incomplete.
> I am sure it will help to clear doubts of more developers like me  :)
> Thanks in advance.
>
> Regards,
> Chandan
>
>
> On Wed, Oct 24, 2018 at 9:19 PM Tzu-Li (Gordon) Tai <tz...@apache.org>
> wrote:
>
>> Hi,
>>
>> I’m forwarding this question to Stefan (cc’ed).
>> He would most likely be able to answer your question, as he has done
>> substantial work in the RocksDB state backends.
>>
>> Cheers,
>> Gordon
>>
>>
>> On 24 October 2018 at 8:47:24 PM, chandan prakash (
>> chandanbaranwal@gmail.com) wrote:
>>
>> Hi,
>> I am new to Flink.
>> Was looking into the code to understand how Flink does FullSnapshot and
>> Incremental Snapshot using RocksDB
>>
>> What I understood:
>> 1. *For full snapshot, we call RocksDb snapshot api* which basically an
>> iterator handle to the entries in RocksDB instance. We iterate over every
>> entry one by one and serialize that to some distributed file system.
>> Similarly in restore for fullSnapshot, we read the file to get every
>> entry and apply that to the rocksDb instance one by one to fully construct
>> the db instance.
>>
>> 2. On the other hand in *for Incremental Snapshot, we rely on RocksDB
>> Checkpoint api* to copy the sst files to HDFS/S3 incrementally.
>> Similarly on restore, we copy the sst files to local directory and
>> instantiate rocksDB instance with the path of the directory.
>>
>> *My Question is:*
>> 1. Why did we took 2 different approaches using different RocksDB apis ?
>> We could have used Checkpoint api of RocksDB for fullSnapshot as well .
>> 2. Is there any specific reason to use *Snapshot API of rocksDB*  over *Checkpoint
>> api of RocksDB* for *fullSnapshot*?
>>
>> I am sure, I am missing some important point, really curious to know that.
>> Any explanation will be really great. Thanks in advance.
>>
>>
>> Regards,
>> Chandan
>>
>>
>>
>>
>>
>> --
>> Chandan Prakash
>>
>>
>
> --
> Chandan Prakash
>
>
>

-- 
Chandan Prakash

Re: Question over Incremental Snapshot vs Full Snapshot in rocksDb state backend

Posted by Andrey Zagrebin <an...@data-artisans.com>.

Hi Chandan,

> 1. Why did we took 2 different approaches using different RocksDB apis ?
> We could have used Checkpoint api of RocksDB for fullSnapshot as well .

The reason here is partially historical. Full snapshot in RocksDB backend was implemented before incremental and rescaling for incremental snapshot but after heap backend. Full snapshot in RocksDB uses approach close to heap backend because Flink community plans to support the unified format for savepoints. The unified format would make it possible to switch backends and restore from savepoint. The formats still differ due to backend specifics to optimise snapshotting and restore but it is technically possible to unify them in future.

> 2. Is there any specific reason to use Snapshot API of rocksDB  over Checkpoint api of RocksDB for fullSnapshot?

I think Checkpoint API produces separate SST file list to copy them to HDFS in case of incremental snapshot.

Full snapshot does not need the file list, it just needs an iterator over snapshotted (frozen) data. Internally RocksDB just hard-links immutable already existing SST files and iterates their data for Snapshot API.

Best,
Andrey


> On 24 Oct 2018, at 18:40, chandan prakash <ch...@gmail.com> wrote:
> 
> Thanks Tzu-Li for redirecting.
> Would also like to be corrected if my any inference from the code is incorrect or incomplete.
> I am sure it will help to clear doubts of more developers like me  :)
> Thanks in advance.
> 
> Regards,
> Chandan
> 
> 
> On Wed, Oct 24, 2018 at 9:19 PM Tzu-Li (Gordon) Tai <tzulitai@apache.org <ma...@apache.org>> wrote:
> Hi,
> 
> I’m forwarding this question to Stefan (cc’ed).
> He would most likely be able to answer your question, as he has done substantial work in the RocksDB state backends.
> 
> Cheers,
> Gordon
> 
> 
> On 24 October 2018 at 8:47:24 PM, chandan prakash (chandanbaranwal@gmail.com <ma...@gmail.com>) wrote:
> 
>> Hi,
>> I am new to Flink.
>> Was looking into the code to understand how Flink does FullSnapshot and Incremental Snapshot using RocksDB
>> 
>> What I understood:
>> 1. For full snapshot, we call RocksDb snapshot api which basically an iterator handle to the entries in RocksDB instance. We iterate over every entry one by one and serialize that to some distributed file system. 
>> Similarly in restore for fullSnapshot, we read the file to get every entry and apply that to the rocksDb instance one by one to fully construct the db instance.
>> 
>> 2. On the other hand in for Incremental Snapshot, we rely on RocksDB Checkpoint api to copy the sst files to HDFS/S3 incrementally.
>> Similarly on restore, we copy the sst files to local directory and instantiate rocksDB instance with the path of the directory.
>> 
>> My Question is:
>> 1. Why did we took 2 different approaches using different RocksDB apis ?
>> We could have used Checkpoint api of RocksDB for fullSnapshot as well .
>> 2. Is there any specific reason to use Snapshot API of rocksDB  over Checkpoint api of RocksDB for fullSnapshot?
>> 
>> I am sure, I am missing some important point, really curious to know that.
>> Any explanation will be really great. Thanks in advance.
>> 
>> 
>> Regards,
>> Chandan
>> 
>> 
>> 
>> 
>> 
>> --
>> Chandan Prakash
>> 
> 
> 
> -- 
> Chandan Prakash
>

Re: Question over Incremental Snapshot vs Full Snapshot in rocksDb state backend

Posted by Andrey Zagrebin <an...@data-artisans.com>.

Hi Chandan,

> 1. Why did we took 2 different approaches using different RocksDB apis ?
> We could have used Checkpoint api of RocksDB for fullSnapshot as well .

The reason here is partially historical. Full snapshot in RocksDB backend was implemented before incremental and rescaling for incremental snapshot but after heap backend. Full snapshot in RocksDB uses approach close to heap backend because Flink community plans to support the unified format for savepoints. The unified format would make it possible to switch backends and restore from savepoint. The formats still differ due to backend specifics to optimise snapshotting and restore but it is technically possible to unify them in future.

> 2. Is there any specific reason to use Snapshot API of rocksDB  over Checkpoint api of RocksDB for fullSnapshot?

I think Checkpoint API produces separate SST file list to copy them to HDFS in case of incremental snapshot.

Full snapshot does not need the file list, it just needs an iterator over snapshotted (frozen) data. Internally RocksDB just hard-links immutable already existing SST files and iterates their data for Snapshot API.

Best,
Andrey


> On 24 Oct 2018, at 18:40, chandan prakash <ch...@gmail.com> wrote:
> 
> Thanks Tzu-Li for redirecting.
> Would also like to be corrected if my any inference from the code is incorrect or incomplete.
> I am sure it will help to clear doubts of more developers like me  :)
> Thanks in advance.
> 
> Regards,
> Chandan
> 
> 
> On Wed, Oct 24, 2018 at 9:19 PM Tzu-Li (Gordon) Tai <tzulitai@apache.org <ma...@apache.org>> wrote:
> Hi,
> 
> I’m forwarding this question to Stefan (cc’ed).
> He would most likely be able to answer your question, as he has done substantial work in the RocksDB state backends.
> 
> Cheers,
> Gordon
> 
> 
> On 24 October 2018 at 8:47:24 PM, chandan prakash (chandanbaranwal@gmail.com <ma...@gmail.com>) wrote:
> 
>> Hi,
>> I am new to Flink.
>> Was looking into the code to understand how Flink does FullSnapshot and Incremental Snapshot using RocksDB
>> 
>> What I understood:
>> 1. For full snapshot, we call RocksDb snapshot api which basically an iterator handle to the entries in RocksDB instance. We iterate over every entry one by one and serialize that to some distributed file system. 
>> Similarly in restore for fullSnapshot, we read the file to get every entry and apply that to the rocksDb instance one by one to fully construct the db instance.
>> 
>> 2. On the other hand in for Incremental Snapshot, we rely on RocksDB Checkpoint api to copy the sst files to HDFS/S3 incrementally.
>> Similarly on restore, we copy the sst files to local directory and instantiate rocksDB instance with the path of the directory.
>> 
>> My Question is:
>> 1. Why did we took 2 different approaches using different RocksDB apis ?
>> We could have used Checkpoint api of RocksDB for fullSnapshot as well .
>> 2. Is there any specific reason to use Snapshot API of rocksDB  over Checkpoint api of RocksDB for fullSnapshot?
>> 
>> I am sure, I am missing some important point, really curious to know that.
>> Any explanation will be really great. Thanks in advance.
>> 
>> 
>> Regards,
>> Chandan
>> 
>> 
>> 
>> 
>> 
>> --
>> Chandan Prakash
>> 
> 
> 
> -- 
> Chandan Prakash
>

Re: Question over Incremental Snapshot vs Full Snapshot in rocksDb state backend

Posted by chandan prakash <ch...@gmail.com>.

Thanks Tzu-Li for redirecting.
Would also like to be corrected if my any inference from the code is
incorrect or incomplete.
I am sure it will help to clear doubts of more developers like me  :)
Thanks in advance.

Regards,
Chandan


On Wed, Oct 24, 2018 at 9:19 PM Tzu-Li (Gordon) Tai <tz...@apache.org>
wrote:

> Hi,
>
> I’m forwarding this question to Stefan (cc’ed).
> He would most likely be able to answer your question, as he has done
> substantial work in the RocksDB state backends.
>
> Cheers,
> Gordon
>
>
> On 24 October 2018 at 8:47:24 PM, chandan prakash (
> chandanbaranwal@gmail.com) wrote:
>
> Hi,
> I am new to Flink.
> Was looking into the code to understand how Flink does FullSnapshot and
> Incremental Snapshot using RocksDB
>
> What I understood:
> 1. *For full snapshot, we call RocksDb snapshot api* which basically an
> iterator handle to the entries in RocksDB instance. We iterate over every
> entry one by one and serialize that to some distributed file system.
> Similarly in restore for fullSnapshot, we read the file to get every entry
> and apply that to the rocksDb instance one by one to fully construct the db
> instance.
>
> 2. On the other hand in *for Incremental Snapshot, we rely on RocksDB
> Checkpoint api* to copy the sst files to HDFS/S3 incrementally.
> Similarly on restore, we copy the sst files to local directory and
> instantiate rocksDB instance with the path of the directory.
>
> *My Question is:*
> 1. Why did we took 2 different approaches using different RocksDB apis ?
> We could have used Checkpoint api of RocksDB for fullSnapshot as well .
> 2. Is there any specific reason to use *Snapshot API of rocksDB*  over *Checkpoint
> api of RocksDB* for *fullSnapshot*?
>
> I am sure, I am missing some important point, really curious to know that.
> Any explanation will be really great. Thanks in advance.
>
>
> Regards,
> Chandan
>
>
>
>
>
> --
> Chandan Prakash
>
>

-- 
Chandan Prakash

Re: Question over Incremental Snapshot vs Full Snapshot in rocksDb state backend

Posted by chandan prakash <ch...@gmail.com>.

Thanks Tzu-Li for redirecting.
Would also like to be corrected if my any inference from the code is
incorrect or incomplete.
I am sure it will help to clear doubts of more developers like me  :)
Thanks in advance.

Regards,
Chandan


On Wed, Oct 24, 2018 at 9:19 PM Tzu-Li (Gordon) Tai <tz...@apache.org>
wrote:

> Hi,
>
> I’m forwarding this question to Stefan (cc’ed).
> He would most likely be able to answer your question, as he has done
> substantial work in the RocksDB state backends.
>
> Cheers,
> Gordon
>
>
> On 24 October 2018 at 8:47:24 PM, chandan prakash (
> chandanbaranwal@gmail.com) wrote:
>
> Hi,
> I am new to Flink.
> Was looking into the code to understand how Flink does FullSnapshot and
> Incremental Snapshot using RocksDB
>
> What I understood:
> 1. *For full snapshot, we call RocksDb snapshot api* which basically an
> iterator handle to the entries in RocksDB instance. We iterate over every
> entry one by one and serialize that to some distributed file system.
> Similarly in restore for fullSnapshot, we read the file to get every entry
> and apply that to the rocksDb instance one by one to fully construct the db
> instance.
>
> 2. On the other hand in *for Incremental Snapshot, we rely on RocksDB
> Checkpoint api* to copy the sst files to HDFS/S3 incrementally.
> Similarly on restore, we copy the sst files to local directory and
> instantiate rocksDB instance with the path of the directory.
>
> *My Question is:*
> 1. Why did we took 2 different approaches using different RocksDB apis ?
> We could have used Checkpoint api of RocksDB for fullSnapshot as well .
> 2. Is there any specific reason to use *Snapshot API of rocksDB*  over *Checkpoint
> api of RocksDB* for *fullSnapshot*?
>
> I am sure, I am missing some important point, really curious to know that.
> Any explanation will be really great. Thanks in advance.
>
>
> Regards,
> Chandan
>
>
>
>
>
> --
> Chandan Prakash
>
>

-- 
Chandan Prakash

Re: Question over Incremental Snapshot vs Full Snapshot in rocksDb state backend

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.

Hi,

I’m forwarding this question to Stefan (cc’ed).
He would most likely be able to answer your question, as he has done substantial work in the RocksDB state backends.

Cheers,
Gordon

On 24 October 2018 at 8:47:24 PM, chandan prakash (chandanbaranwal@gmail.com) wrote:

Hi,
I am new to Flink.
Was looking into the code to understand how Flink does FullSnapshot and Incremental Snapshot using RocksDB

What I understood:
1. For full snapshot, we call RocksDb snapshot api which basically an iterator handle to the entries in RocksDB instance. We iterate over every entry one by one and serialize that to some distributed file system.
Similarly in restore for fullSnapshot, we read the file to get every entry and apply that to the rocksDb instance one by one to fully construct the db instance.

2. On the other hand in for Incremental Snapshot, we rely on RocksDB Checkpoint api to copy the sst files to HDFS/S3 incrementally.
Similarly on restore, we copy the sst files to local directory and instantiate rocksDB instance with the path of the directory.

My Question is:
1. Why did we took 2 different approaches using different RocksDB apis ?
We could have used Checkpoint api of RocksDB for fullSnapshot as well .
2. Is there any specific reason to use Snapshot API of rocksDB over Checkpoint api of RocksDB for fullSnapshot?

I am sure, I am missing some important point, really curious to know that.
Any explanation will be really great. Thanks in advance.

Regards,
Chandan

--
Chandan Prakash

Re: Question over Incremental Snapshot vs Full Snapshot in rocksDb state backend

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.

Hi,

I’m forwarding this question to Stefan (cc’ed).
He would most likely be able to answer your question, as he has done substantial work in the RocksDB state backends.

Cheers,
Gordon

On 24 October 2018 at 8:47:24 PM, chandan prakash (chandanbaranwal@gmail.com) wrote:

Hi,
I am new to Flink.
Was looking into the code to understand how Flink does FullSnapshot and Incremental Snapshot using RocksDB

I am sure, I am missing some important point, really curious to know that.
Any explanation will be really great. Thanks in advance.

Regards,
Chandan

--
Chandan Prakash