Posted to user@flink.apache.org by Mohit Anchlia <mo...@gmail.com> on 2017/02/03 00:45:52 UTC

Clarification on state backend parameters

Trying to understand these 4 parameters:

state.backend
state.backend.fs.checkpointdir
state.backend.rocksdb.checkpointdir
state.checkpoints.dir

As I understand it, the stream of data and the state of operators are two
different concepts, and both need to be checkpointed. I am a bit confused
about the purpose of these parameters and their applicability.

Re: Clarification on state backend parameters

Posted by Mohit Anchlia <mo...@gmail.com>.
Thanks for the clarification!

On Sat, Feb 4, 2017 at 3:34 AM, Stefan Richter <s....@data-artisans.com>
wrote:

> If you have configured RocksDB as the backend, Flink typically has multiple
> RocksDB instances per job: one for each parallel operator instance with
> keyed state. Those RocksDB instances live local to their corresponding
> operator instances. The parameter state.backend.rocksdb.checkpointdir
> configures the working directory of those instances. Working directories
> store files during the operation of RocksDB, so they should mainly allow
> fast access, e.g. reside on a local disk filesystem.
>
> In contrast, state.backend.fs.checkpointdir specifies where checkpoint data
> is stored. Think of this as a backup directory, where the most important
> properties are availability and fault tolerance. It would typically be
> located on a distributed file system like HDFS that is accessible from each
> node, so that operators can be recovered on different machines in case of
> machine failures.

Re: Clarification on state backend parameters

Posted by Bowen Li <bo...@offerupnow.com>.
FYI, the following thread
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Clarification-on-state-backend-parameters-td11419.html
discusses the differences among:

state.backend.fs.checkpointdir
state.backend.rocksdb.checkpointdir
state.checkpoints.dir

On Wed, Jun 14, 2017 at 12:20 PM, bowen.li <bo...@offerupnow.com> wrote:

> Hi guys,
>     This is a great clarification!
>
>     A follow-up question from me: what's the difference between
> `state.checkpoints.dir` and the parameter you pass to the
> RocksDBStateBackend constructor `public RocksDBStateBackend(URI
> checkpointDataUri) throws IOException`? They are really confusing.
>
>     I specified checkpointDataUri but got the error `CheckpointConfig says
> to persist periodic checkpoints, but no checkpoint directory has been
> configured. You can configure configure one via key
> 'state.checkpoints.dir'.`
>
> Thanks,
> Bowen

Re: Clarification on state backend parameters

Posted by "bowen.li" <bo...@offerupnow.com>.
Hi guys,
    This is a great clarification!

    A follow-up question from me: what's the difference between
`state.checkpoints.dir` and the parameter you pass to the
RocksDBStateBackend constructor `public RocksDBStateBackend(URI
checkpointDataUri) throws IOException`? They are really confusing.

    I specified checkpointDataUri but got the error `CheckpointConfig says to
persist periodic checkpoints, but no checkpoint directory has been
configured. You can configure configure one via key
'state.checkpoints.dir'.`

Thanks, 
Bowen
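
A possible reading of the error above, offered as an assumption rather than a confirmed answer from this thread: the URI passed to the RocksDBStateBackend constructor tells the backend where to write checkpoint data, while the error message asks for a separate directory configured through the key it names, state.checkpoints.dir. A minimal flink-conf.yaml sketch that sets that key could look like this (the HDFS path is a hypothetical placeholder):

```yaml
# flink-conf.yaml (sketch; the path below is a made-up example)
# Directory named by the error message; per the explanation in this thread
# it holds the small checkpoint metadata files.
state.checkpoints.dir: hdfs:///flink/checkpoint-metadata
```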




Re: Clarification on state backend parameters

Posted by Stefan Richter <s....@data-artisans.com>.
If you have configured RocksDB as the backend, Flink typically has multiple RocksDB instances per job: one for each parallel operator instance with keyed state. Those RocksDB instances live local to their corresponding operator instances. The parameter state.backend.rocksdb.checkpointdir configures the working directory of those instances. Working directories store files during the operation of RocksDB, so they should mainly allow fast access, e.g. reside on a local disk filesystem.

In contrast, state.backend.fs.checkpointdir specifies where checkpoint data is stored. Think of this as a backup directory, where the most important properties are availability and fault tolerance. It would typically be located on a distributed file system like HDFS that is accessible from each node, so that operators can be recovered on different machines in case of machine failures.
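
To make the contrast concrete, here is a minimal flink-conf.yaml sketch of the setup described above. Both paths are hypothetical placeholders, and the `rocksdb` value for state.backend is an assumption about the shortcut names your Flink version accepts:

```yaml
# flink-conf.yaml (sketch; paths are made-up examples)
state.backend: rocksdb

# Working directory of the local RocksDB instances: fast local disk.
state.backend.rocksdb.checkpointdir: /mnt/local-ssd/flink/rocksdb

# Where checkpoint data is written: a distributed file system reachable
# from every node, so operators can be restored on another machine.
state.backend.fs.checkpointdir: hdfs:///flink/checkpoints
```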

> On 03.02.2017 at 20:55, Mohit Anchlia <mo...@gmail.com> wrote:
> 
> I thought RocksDB is used as a store backend. If that is the case, why are there two configuration parameters? In other words, what is the behavior if both state.backend.fs.checkpointdir and state.backend.rocksdb are set?


Re: Clarification on state backend parameters

Posted by Mohit Anchlia <mo...@gmail.com>.
I thought RocksDB is used as a store backend. If that is the case, why are
there two configuration parameters? In other words, what is the behavior if
both state.backend.fs.checkpointdir and state.backend.rocksdb are set?

On Fri, Feb 3, 2017 at 1:47 AM, Stefan Richter <s....@data-artisans.com>
wrote:

> Hi,
>
> the purpose of the configuration parameters is described in the
> documentation under
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/config.html.
> In a nutshell, state.checkpoints.dir contains the (small) metadata files
> for checkpoints, which typically contain pointers to the files that hold
> the actual state snapshot data. The state.backend.fs.checkpointdir is the
> directory into which the actual state from the backends is written.
> Finally, state.backend.rocksdb.checkpointdir is a poorly named key for the
> directory of the RocksDB instance data and has in fact nothing to do with
> checkpoints.
>
> Best,
> Stefan

Re: Clarification on state backend parameters

Posted by Stefan Richter <s....@data-artisans.com>.
Hi,

the purpose of the configuration parameters is described in the documentation under https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/config.html. In a nutshell, state.checkpoints.dir contains the (small) metadata files for checkpoints, which typically contain pointers to the files that hold the actual state snapshot data. The state.backend.fs.checkpointdir is the directory into which the actual state from the backends is written. Finally, state.backend.rocksdb.checkpointdir is a poorly named key for the directory of the RocksDB instance data and has in fact nothing to do with checkpoints.
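
Read as a flink-conf.yaml fragment, the three directory keys from this explanation might be annotated as follows. This is only a sketch: every path is a hypothetical placeholder, not a recommended value.

```yaml
# flink-conf.yaml (sketch; paths are made-up examples)

# Small metadata files that point at the actual snapshot files.
state.checkpoints.dir: hdfs:///flink/checkpoint-metadata

# The actual state snapshot data written by the backend.
state.backend.fs.checkpointdir: hdfs:///flink/checkpoints

# Despite the name, not a checkpoint location: the directory of the
# RocksDB instance data.
state.backend.rocksdb.checkpointdir: /tmp/flink/rocksdb
```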

Best,
Stefan

> On 03.02.2017 at 01:45, Mohit Anchlia <mo...@gmail.com> wrote:
> 
> Trying to understand these 4 parameters:
> 
> state.backend
> state.backend.fs.checkpointdir
> state.backend.rocksdb.checkpointdir
> state.checkpoints.dir
> 
> As I understand it, the stream of data and the state of operators are two different concepts, and both need to be checkpointed. I am a bit confused about the purpose of these parameters and their applicability.