You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Narendra Joshi <na...@gmail.com> on 2017/09/21 11:26:17 UTC

Question about concurrent checkpoints

Hi,

How are concurrent snapshots taken for an operator?
Let's say an operator receives barriers for a checkpoint from all of
its inputs. It triggers the checkpoint. Now, the checkpoint starts
getting saved asynchronously. Before the checkpoint is acknowledged,
the operator receives all barriers for all inputs for the next
checkpoint. What will happen in this case if no concurrent checkpoints
are allowed (i.e. the default value is used)? What will happen if
concurrent checkpoints are allowed?

Thanks,
Narendra Joshi

Re: Question about concurrent checkpoints

Posted by Nico Kruber <ni...@data-artisans.com>.
On Thursday, 21 September 2017 20:08:01 CEST Narendra Joshi wrote:
> Nico Kruber <ni...@data-artisans.com> writes:
> > according to [1], even with asynchronous state snapshots (see [2]), a
> > checkpoint is only complete after all sinks have received the barriers and
> > all (asynchronous) snapshots have been processed. Since, if the number of
> > concurrent checkpoints is 0, no checkpoint barriers will be emitted until
> > the previous checkpoint is complete (see [1]), you will not get into the
> > situation where two asynchronous snapshots are being taken concurrently.
> 
> Does this mean that operators would stop processing streams (because
> they received all barriers for a new checkpoint) and wait for
> the ongoing asynchronous checkpoint to complete or it means that no
> barriers would be injected into sources before checkpoint finishes?

The latter (as mentioned): no new barriers are injected into the sources.

The only thing that is waiting for asynchronous state snapshots to complete is 
the checkpoint coordinator (in any case!) since a checkpoint is only complete 
once all operators have stored their state. Operation continues as expected.


Nico

Re: Question about concurrent checkpoints

Posted by Narendra Joshi <na...@gmail.com>.
Nico Kruber <ni...@data-artisans.com> writes:

> Hi Narendra,
> according to [1], even with asynchronous state snapshots (see [2]), a 
> checkpoint is only complete after all sinks have received the barriers and all 
> (asynchronous) snapshots have been processed. Since, if the number of 
> concurrent checkpoints is 0, no checkpoint barriers will be emitted until the 
> previous checkpoint is complete (see [1]), you will not get into the situation 
> where two asynchronous snapshots are being taken concurrently.
Does this mean that operators would stop processing streams (because
they received all barriers for a new checkpoint) and wait for
the ongoing asynchronous checkpoint to complete or it means that no
barriers would be injected into sources before checkpoint finishes?

> If you enable concurrent checkpoints and asynchronous snapshots , they will 
> process concurrently but on different snapshots of the state, i.e. although 
> they are running in parallel, each stores the expected state.
> If you want to know more about the details of how this is done, I can 
> recommend Stefan's (cc'd) talk at Flink Forward last week [4]. He may also be 
> able to answer in more detail in case I missed something.
Thanks for the reference to the talk! :)

>
>
> Nico
>
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/
> stream_checkpointing.html#asynchronous-state-snapshots
> [2] https://ci.apache.org/projects/flink/flink-docs-release-1.3/ops/
> state_backends.html
> [3] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/
> checkpointing.html
> [4] https://www.youtube.com/watch?
> v=dWQ24wERItM&index=36&list=PLDX4T_cnKjD0JeULl1X6iTn7VIkDeYX_X
>
> On Thursday, 21 September 2017 13:26:17 CEST Narendra Joshi wrote:
>> Hi,
>> 
>> How are concurrent snapshots taken for an operator?
>> Let's say an operator receives barriers for a checkpoint from all of
>> its inputs. It triggers the checkpoint. Now, the checkpoint starts
>> getting saved asynchronously. Before the checkpoint is acknowledged,
>> the operator receives all barriers for all inputs for the next
>> checkpoint. What will happen in this case if no concurrent checkpoints
>> are allowed (i.e. the default value is used)? What will happen if
>> concurrent checkpoints are allowed?
>> 
>> Thanks,
>> Narendra Joshi
>

-- 
Narendra Joshi

Re: Question about concurrent checkpoints

Posted by Nico Kruber <ni...@data-artisans.com>.
Hi Narendra,
according to [1], even with asynchronous state snapshots (see [2]), a 
checkpoint is only complete after all sinks have received the barriers and all 
(asynchronous) snapshots have been processed. Since, if the number of 
concurrent checkpoints is 0, no checkpoint barriers will be emitted until the 
previous checkpoint is complete (see [1]), you will not get into the situation 
where two asynchronous snapshots are being taken concurrently.

If you enable concurrent checkpoints and asynchronous snapshots , they will 
process concurrently but on different snapshots of the state, i.e. although 
they are running in parallel, each stores the expected state.
If you want to know more about the details of how this is done, I can 
recommend Stefan's (cc'd) talk at Flink Forward last week [4]. He may also be 
able to answer in more detail in case I missed something.



Nico


[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/
stream_checkpointing.html#asynchronous-state-snapshots
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.3/ops/
state_backends.html
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/
checkpointing.html
[4] https://www.youtube.com/watch?
v=dWQ24wERItM&index=36&list=PLDX4T_cnKjD0JeULl1X6iTn7VIkDeYX_X

On Thursday, 21 September 2017 13:26:17 CEST Narendra Joshi wrote:
> Hi,
> 
> How are concurrent snapshots taken for an operator?
> Let's say an operator receives barriers for a checkpoint from all of
> its inputs. It triggers the checkpoint. Now, the checkpoint starts
> getting saved asynchronously. Before the checkpoint is acknowledged,
> the operator receives all barriers for all inputs for the next
> checkpoint. What will happen in this case if no concurrent checkpoints
> are allowed (i.e. the default value is used)? What will happen if
> concurrent checkpoints are allowed?
> 
> Thanks,
> Narendra Joshi