Posted to user@beam.apache.org by Jose Delgado <jd...@bandainamcoent.com> on 2019/08/01 19:44:29 UTC

Save state on tear down

Hello All,

I am wondering if there is a way or pattern to save state on tear down?

In case of a failure, or when we cannot update a pipeline (due to significant changes), we would like to save the state and reload it the next time the pipeline is created.

Note: we are currently using the Google Dataflow runner.

Regards,
Jose

Re: Save state on tear down

Posted by Lukasz Cwik <lc...@google.com>.
On Fri, Aug 30, 2019 at 6:52 PM Chad Dombrova <ch...@gmail.com> wrote:

> +dev
>
> I read the document on Drain, and it sounds very promising.  I have a few
> questions, starting with this statement from the doc:
>
>    "This document proposes a new pipeline action called Drain. Drain can
> be implemented by runners by manipulating the watermark of the pipeline."
>
> What is a pipeline "action" and how would this be exposed to the user?  I
> assume this is externally and manually initiated.  Is this something that
> would be invoked from a PipelineResult object (i.e. akin to "cancel")?
>

The term alludes to job management operations (run, cancel, get metrics,
...), and yes, this should be exposed as part of PipelineResult in addition
to any tooling that performs job management.
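
As a rough illustration of what "exposed as part of PipelineResult" could
look like, here is a toy sketch in plain Python. It does not use the Beam
SDK; the class SketchPipelineResult, the JobState values, and the drain()
and mark_drained() methods are all hypothetical names made up for this
example, not an existing or proposed API.

```python
from enum import Enum


class JobState(Enum):
    # Hypothetical job states, for illustration only.
    RUNNING = "RUNNING"
    DRAINING = "DRAINING"
    DRAINED = "DRAINED"
    CANCELLED = "CANCELLED"


class SketchPipelineResult:
    """Toy stand-in for a job handle; not the Beam PipelineResult class."""

    def __init__(self):
        self.state = JobState.RUNNING

    def cancel(self):
        # Cancel stops the job immediately; in-flight data is dropped.
        self.state = JobState.CANCELLED
        return self.state

    def drain(self):
        # Drain asks the runner to stop reading input and flush what is
        # already in flight. Only a running job can be drained.
        if self.state is not JobState.RUNNING:
            raise RuntimeError(f"cannot drain a job in state {self.state.name}")
        self.state = JobState.DRAINING
        return self.state

    def mark_drained(self):
        # Would be called by the runner once all buffered output is emitted.
        if self.state is not JobState.DRAINING:
            raise RuntimeError("job is not draining")
        self.state = JobState.DRAINED
        return self.state


result = SketchPipelineResult()
result.drain()
result.mark_drained()
print(result.state.name)  # prints DRAINED
```

The point of the sketch is only that drain would sit next to cancel as a
manually initiated job management call, with its own intermediate state.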


>
> Once a Drain is initiated on a pipeline, does this trigger a loop over all
> unbounded sources to set their watermark to infinity, or only certain
> ones?
>

Conceptually yes, the runner sets the watermark to infinity for all
"sources", preventing new data from being produced. The watermark then
progresses downstream throughout the execution graph, causing windows to
close, timers to fire, state to be emitted and then garbage collected, and
so forth.
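
To make the mechanism concrete, here is a minimal sketch in plain Python
(not Beam or Dataflow code). It assumes a single stage that sums values per
fixed 10-second window; ToyRunner, WINDOW_SIZE, and all method names are
hypothetical and exist only for this example. Draining stops intake and
moves the watermark to infinity, which closes every open window, emits its
state, and garbage collects it.

```python
import math
from collections import defaultdict

WINDOW_SIZE = 10  # seconds; fixed windows, chosen only for illustration


class ToyRunner:
    """Toy single-stage runner that sums values per fixed window."""

    def __init__(self):
        self.watermark = -math.inf
        self.state = defaultdict(int)  # window start -> running sum
        self.emitted = []              # (window start, sum) in firing order
        self.draining = False

    def process(self, timestamp, value):
        # Once draining, the "source" produces no new data.
        if self.draining:
            return False
        window_start = (timestamp // WINDOW_SIZE) * WINDOW_SIZE
        self.state[window_start] += value
        return True

    def advance_watermark(self, new_watermark):
        # The watermark only moves forward. Every window whose end is at or
        # before the watermark closes: its state is emitted, then removed.
        self.watermark = max(self.watermark, new_watermark)
        for start in sorted(self.state):
            if start + WINDOW_SIZE <= self.watermark:
                self.emitted.append((start, self.state.pop(start)))

    def drain(self):
        # Drain = stop intake, then push the watermark to infinity.
        self.draining = True
        self.advance_watermark(math.inf)


runner = ToyRunner()
runner.process(1, 5)
runner.process(3, 2)
runner.process(12, 4)
runner.advance_watermark(10)  # closes window [0, 10)
runner.drain()                # closes window [10, 20) as well
print(runner.emitted)         # prints [(0, 7), (10, 4)]
print(dict(runner.state))     # prints {}
```

In a real runner the watermark would propagate stage by stage through the
execution graph rather than being applied in one place as it is here.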


>
> thanks,
> -chad
>
>
> On Fri, Aug 16, 2019 at 2:47 PM Jose Delgado <jd...@bandainamcoent.com>
> wrote:
>
>> I see,  thank you  Lukasz.
>>
>>
>>
>> Regards,
>> Jose
>>
>> *From: *Lukasz Cwik <lc...@google.com>
>> *Reply-To: *"user@beam.apache.org" <us...@beam.apache.org>
>> *Date: *Monday, August 5, 2019 at 11:11 AM
>> *To: *user <us...@beam.apache.org>
>> *Subject: *Re: Save state on tear down
>>
>>
>>
>> This is not possible today.
>>
>>
>>
>> There have been discussions about pipeline drain, snapshot and update [1,
>> 2] which may provide additional details of what is planned and could use
>> your feedback.
>>
>>
>>
>> 1:
>> https://docs.google.com/document/d/1NExwHlj-2q2WUGhSO4jTu8XGhDPmm3cllSN8IMmWci8
>>
>> 2:
>> https://docs.google.com/document/d/1UWhnYPgui0gUYOsuGcCjLuoOUlGA4QaY91n8p3wz9MY
>>
>>
>>
>> On Thu, Aug 1, 2019 at 3:44 PM Jose Delgado <jd...@bandainamcoent.com>
>> wrote:
>>
>> Hello All,
>>
>>
>>
>> I am wondering if there is a way or pattern to save state on tear down?
>>
>>
>>
>> In case of a failure, or when we cannot update a pipeline (due to
>> significant changes), we would like to save the state and reload it the
>> next time the pipeline is created.
>>
>>
>>
>> Note: we are currently using Google Dataflow runner
>>
>>
>>
>> Regards,
>>
>> Jose
>>
>>

Re: Save state on tear down

Posted by Chad Dombrova <ch...@gmail.com>.
+dev

I read the document on Drain, and it sounds very promising.  I have a few
questions, starting with this statement from the doc:

   "This document proposes a new pipeline action called Drain. Drain can be
implemented by runners by manipulating the watermark of the pipeline."

What is a pipeline "action" and how would this be exposed to the user?  I
assume this is externally and manually initiated.  Is this something that
would be invoked from a PipelineResult object (i.e. akin to "cancel")?

Once a Drain is initiated on a pipeline, does this trigger a loop over all
unbounded sources to set their watermark to infinity, or only certain
ones?

thanks,
-chad


On Fri, Aug 16, 2019 at 2:47 PM Jose Delgado <jd...@bandainamcoent.com>
wrote:

> I see,  thank you  Lukasz.
>
>
>
> Regards,
> Jose
>
> *From: *Lukasz Cwik <lc...@google.com>
> *Reply-To: *"user@beam.apache.org" <us...@beam.apache.org>
> *Date: *Monday, August 5, 2019 at 11:11 AM
> *To: *user <us...@beam.apache.org>
> *Subject: *Re: Save state on tear down
>
>
>
> This is not possible today.
>
>
>
> There have been discussions about pipeline drain, snapshot and update [1,
> 2] which may provide additional details of what is planned and could use
> your feedback.
>
>
>
> 1:
> https://docs.google.com/document/d/1NExwHlj-2q2WUGhSO4jTu8XGhDPmm3cllSN8IMmWci8
>
> 2:
> https://docs.google.com/document/d/1UWhnYPgui0gUYOsuGcCjLuoOUlGA4QaY91n8p3wz9MY
>
>
>
> On Thu, Aug 1, 2019 at 3:44 PM Jose Delgado <jd...@bandainamcoent.com>
> wrote:
>
> Hello All,
>
>
>
> I am wondering if there is a way or pattern to save state on tear down?
>
>
>
> In case of a failure, or when we cannot update a pipeline (due to
> significant changes), we would like to save the state and reload it the
> next time the pipeline is created.
>
>
>
> Note: we are currently using Google Dataflow runner
>
>
>
> Regards,
>
> Jose
>
>

Re: Save state on tear down

Posted by Jose Delgado <jd...@bandainamcoent.com>.
I see, thank you Lukasz.

Regards,
Jose

From: Lukasz Cwik <lc...@google.com>
Reply-To: "user@beam.apache.org" <us...@beam.apache.org>
Date: Monday, August 5, 2019 at 11:11 AM
To: user <us...@beam.apache.org>
Subject: Re: Save state on tear down

This is not possible today.

There have been discussions about pipeline drain, snapshot and update [1, 2] which may provide additional details of what is planned and could use your feedback.

1: https://docs.google.com/document/d/1NExwHlj-2q2WUGhSO4jTu8XGhDPmm3cllSN8IMmWci8
2: https://docs.google.com/document/d/1UWhnYPgui0gUYOsuGcCjLuoOUlGA4QaY91n8p3wz9MY

On Thu, Aug 1, 2019 at 3:44 PM Jose Delgado <jd...@bandainamcoent.com> wrote:
Hello All,

I am wondering if there is a way or pattern to save state on tear down?

In case of a failure, or when we cannot update a pipeline (due to significant changes), we would like to save the state and reload it the next time the pipeline is created.

Note: we are currently using Google Dataflow runner

Regards,
Jose

Re: Save state on tear down

Posted by Lukasz Cwik <lc...@google.com>.
This is not possible today.

There have been discussions about pipeline drain, snapshot and update [1,
2] which may provide additional details of what is planned and could use
your feedback.

1:
https://docs.google.com/document/d/1NExwHlj-2q2WUGhSO4jTu8XGhDPmm3cllSN8IMmWci8
2:
https://docs.google.com/document/d/1UWhnYPgui0gUYOsuGcCjLuoOUlGA4QaY91n8p3wz9MY

On Thu, Aug 1, 2019 at 3:44 PM Jose Delgado <jd...@bandainamcoent.com>
wrote:

> Hello All,
>
>
>
> I am wondering if there is a way or pattern to save state on tear down?
>
>
>
> In case of a failure, or when we cannot update a pipeline (due to
> significant changes), we would like to save the state and reload it the
> next time the pipeline is created.
>
>
>
> Note: we are currently using Google Dataflow runner
>
>
>
> Regards,
>
> Jose
>