Posted to dev@apex.apache.org by Sandesh Hegde <sa...@datatorrent.com> on 2016/08/04 07:10:05 UTC

[Proposal] Named Checkpoints

Hello Team,

This thread is to discuss the Named Checkpoint feature for Apex
(https://issues.apache.org/jira/browse/APEXCORE-498).

Named checkpoints allow the following workflow:

1. Users can trigger a checkpoint and give it a name.
2. Users can relaunch the application from the named checkpoint.
3. Named checkpoints survive the "purge of old checkpoints".

The current idea is to add a new control tuple, NamedCheckPointTuple, which
carries the user-specified name. It traverses the DAG, and the necessary
actions are taken along the way.
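
To make the idea concrete, here is a minimal sketch in Python (not Apex code) of how a named control tuple might flow through a DAG and protect its checkpoints from the purge. The `Operator` class, `send_through_dag`, the fields of `NamedCheckPointTuple` and the in-memory checkpoint maps are all assumptions made for illustration; the toy model ignores storage agents and buffer servers entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class NamedCheckPointTuple:
    """Hypothetical control tuple: a window id plus the user-supplied name."""
    window_id: int
    name: str


@dataclass
class Operator:
    """Toy operator that keeps its checkpoints in memory, keyed by window id."""
    op_id: int
    state: int = 0
    checkpoints: Dict[int, int] = field(default_factory=dict)
    named: Dict[str, int] = field(default_factory=dict)  # name -> window id

    def checkpoint(self, window_id: int) -> None:
        self.checkpoints[window_id] = self.state

    def on_control_tuple(self, t: NamedCheckPointTuple) -> None:
        # Take a checkpoint for this window and remember it under the name.
        self.checkpoint(t.window_id)
        self.named[t.name] = t.window_id

    def purge(self, committed_window_id: int) -> None:
        # Drop checkpoints older than the committed window, except named ones.
        keep = set(self.named.values())
        self.checkpoints = {
            w: s for w, s in self.checkpoints.items()
            if w >= committed_window_id or w in keep
        }


def send_through_dag(operators: List[Operator], t: NamedCheckPointTuple) -> None:
    # Operators are assumed to be listed in upstream-first order.
    for op in operators:
        op.on_control_tuple(t)


dag = [Operator(op_id=1), Operator(op_id=2)]
for op in dag:
    op.state = 10
    op.checkpoint(1)                      # ordinary checkpoint, window 1
send_through_dag(dag, NamedCheckPointTuple(window_id=2, name="before-upgrade"))
for op in dag:
    op.state = 30
    op.checkpoint(3)                      # ordinary checkpoint, window 3
    op.purge(committed_window_id=3)       # window 1 goes, named window 2 stays
```

In this model the only difference between a regular and a named checkpoint is the entry in `named`, which the purge consults before deleting anything.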

Please let me know your thoughts on this.

Thanks

Re: [Proposal] Named Checkpoints

Posted by Sandesh Hegde <sa...@datatorrent.com>.
Thanks for the review, Tushar. Here are my answers:

1. Just storing the mapping from operator id to checkpoint is not enough,
   because "state" = libs + checkpoints (libs can change in the future, see
   https://issues.apache.org/jira/browse/APEXCORE-232).

2. Valid point about users using their own Storage Agents; there is no easy
   solution for that:
   a. The StorageAgent interface doesn't have a method to copy checkpoints.
   b. ApexCLI doesn't load the StorageAgent.
   c. A savepoint can be taken for a killed app, so we can't rely on a
      running app to make the copy.

In v1 we can have:
1. Savepoint from HDFS.
2. Launching from a savepoint, where the checkpoints could be on different
   Storage Agents (users need to make the copy).

What do you think?

Thanks





Re: [Proposal] Named Checkpoints

Posted by Tushar Gosavi <tu...@datatorrent.com>.
The prototype implementation assumes that checkpoints are always stored
in HDFS, but users could implement their own storage agent, in which case
this implementation may not work. A more useful approach would be to have
a metadata file for each savepoint that stores the operator id and
checkpoint id, and to prevent the master from purging those checkpoints on
commit. During restart, the storage agent can fetch the required
checkpoints from its store; which checkpoints to load will be available in
the savepoint metadata file.
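
The metadata-file approach could look roughly like the sketch below. It is written in Python rather than against the Apex code base, and everything in it is an assumption for illustration: the JSON layout, the function names, and the idea that a checkpoint is identified by an (operator id, checkpoint id) pair.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable, Set, Tuple


def write_savepoint_metadata(path: Path, checkpoints: Dict[int, int]) -> None:
    """Record operator id -> checkpoint id for one savepoint."""
    # JSON object keys must be strings, so operator ids are stored as text.
    path.write_text(json.dumps({str(op): cp for op, cp in checkpoints.items()}))


def read_savepoint_metadata(path: Path) -> Dict[int, int]:
    """Load the operator id -> checkpoint id mapping back from disk."""
    return {int(op): cp for op, cp in json.loads(path.read_text()).items()}


def pinned_checkpoints(metadata_files: Iterable[Path]) -> Set[Tuple[int, int]]:
    """All (operator id, checkpoint id) pairs that some savepoint refers to."""
    pinned: Set[Tuple[int, int]] = set()
    for f in metadata_files:
        pinned.update(read_savepoint_metadata(f).items())
    return pinned


def purge_candidates(stored: Set[Tuple[int, int]], committed: int,
                     pinned: Set[Tuple[int, int]]) -> Set[Tuple[int, int]]:
    """Checkpoints older than the committed one that no savepoint pins."""
    return {(op, cp) for op, cp in stored
            if cp < committed and (op, cp) not in pinned}


with tempfile.TemporaryDirectory() as d:
    meta = Path(d) / "nightly.json"
    write_savepoint_metadata(meta, {1: 20, 2: 20})
    restored = read_savepoint_metadata(meta)   # what a restart would load
    pinned = pinned_checkpoints([meta])

stored = {(1, 10), (1, 20), (1, 30), (2, 10), (2, 20), (2, 30)}
to_delete = purge_candidates(stored, committed=30, pinned=pinned)
```

The point of the design is that nothing is copied: the savepoint is only a list of references, and the storage agent that wrote the checkpoints is also the one that reads them back.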

- Tushar.




Re: [Proposal] Named Checkpoints

Posted by Sandesh Hegde <sa...@datatorrent.com>.
The idea here was to create, on demand, a recovery/committed window. But
there is always one recovery window for the DAG (except before the first
one). Instead of using or modifying the Checkpoint tuple, I am planning to
reuse the existing recovery window state, which simplifies the implementation.

Proposed API:

ApexCli> savepoint <appId> <folderToSaveTheState>
ApexCli> launch -savepoint <folderWithTheState>

first prototype:
https://github.com/sandeshh/apex-core/commit/8ec7e837318c2b33289251cda78ece0024a3f895
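
One way to read the two commands, as a rough Python sketch and not a description of what the linked prototype does: `savepoint` copies the application's saved state into the given folder, and `launch -savepoint` seeds a new application from that folder. The directory layout (`checkpoints/<operatorId>/<windowId>`) and both function names are assumptions made up for this example.

```python
import shutil
import tempfile
from pathlib import Path


def savepoint(app_dir: Path, dest: Path) -> None:
    """Copy the application's saved state (hypothetical layout) into dest."""
    shutil.copytree(app_dir / "checkpoints", dest / "checkpoints")


def launch_from_savepoint(src: Path, new_app_dir: Path) -> None:
    """Seed a new application directory with the state held in src."""
    shutil.copytree(src / "checkpoints", new_app_dir / "checkpoints")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    old_app = root / "app_0001"
    # One operator (id 1) with one checkpoint (window 42).
    (old_app / "checkpoints" / "1").mkdir(parents=True)
    (old_app / "checkpoints" / "1" / "42").write_bytes(b"operator state")

    savepoint(old_app, root / "saved")
    shutil.rmtree(old_app)  # the original application's files go away
    launch_from_savepoint(root / "saved", root / "app_0002")
    recovered = (root / "app_0002" / "checkpoints" / "1" / "42").read_bytes()
```

Because the saved folder is a full copy, it stays usable after the original application's files are gone.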

Thanks


Re: [Proposal] Named Checkpoints

Posted by Amol Kekre <am...@datatorrent.com>.
Hmm! Actually, it may be a good debugging tool too. Keep the named
checkpoints around. The feature is to keep checkpoints around, which can be
done by adding an option to not delete checkpoints, but naming them
makes it more operational. Send a command from the CLI -> get the checkpoint
-> know it is the one you need because the file name contains the string you
sent with the command -> debug. This is different from querying state, as it
gives you the entire app checkpoint to debug with.

Thks
Amol


On Thu, Aug 4, 2016 at 11:41 AM, Venkatesh Kottapalli <
venkatesh@datatorrent.com> wrote:

> + 1 for the idea.
>
> It might be helpful to developers as well when dealing with variety of
> data in large volumes if this can help them run from the checkpointed state
> rather than rerunning the application altogether in case of issues.
>
> I have seen cases where the application runs for more than 10 hours and
> some partitions fail because of the variety of data that it is dealing
> with. In such cases, the application has to be restarted and it will be
> helpful to developers with a feature of this kind.
>
>  The ease of enabling/disabling this feature to run the app will also be
> important.
>
> -Venkatesh.
>
>
> > On Aug 4, 2016, at 10:29 AM, Amol Kekre <am...@datatorrent.com> wrote:
> >
> > We had a user who wanted roll-back and restart for audit purposes. At that
> > time we did not have timed-window. Named checkpoints would have helped a
> > little bit.
> >
> > Problem statement: Auditors ask for a rerun of yesterday's computations for
> > verification. Assume that these computations depend on previous state (i.e.
> > data from the day before yesterday).
> >
> > Solution
> > 1. Have named checkpoints at 12 midnight every day (an input adapter
> > triggers it)
> > 2. The app spools raw logs into HDFS along with window ids and event times
> > 3. The re-run is a separate app that starts off on a named checkpoint (12
> > midnight yesterday)
> >
> > Technically the solution will not be as simple, and the "new audit app" will
> > need a lot of other checks (dedups, dropping events not in yesterday's
> > window, waiting for late arrivals, ...), but named checkpoints help.
> >
> > I do agree with Pramod that replay within the same running app is not
> > viable within a data-in-motion architecture. But it helps somewhat in a new
> > audit app. Named checkpoints help data-in-motion architectures handle batch
> > apps better. In the above case, the spooling in #2, done with event time
> > stamp + state, suffices. The state part comes from the named checkpoint.
> >
> > Thks,
> > Amol
> >
> >
> >
> >
> > On Thu, Aug 4, 2016 at 10:12 AM, Sanjay Pujare <sa...@datatorrent.com>
> > wrote:
> >
> >> I agree. A specific use-case will be useful to support this feature. Also
> >> the ability to replay from the named checkpoint will be limited because of
> >> various factors, isn’t it?
> >>
> >> On 8/4/16, 9:00 AM, "Pramod Immaneni" <pr...@datatorrent.com> wrote:
> >>
> >>    There is a problem here: keeping old checkpoints and recovering from
> >>    them means preserving the old input data along with the state. This is
> >>    more than the mechanism of actually creating named checkpoints; it means
> >>    having the ability for operators to move forward (a.k.a. committing and
> >>    dropping committed states and buffer data) while still having the
> >>    ability to replay from that point from the input source, and providing
> >>    a way for operators (at first look, input operators) to distinguish
> >>    that. Why would someone need this with idempotent processing? Is there
> >>    a specific use case you are looking at? Suppose we do go ahead with
> >>    this; for the mechanism, I would be in favor of reusing the existing
> >>    tuple.
> >>
> >>    On Thu, Aug 4, 2016 at 8:44 AM, Vlad Rozov <v....@datatorrent.com>
> >> wrote:
> >>
> >>> +1 for the feature. At first look I am more in favor of reusing the
> >>> existing control tuple.
> >>>
> >>> Thank you,
> >>>
> >>> Vlad
> >>>
> >>>
> >>> On 8/4/16 08:17, Sandesh Hegde wrote:
> >>>
> >>>> @Chinmay
> >>>> We can enhance the existing checkpoint tuple, but that one is used more
> >>>> frequently than this feature, so why burden the Checkpoint tuple with
> >>>> an extra field?
> >>>>
> >>>> @Aniruddha
> >>>> It is better to leave the scheduling to the users; they can use any tool
> >>>> that they are already familiar with.
> >>>>
> >>>> On Thu, Aug 4, 2016 at 7:40 AM Aniruddha Thombare <
> >>>> aniruddha@datatorrent.com>
> >>>> wrote:
> >>>>
> >>>> +1 On the idea, it would be awesome to have.
> >>>>>
> >>>>> Question: Can we further develop this brilliant idea into:-
> >>>>> Scheduled checkpoints ( To save as  dynamically named checkpoint)?
> >>>>> This would be on the lines of logrotate / general backup strategies.
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> A
> >>>>>
> >>>>> _____________________________________
> >>>>> Sent with difficulty, I mean handheld ;)
> >>>>> On 4 Aug 2016 8:03 pm, "Munagala Ramanath" <ra...@datatorrent.com>
> >> wrote:
> >>>>>
> >>>>> +1
> >>>>>>
> >>>>>> Ram
> >>>>>>
> >>>>>> On Thu, Aug 4, 2016 at 12:10 AM, Sandesh Hegde <
> >> sandesh@datatorrent.com
> >>>>>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>> Hello Team,
> >>>>>>>
> >>>>>>> This thread is to discuss the Named Checkpoint feature for Apex.
> >>>>>>> (https://issues.apache.org/jira/browse/APEXCORE-498)
> >>>>>>>
> >>>>>>> Named checkpoints allow following workflow,
> >>>>>>>
> >>>>>>> 1. Users can trigger a checkpoint and give it a name
> >>>>>>> 2. Relaunch the application from the named checkpoint.
> >>>>>>> 3. These checkpoints survive the "purge of old checkpoints".
> >>>>>>>
> >>>>>>> Current idea is to add a new control tuple, NamedCheckPointTuple, which
> >>>>>>> contains the user specified name, it traverses the DAG and along the way
> >>>>>>> necessary actions are taken.
> >>>>>>>
> >>>>>>> Please let me know your thoughts on this.
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>>
> >>>
> >>
> >>
> >>
> >>
>
>

Re: [Proposal] Named Checkpoints

Posted by Venkatesh Kottapalli <ve...@datatorrent.com>.
+1 for the idea.

It might be helpful to developers as well when dealing with a variety of data in large volumes, if it lets them run from the checkpointed state rather than rerunning the application altogether in case of issues.

I have seen cases where the application runs for more than 10 hours and some partitions fail because of the variety of data it is dealing with. In such cases the application has to be restarted, and a feature of this kind would be helpful to developers.

The ease of enabling/disabling this feature when running the app will also be important.

-Venkatesh.


> On Aug 4, 2016, at 10:29 AM, Amol Kekre <am...@datatorrent.com> wrote:
> 
> We had an user who wanted roll-back and restart from audit purposes. That
> time we did not have timed-window. Names checkpoint would have helped a
> little bit..
> 
> Problem statement: Auditors ask for rerun of yesterday's computations for
> verification. Assume that these computations depend on previous state (i.e
> data from day before yesterday).
> 
> Solution
> 1. Have named checkpoints at 12 in the night (an input adapter triggers it)
> every day
> 2. The app spools raw logs into hdfs along with window ids and event times
> 3. The re-run is a separate app that starts off on a named checkpoint (12
> night yesterday)
> 
> Technically the solution will not as simple and "new audit app" will need a
> lot of other checks (dedups, drop events not in yesterday's window, wait
> for late arrivals, ...), but names checkpoint helps.
> 
> I do agree with Pramod's that replay within the same running app is not
> viable within a data-in-motion architecture. But it helps somewhat in a new
> audit app. Named checkpoints help data-in-motion architectures handle batch
> apps better. In the above case #2 spooling done with event time stamp+state
> suffices. The state part comes from names checkpoint.
> 
> Thks,
> Amol
> 
> 
> 
> 
> On Thu, Aug 4, 2016 at 10:12 AM, Sanjay Pujare <sa...@datatorrent.com>
> wrote:
> 
>> I agree. A specific use-case will be useful to support this feature. Also
>> the ability to replay from the named checkpoint will be limited because of
>> various factors, isn’t it?
>> 
>> On 8/4/16, 9:00 AM, "Pramod Immaneni" <pr...@datatorrent.com> wrote:
>> 
>>    There is a problem here, keeping old checkpoints and recovering from
>> them
>>    means preserving the old input data along with the state. This is more
>> than
>>    the mechanism of actually creating named checkpoints, it means having
>> the
>>    ability for operators to move forward (a.k.a committed and dropping
>>    committed states and buffer data) while still having the ability to
>> replay
>>    from that point from the input source and providing a way for
>> operators (at
>>    first look input operators) to distinguish that. Why would someone need
>>    this with idempotent processing? Is there a specific use case you are
>>    looking at? Suppose we go do this, for the mechanism, I would be in
>> favor
>>    of reusing existing tuple.
>> 
>>    On Thu, Aug 4, 2016 at 8:44 AM, Vlad Rozov <v....@datatorrent.com>
>> wrote:
>> 
>>> +1 for the feature. At first look I am more in favor of reusing
>> existing
>>> control tuple.
>>> 
>>> Thank you,
>>> 
>>> Vlad
>>> 
>>> 
>>> On 8/4/16 08:17, Sandesh Hegde wrote:
>>> 
>>>> @Chinmay
>>>> We can enhance the existing checkpoint tuple but that one is more
>>>> frequently used than this feature, so why burden Checkpoint tuple
>> with
>>>> an extra field?
>>>> 
>>>> @Aniruddha
>>>> It is better to leave the scheduling to the users, they can use any
>> tool
>>>> that they are already familiar with.
>>>> 
>>>> On Thu, Aug 4, 2016 at 7:40 AM Aniruddha Thombare <
>>>> aniruddha@datatorrent.com>
>>>> wrote:
>>>> 
>>>> +1 On the idea, it would be awesome to have.
>>>>> 
>>>>> Question: Can we further develop this brilliant idea into:-
>>>>> Scheduled checkpoints ( To save as  dynamically named checkpoint)?
>>>>> This would be on the lines of logrotate / general backup
>> strategies.
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> A
>>>>> 
>>>>> _____________________________________
>>>>> Sent with difficulty, I mean handheld ;)
>>>>> On 4 Aug 2016 8:03 pm, "Munagala Ramanath" <ra...@datatorrent.com>
>> wrote:
>>>>> 
>>>>> +1
>>>>>> 
>>>>>> Ram
>>>>>> 
>>>>>> On Thu, Aug 4, 2016 at 12:10 AM, Sandesh Hegde <
>> sandesh@datatorrent.com
>>>>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>> Hello Team,
>>>>>>> 
>>>>>>> This thread is to discuss the Named Checkpoint feature for Apex.
>> (
>>>>>>> https://issues.apache.org/jira/browse/APEXCORE-498)
>>>>>>> 
>>>>>>> Named checkpoints allow following workflow,
>>>>>>> 
>>>>>>> 1. Users can trigger a checkpoint and give it a name
>>>>>>> 2. Relaunch the application from the named checkpoint.
>>>>>>> 3. These checkpoints survive the "purge of old checkpoints".
>>>>>>> 
>>>>>>> Current idea is to add a new control tuple,
>> NamedCheckPointTuple, which
>>>>>>> contains the user specified name, it traverses the DAG and along
>> the
>>>>>>> 
>>>>>> way
>>>>> 
>>>>>> necessary actions are taken.
>>>>>>> 
>>>>>>> Please let me know your thoughts on this.
>>>>>>> 
>>>>>>> Thanks
>>>>>>> 
>>>>>>> 
>>> 
>> 
>> 
>> 
>> 


Re: [Proposal] Named Checkpoints

Posted by Amol Kekre <am...@datatorrent.com>.
We had a user who wanted roll-back and restart for audit purposes. At that
time we did not have timed-window. Named checkpoints would have helped a
little bit.

Problem statement: Auditors ask for a rerun of yesterday's computations for
verification. Assume that these computations depend on previous state (i.e.,
data from the day before yesterday).

Solution
1. Take a named checkpoint at 12 in the night every day (an input adapter
triggers it)
2. The app spools raw logs into HDFS along with window ids and event times
3. The re-run is a separate app that starts off on a named checkpoint (12
night yesterday)

Technically the solution will not be as simple, and the "new audit app" will
need a lot of other checks (dedups, dropping events not in yesterday's
window, waiting for late arrivals, ...), but named checkpoints help.

I do agree with Pramod that replay within the same running app is not
viable within a data-in-motion architecture. But it helps somewhat in a new
audit app. Named checkpoints help data-in-motion architectures handle batch
apps better. In the above case, the spooling in step 2, done with event
timestamp + state, suffices. The state part comes from the named checkpoint.
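
To make the extra checks in the audit app concrete, here is a rough sketch
of the filtering side only. Everything in it is an assumption made for
illustration: SpooledEvent, its eventId field used for de-duplication, and
the use of UTC day boundaries are hypothetical and are not Apex types or
conventions. It does not cover waiting for late arrivals.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class AuditReplayFilter {

    /** Hypothetical record spooled in step 2; not an Apex type. */
    static final class SpooledEvent {
        final String eventId;    // assumed unique id, used only for de-duplication
        final long windowId;     // window id recorded when the event was spooled
        final Instant eventTime; // event time carried by the raw log line

        SpooledEvent(String eventId, long windowId, Instant eventTime) {
            this.eventId = eventId;
            this.windowId = windowId;
            this.eventTime = eventTime;
        }
    }

    /**
     * Keeps events whose event time falls on auditDay (UTC, assumed) and
     * drops later duplicates of an eventId that was already kept.
     */
    static List<SpooledEvent> selectForAudit(List<SpooledEvent> spooled, LocalDate auditDay) {
        Instant start = auditDay.atStartOfDay(ZoneOffset.UTC).toInstant();
        Instant end = auditDay.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant();
        Set<String> seen = new HashSet<>();
        List<SpooledEvent> kept = new ArrayList<>();
        for (SpooledEvent e : spooled) {
            boolean inDay = !e.eventTime.isBefore(start) && e.eventTime.isBefore(end);
            if (inDay && seen.add(e.eventId)) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SpooledEvent> spooled = Arrays.asList(
            new SpooledEvent("a", 1L, Instant.parse("2016-08-03T01:00:00Z")),
            new SpooledEvent("a", 2L, Instant.parse("2016-08-03T02:00:00Z")),
            new SpooledEvent("b", 3L, Instant.parse("2016-08-04T00:30:00Z")));
        List<SpooledEvent> kept = selectForAudit(spooled, LocalDate.of(2016, 8, 3));
        System.out.println("events kept for audit: " + kept.size());
    }
}
```

The audit app would start from the state in the named checkpoint and feed
only the selected events through the same computation.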

Thks,
Amol




On Thu, Aug 4, 2016 at 10:12 AM, Sanjay Pujare <sa...@datatorrent.com>
wrote:

> I agree. A specific use-case will be useful to support this feature. Also
> the ability to replay from the named checkpoint will be limited because of
> various factors, isn’t it?
>
> On 8/4/16, 9:00 AM, "Pramod Immaneni" <pr...@datatorrent.com> wrote:
>
>     There is a problem here, keeping old checkpoints and recovering from
> them
>     means preserving the old input data along with the state. This is more
> than
>     the mechanism of actually creating named checkpoints, it means having
> the
>     ability for operators to move forward (a.k.a committed and dropping
>     committed states and buffer data) while still having the ability to
> replay
>     from that point from the input source and providing a way for
> operators (at
>     first look input operators) to distinguish that. Why would someone need
>     this with idempotent processing? Is there a specific use case you are
>     looking at? Suppose we go do this, for the mechanism, I would be in
> favor
>     of reusing existing tuple.
>
>     On Thu, Aug 4, 2016 at 8:44 AM, Vlad Rozov <v....@datatorrent.com>
> wrote:
>
>     > +1 for the feature. At first look I am more in favor of reusing
> existing
>     > control tuple.
>     >
>     > Thank you,
>     >
>     > Vlad
>     >
>     >
>     > On 8/4/16 08:17, Sandesh Hegde wrote:
>     >
>     >> @Chinmay
>     >> We can enhance the existing checkpoint tuple but that one is more
>     >> frequently used than this feature, so why burden Checkpoint tuple
> with
>     >> an extra field?
>     >>
>     >> @Aniruddha
>     >> It is better to leave the scheduling to the users, they can use any
> tool
>     >> that they are already familiar with.
>     >>
>     >> On Thu, Aug 4, 2016 at 7:40 AM Aniruddha Thombare <
>     >> aniruddha@datatorrent.com>
>     >> wrote:
>     >>
>     >> +1 On the idea, it would be awesome to have.
>     >>>
>     >>> Question: Can we further develop this brilliant idea into:-
>     >>> Scheduled checkpoints ( To save as  dynamically named checkpoint)?
>     >>> This would be on the lines of logrotate / general backup
> strategies.
>     >>>
>     >>>
>     >>> Thanks,
>     >>>
>     >>> A
>     >>>
>     >>> _____________________________________
>     >>> Sent with difficulty, I mean handheld ;)
>     >>> On 4 Aug 2016 8:03 pm, "Munagala Ramanath" <ra...@datatorrent.com>
> wrote:
>     >>>
>     >>> +1
>     >>>>
>     >>>> Ram
>     >>>>
>     >>>> On Thu, Aug 4, 2016 at 12:10 AM, Sandesh Hegde <
> sandesh@datatorrent.com
>     >>>> >
>     >>>> wrote:
>     >>>>
>     >>>> Hello Team,
>     >>>>>
>     >>>>> This thread is to discuss the Named Checkpoint feature for Apex.
> (
>     >>>>> https://issues.apache.org/jira/browse/APEXCORE-498)
>     >>>>>
>     >>>>> Named checkpoints allow following workflow,
>     >>>>>
>     >>>>> 1. Users can trigger a checkpoint and give it a name
>     >>>>> 2. Relaunch the application from the named checkpoint.
>     >>>>> 3. These checkpoints survive the "purge of old checkpoints".
>     >>>>>
>     >>>>> Current idea is to add a new control tuple,
> NamedCheckPointTuple, which
>     >>>>> contains the user specified name, it traverses the DAG and along
> the
>     >>>>>
>     >>>> way
>     >>>
>     >>>> necessary actions are taken.
>     >>>>>
>     >>>>> Please let me know your thoughts on this.
>     >>>>>
>     >>>>> Thanks
>     >>>>>
>     >>>>>
>     >
>
>
>
>

Re: [Proposal] Named Checkpoints

Posted by Sanjay Pujare <sa...@datatorrent.com>.
I agree. A specific use case would help justify this feature. Also, the ability to replay from the named checkpoint will be limited by various factors, won't it?

On 8/4/16, 9:00 AM, "Pramod Immaneni" <pr...@datatorrent.com> wrote:

    There is a problem here, keeping old checkpoints and recovering from them
    means preserving the old input data along with the state. This is more than
    the mechanism of actually creating named checkpoints, it means having the
    ability for operators to move forward (a.k.a committed and dropping
    committed states and buffer data) while still having the ability to replay
    from that point from the input source and providing a way for operators (at
    first look input operators) to distinguish that. Why would someone need
    this with idempotent processing? Is there a specific use case you are
    looking at? Suppose we go do this, for the mechanism, I would be in favor
    of reusing existing tuple.
    
    On Thu, Aug 4, 2016 at 8:44 AM, Vlad Rozov <v....@datatorrent.com> wrote:
    
    > +1 for the feature. At first look I am more in favor of reusing existing
    > control tuple.
    >
    > Thank you,
    >
    > Vlad
    >
    >
    > On 8/4/16 08:17, Sandesh Hegde wrote:
    >
    >> @Chinmay
    >> We can enhance the existing checkpoint tuple but that one is more
    >> frequently used than this feature, so why burden Checkpoint tuple with
    >> an extra field?
    >>
    >> @Aniruddha
    >> It is better to leave the scheduling to the users, they can use any tool
    >> that they are already familiar with.
    >>
    >> On Thu, Aug 4, 2016 at 7:40 AM Aniruddha Thombare <
    >> aniruddha@datatorrent.com>
    >> wrote:
    >>
    >> +1 On the idea, it would be awesome to have.
    >>>
    >>> Question: Can we further develop this brilliant idea into:-
    >>> Scheduled checkpoints ( To save as  dynamically named checkpoint)?
    >>> This would be on the lines of logrotate / general backup strategies.
    >>>
    >>>
    >>> Thanks,
    >>>
    >>> A
    >>>
    >>> _____________________________________
    >>> Sent with difficulty, I mean handheld ;)
    >>> On 4 Aug 2016 8:03 pm, "Munagala Ramanath" <ra...@datatorrent.com> wrote:
    >>>
    >>> +1
    >>>>
    >>>> Ram
    >>>>
    >>>> On Thu, Aug 4, 2016 at 12:10 AM, Sandesh Hegde <sandesh@datatorrent.com
    >>>> >
    >>>> wrote:
    >>>>
    >>>> Hello Team,
    >>>>>
    >>>>> This thread is to discuss the Named Checkpoint feature for Apex. (
    >>>>> https://issues.apache.org/jira/browse/APEXCORE-498)
    >>>>>
    >>>>> Named checkpoints allow following workflow,
    >>>>>
    >>>>> 1. Users can trigger a checkpoint and give it a name
    >>>>> 2. Relaunch the application from the named checkpoint.
    >>>>> 3. These checkpoints survive the "purge of old checkpoints".
    >>>>>
    >>>>> Current idea is to add a new control tuple, NamedCheckPointTuple, which
    >>>>> contains the user specified name, it traverses the DAG and along the
    >>>>>
    >>>> way
    >>>
    >>>> necessary actions are taken.
    >>>>>
    >>>>> Please let me know your thoughts on this.
    >>>>>
    >>>>> Thanks
    >>>>>
    >>>>>
    >
    



Re: [Proposal] Named Checkpoints

Posted by Pramod Immaneni <pr...@datatorrent.com>.
There is a problem here: keeping old checkpoints and recovering from them
means preserving the old input data along with the state. This is more than
the mechanism of actually creating named checkpoints. It means letting
operators move forward (i.e., committing and dropping committed state and
buffer data) while still being able to replay from that point from the
input source, and providing a way for operators (at first look, input
operators) to distinguish that case. Why would someone need this with
idempotent processing? Is there a specific use case you are looking at? If
we do go ahead with this, I would favor reusing the existing tuple for the
mechanism.
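
As a rough illustration of the state half of this concern, the sketch
below shows a purge that moves forward past the committed window while
skipping checkpoints pinned by a name. The class and method names are
hypothetical and are not part of Apex. It deliberately says nothing about
retaining or replaying the old input data, which is the harder part
raised above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not an Apex class.
final class CheckpointPurge {

    /**
     * Returns, in ascending order, the stored checkpoint window ids that may
     * be deleted: those older than the committed window and not named.
     */
    static List<Long> purgeable(Set<Long> storedWindowIds,
                                long committedWindowId,
                                Set<Long> namedWindowIds) {
        List<Long> result = new ArrayList<>();
        for (long id : new TreeSet<>(storedWindowIds)) {
            if (id < committedWindowId && !namedWindowIds.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Long> stored = new HashSet<>(Arrays.asList(10L, 20L, 30L, 40L));
        Set<Long> named = new HashSet<>(Arrays.asList(20L));
        System.out.println("purgeable: " + purgeable(stored, 40L, named));
    }
}
```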

On Thu, Aug 4, 2016 at 8:44 AM, Vlad Rozov <v....@datatorrent.com> wrote:

> +1 for the feature. At first look I am more in favor of reusing existing
> control tuple.
>
> Thank you,
>
> Vlad
>
>
> On 8/4/16 08:17, Sandesh Hegde wrote:
>
>> @Chinmay
>> We can enhance the existing checkpoint tuple but that one is more
>> frequently used than this feature, so why burden Checkpoint tuple with
>> an extra field?
>>
>> @Aniruddha
>> It is better to leave the scheduling to the users, they can use any tool
>> that they are already familiar with.
>>
>> On Thu, Aug 4, 2016 at 7:40 AM Aniruddha Thombare <
>> aniruddha@datatorrent.com>
>> wrote:
>>
>> +1 On the idea, it would be awesome to have.
>>>
>>> Question: Can we further develop this brilliant idea into:-
>>> Scheduled checkpoints ( To save as  dynamically named checkpoint)?
>>> This would be on the lines of logrotate / general backup strategies.
>>>
>>>
>>> Thanks,
>>>
>>> A
>>>
>>> _____________________________________
>>> Sent with difficulty, I mean handheld ;)
>>> On 4 Aug 2016 8:03 pm, "Munagala Ramanath" <ra...@datatorrent.com> wrote:
>>>
>>> +1
>>>>
>>>> Ram
>>>>
>>>> On Thu, Aug 4, 2016 at 12:10 AM, Sandesh Hegde <sandesh@datatorrent.com
>>>> >
>>>> wrote:
>>>>
>>>> Hello Team,
>>>>>
>>>>> This thread is to discuss the Named Checkpoint feature for Apex. (
>>>>> https://issues.apache.org/jira/browse/APEXCORE-498)
>>>>>
>>>>> Named checkpoints allow following workflow,
>>>>>
>>>>> 1. Users can trigger a checkpoint and give it a name
>>>>> 2. Relaunch the application from the named checkpoint.
>>>>> 3. These checkpoints survive the "purge of old checkpoints".
>>>>>
>>>>> Current idea is to add a new control tuple, NamedCheckPointTuple, which
>>>>> contains the user specified name, it traverses the DAG and along the
>>>>>
>>>> way
>>>
>>>> necessary actions are taken.
>>>>>
>>>>> Please let me know your thoughts on this.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>

Re: [Proposal] Named Checkpoints

Posted by Vlad Rozov <v....@datatorrent.com>.
+1 for the feature. At first look I am more in favor of reusing the existing
control tuple.

Thank you,

Vlad

On 8/4/16 08:17, Sandesh Hegde wrote:
> @Chinmay
> We can enhance the existing checkpoint tuple but that one is more
> frequently used than this feature, so why burden Checkpoint tuple with
> an extra field?
>
> @Aniruddha
> It is better to leave the scheduling to the users, they can use any tool
> that they are already familiar with.
>
> On Thu, Aug 4, 2016 at 7:40 AM Aniruddha Thombare <an...@datatorrent.com>
> wrote:
>
>> +1 On the idea, it would be awesome to have.
>>
>> Question: Can we further develop this brilliant idea into:-
>> Scheduled checkpoints ( To save as  dynamically named checkpoint)?
>> This would be on the lines of logrotate / general backup strategies.
>>
>>
>> Thanks,
>>
>> A
>>
>> _____________________________________
>> Sent with difficulty, I mean handheld ;)
>> On 4 Aug 2016 8:03 pm, "Munagala Ramanath" <ra...@datatorrent.com> wrote:
>>
>>> +1
>>>
>>> Ram
>>>
>>> On Thu, Aug 4, 2016 at 12:10 AM, Sandesh Hegde <sa...@datatorrent.com>
>>> wrote:
>>>
>>>> Hello Team,
>>>>
>>>> This thread is to discuss the Named Checkpoint feature for Apex. (
>>>> https://issues.apache.org/jira/browse/APEXCORE-498)
>>>>
>>>> Named checkpoints allow following workflow,
>>>>
>>>> 1. Users can trigger a checkpoint and give it a name
>>>> 2. Relaunch the application from the named checkpoint.
>>>> 3. These checkpoints survive the "purge of old checkpoints".
>>>>
>>>> Current idea is to add a new control tuple, NamedCheckPointTuple, which
>>>> contains the user specified name, it traverses the DAG and along the
>> way
>>>> necessary actions are taken.
>>>>
>>>> Please let me know your thoughts on this.
>>>>
>>>> Thanks
>>>>


Re: [Proposal] Named Checkpoints

Posted by Aniruddha Thombare <an...@datatorrent.com>.
Sandesh, Agreed.
There should be external APIs to access the feature, if we want to integrate
it with third-party tools.

Thanks,

A

_____________________________________
Sent with difficulty, I mean handheld ;)
On 4 Aug 2016 8:47 pm, "Sandesh Hegde" <sa...@datatorrent.com> wrote:

> @Chinmay
> We can enhance the existing checkpoint tuple but that one is more
> frequently used than this feature, so why burden Checkpoint tuple with
> an extra field?
>
> @Aniruddha
> It is better to leave the scheduling to the users, they can use any tool
> that they are already familiar with.
>
> On Thu, Aug 4, 2016 at 7:40 AM Aniruddha Thombare <
> aniruddha@datatorrent.com>
> wrote:
>
> > +1 On the idea, it would be awesome to have.
> >
> > Question: Can we further develop this brilliant idea into:-
> > Scheduled checkpoints ( To save as  dynamically named checkpoint)?
> > This would be on the lines of logrotate / general backup strategies.
> >
> >
> > Thanks,
> >
> > A
> >
> > _____________________________________
> > Sent with difficulty, I mean handheld ;)
> > On 4 Aug 2016 8:03 pm, "Munagala Ramanath" <ra...@datatorrent.com> wrote:
> >
> > > +1
> > >
> > > Ram
> > >
> > > On Thu, Aug 4, 2016 at 12:10 AM, Sandesh Hegde <
> sandesh@datatorrent.com>
> > > wrote:
> > >
> > > > Hello Team,
> > > >
> > > > This thread is to discuss the Named Checkpoint feature for Apex. (
> > > > https://issues.apache.org/jira/browse/APEXCORE-498)
> > > >
> > > > Named checkpoints allow following workflow,
> > > >
> > > > 1. Users can trigger a checkpoint and give it a name
> > > > 2. Relaunch the application from the named checkpoint.
> > > > 3. These checkpoints survive the "purge of old checkpoints".
> > > >
> > > > Current idea is to add a new control tuple, NamedCheckPointTuple,
> which
> > > > contains the user specified name, it traverses the DAG and along the
> > way
> > > > necessary actions are taken.
> > > >
> > > > Please let me know your thoughts on this.
> > > >
> > > > Thanks
> > > >
> > >
> >
>

Re: [Proposal] Named Checkpoints

Posted by Sandesh Hegde <sa...@datatorrent.com>.
@Chinmay
We can enhance the existing checkpoint tuple, but that one is used far more
frequently than this feature would be, so why burden the Checkpoint tuple
with an extra field?
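
A minimal sketch of the two options being weighed, for comparison. All
types here are hypothetical stand-ins written for this thread and are not
Apex's actual tuple classes. Option A adds a nullable name to the tuple
that is sent on every checkpoint; option B leaves that tuple alone and
adds a separate, rarely sent type.

```java
// Hypothetical types for illustration only; these are NOT Apex's actual tuple classes.
final class TupleDesigns {

    /** Option A: extend the existing tuple; every checkpoint tuple carries the extra field. */
    static final class CheckpointTuple {
        final long windowId;
        final String name; // null for ordinary periodic checkpoints

        CheckpointTuple(long windowId, String name) {
            this.windowId = windowId;
            this.name = name;
        }

        boolean isNamed() {
            return name != null;
        }
    }

    /** Option B: the frequently sent tuple stays unchanged... */
    static class PlainCheckpointTuple {
        final long windowId;

        PlainCheckpointTuple(long windowId) {
            this.windowId = windowId;
        }
    }

    /** ...and a separate, rarely sent tuple type carries the user-specified name. */
    static final class NamedCheckpointTuple extends PlainCheckpointTuple {
        final String name;

        NamedCheckpointTuple(long windowId, String name) {
            super(windowId);
            if (name == null || name.isEmpty()) {
                throw new IllegalArgumentException("a named checkpoint needs a name");
            }
            this.name = name;
        }
    }

    public static void main(String[] args) {
        CheckpointTuple periodic = new CheckpointTuple(100L, null);
        NamedCheckpointTuple named = new NamedCheckpointTuple(101L, "before-upgrade");
        System.out.println(periodic.isNamed() + " " + named.name);
    }
}
```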

@Aniruddha
It is better to leave the scheduling to the users; they can use any tool
that they are already familiar with.

On Thu, Aug 4, 2016 at 7:40 AM Aniruddha Thombare <an...@datatorrent.com>
wrote:

> +1 On the idea, it would be awesome to have.
>
> Question: Can we further develop this brilliant idea into:-
> Scheduled checkpoints ( To save as  dynamically named checkpoint)?
> This would be on the lines of logrotate / general backup strategies.
>
>
> Thanks,
>
> A
>
> _____________________________________
> Sent with difficulty, I mean handheld ;)
> On 4 Aug 2016 8:03 pm, "Munagala Ramanath" <ra...@datatorrent.com> wrote:
>
> > +1
> >
> > Ram
> >
> > On Thu, Aug 4, 2016 at 12:10 AM, Sandesh Hegde <sa...@datatorrent.com>
> > wrote:
> >
> > > Hello Team,
> > >
> > > This thread is to discuss the Named Checkpoint feature for Apex. (
> > > https://issues.apache.org/jira/browse/APEXCORE-498)
> > >
> > > Named checkpoints allow following workflow,
> > >
> > > 1. Users can trigger a checkpoint and give it a name
> > > 2. Relaunch the application from the named checkpoint.
> > > 3. These checkpoints survive the "purge of old checkpoints".
> > >
> > > Current idea is to add a new control tuple, NamedCheckPointTuple, which
> > > contains the user specified name, it traverses the DAG and along the
> way
> > > necessary actions are taken.
> > >
> > > Please let me know your thoughts on this.
> > >
> > > Thanks
> > >
> >
>

Re: [Proposal] Named Checkpoints

Posted by Aniruddha Thombare <an...@datatorrent.com>.
+1 on the idea, it would be awesome to have.

Question: Can we further develop this brilliant idea into
scheduled checkpoints (saved as dynamically named checkpoints)?
This would be along the lines of logrotate / general backup strategies.


Thanks,

A

_____________________________________
Sent with difficulty, I mean handheld ;)
On 4 Aug 2016 8:03 pm, "Munagala Ramanath" <ra...@datatorrent.com> wrote:

> +1
>
> Ram
>
> On Thu, Aug 4, 2016 at 12:10 AM, Sandesh Hegde <sa...@datatorrent.com>
> wrote:
>
> > Hello Team,
> >
> > This thread is to discuss the Named Checkpoint feature for Apex. (
> > https://issues.apache.org/jira/browse/APEXCORE-498)
> >
> > Named checkpoints allow following workflow,
> >
> > 1. Users can trigger a checkpoint and give it a name
> > 2. Relaunch the application from the named checkpoint.
> > 3. These checkpoints survive the "purge of old checkpoints".
> >
> > Current idea is to add a new control tuple, NamedCheckPointTuple, which
> > contains the user specified name, it traverses the DAG and along the way
> > necessary actions are taken.
> >
> > Please let me know your thoughts on this.
> >
> > Thanks
> >
>

Re: [Proposal] Named Checkpoints

Posted by Munagala Ramanath <ra...@datatorrent.com>.
+1

Ram

On Thu, Aug 4, 2016 at 12:10 AM, Sandesh Hegde <sa...@datatorrent.com>
wrote:

> Hello Team,
>
> This thread is to discuss the Named Checkpoint feature for Apex. (
> https://issues.apache.org/jira/browse/APEXCORE-498)
>
> Named checkpoints allow following workflow,
>
> 1. Users can trigger a checkpoint and give it a name
> 2. Relaunch the application from the named checkpoint.
> 3. These checkpoints survive the "purge of old checkpoints".
>
> Current idea is to add a new control tuple, NamedCheckPointTuple, which
> contains the user specified name, it traverses the DAG and along the way
> necessary actions are taken.
>
> Please let me know your thoughts on this.
>
> Thanks
>

Re: [Proposal] Named Checkpoints

Posted by Chinmay Kolhatkar <ch...@datatorrent.com>.
Nice feature. +1 for it.

One question: instead of creating a new tuple type, can this be done by
modifying the current checkpoint tuple?



On Thu, Aug 4, 2016 at 2:02 PM, Yogi Devendra <yo...@apache.org>
wrote:

> This will be awesome feature. I can see the usecases for production
> scenario as well as developement/troubleshooting environments.
>
> Excellent value add.
>
>
>
> ~ Yogi
>
> On 4 August 2016 at 12:40, Sandesh Hegde <sa...@datatorrent.com> wrote:
>
> > Hello Team,
> >
> > This thread is to discuss the Named Checkpoint feature for Apex. (
> > https://issues.apache.org/jira/browse/APEXCORE-498)
> >
> > Named checkpoints allow following workflow,
> >
> > 1. Users can trigger a checkpoint and give it a name
> > 2. Relaunch the application from the named checkpoint.
> > 3. These checkpoints survive the "purge of old checkpoints".
> >
> > Current idea is to add a new control tuple, NamedCheckPointTuple, which
> > contains the user specified name, it traverses the DAG and along the way
> > necessary actions are taken.
> >
> > Please let me know your thoughts on this.
> >
> > Thanks
> >
>

Re: [Proposal] Named Checkpoints

Posted by Yogi Devendra <yo...@apache.org>.
This will be an awesome feature. I can see use cases for production
scenarios as well as development/troubleshooting environments.

Excellent value add.



~ Yogi

On 4 August 2016 at 12:40, Sandesh Hegde <sa...@datatorrent.com> wrote:

> Hello Team,
>
> This thread is to discuss the Named Checkpoint feature for Apex. (
> https://issues.apache.org/jira/browse/APEXCORE-498)
>
> Named checkpoints allow following workflow,
>
> 1. Users can trigger a checkpoint and give it a name
> 2. Relaunch the application from the named checkpoint.
> 3. These checkpoints survive the "purge of old checkpoints".
>
> Current idea is to add a new control tuple, NamedCheckPointTuple, which
> contains the user specified name, it traverses the DAG and along the way
> necessary actions are taken.
>
> Please let me know your thoughts on this.
>
> Thanks
>