Posted to users@apex.apache.org by Jim <ji...@facility.supplies> on 2016/02/18 23:20:44 UTC

Checkpointing question

I have an interesting question about a new Apex application that I am writing (note: this is my first Apex application, so my wording may not use all the correct terminology - please correct me if I have stated something the wrong way).

I am receiving data transmissions, via a streaming data pipe, that can each contain anywhere from 1 to 500 individual transactions.

My current flow has this data coming into an input operator, which looks at the individual transactions contained within each transmission and, depending on the transaction type, emits a tuple containing an individual transaction and hands it off to the appropriate transaction-specific operator for processing.

If the main input operator, while splitting and emitting transactions to the various transaction-specific operators, encounters a system problem and suddenly stops processing, I have some questions:


1.)    Will each of the downstream operators continue to process what has already been emitted to them and continue moving those tuples through the system?

2.)    If the answer to #1 above is yes, and they do continue to process, how do I checkpoint which transaction I am at within my application, so that when it restarts it skips the xx records it already handed off before the application went down and resumes processing at the first record that had not yet been emitted to any of the transaction-specific operators?

3.)    If the answer to #1 above is no, what happens to all the tuples that are already in flight below the router level?

Thanks for any help and information that you can provide.

Thanks,

Jim

Re: Checkpointing question

Posted by Sandesh Hegde <sa...@datatorrent.com>.
Yes, each operator has its own checkpoint. Operators restart from the
previous (viable) checkpoint.

Please refer to the following links for more information.

http://docs.datatorrent.com/application_development/#checkpointing
https://www.datatorrent.com/blog/blog-introduction-to-checkpoint/
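
As a quick illustration, here is a minimal sketch of how the checkpoint
interval can be tuned per operator from populateDAG(). The operator and port
names and the interval of 10 streaming windows are placeholders, and the
attribute call assumes the Apex 3.x Java API:

import org.apache.hadoop.conf.Configuration;

import com.datatorrent.api.Context;
import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;
import com.datatorrent.api.annotation.ApplicationAnnotation;

@ApplicationAnnotation(name = "TransactionRouterDemo")
public class TransactionRouterApp implements StreamingApplication
{
  @Override
  public void populateDAG(DAG dag, Configuration conf)
  {
    // Hypothetical operators standing in for the router (input operator)
    // and one transaction-specific downstream operator.
    TransactionInputOperator router = dag.addOperator("router", new TransactionInputOperator());
    OrderProcessor orders = dag.addOperator("orders", new OrderProcessor());

    dag.addStream("orderTuples", router.orderOutput, orders.input);

    // Checkpoint the router every 10 streaming windows instead of the engine default.
    // A checkpoint captures all non-transient fields of the operator instance.
    dag.setAttribute(router, Context.OperatorContext.CHECKPOINT_WINDOW_COUNT, 10);
  }
}

The engine keeps the serialized state of each physical operator at these
checkpoint boundaries, and on a failure it redeploys the operator from its
most recent viable checkpoint.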

On Thu, Feb 18, 2016 at 5:54 PM Jim <ji...@facility.supplies> wrote:

> Isha,
>
> Does each operator have its own checkpoint, so that as each one finishes it
> has its own 'last' checkpoint?
>
> Jim
>

Re: Checkpointing question

Posted by Jim <ji...@facility.supplies>.
Isha,

Does each operator have its own checkpoint, so that as each one finishes it has its own 'last' checkpoint?

Jim


Re: Checkpointing question

Posted by Isha Arkatkar <is...@datatorrent.com>.
Hi Jim,

    In the case where the input operator is killed and then restarted, all the
downstream operators are also restarted from their previous available checkpoint.
    So the answer to the first question is: no, the downstream operators do not
continue processing.
    The tuples that were already in flight past the router level are effectively
not processed; they will be sent again, since the data is reprocessed from the
previous checkpoint.
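
    To make the replay land in the right place, the usual pattern is to keep
the read position in a non-transient field of the input operator; the engine
checkpoints non-transient state automatically and restores it before
redeploying the operator. A rough sketch follows, assuming Apex 3.x package
names; the operator itself, the Transaction POJO and the TransmissionSource
helpers are hypothetical placeholders, not a real Malhar class:

import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.api.InputOperator;
import com.datatorrent.common.util.BaseOperator;

// Hypothetical router operator; only the checkpointing pattern matters here.
public class TransactionInputOperator extends BaseOperator implements InputOperator
{
  // Transaction is the application's own POJO (placeholder here).
  public final transient DefaultOutputPort<Transaction> orderOutput = new DefaultOutputPort<>();

  // Non-transient field: included in every checkpoint and restored on redeploy.
  private long nextOffset = 0;

  // Transient field: connections are not serializable, so re-open them in setup().
  private transient TransmissionSource source;

  @Override
  public void setup(OperatorContext context)
  {
    // After a failure, nextOffset already holds the value from the last checkpoint.
    source = TransmissionSource.connect();               // hypothetical helper
  }

  @Override
  public void emitTuples()
  {
    for (Transaction t : source.fetchFrom(nextOffset)) {  // hypothetical helper
      orderOutput.emit(t);
      nextOffset++;
    }
  }
}

Keep in mind that this alone gives at-least-once behavior: tuples emitted
between the last checkpoint and the failure are emitted again on replay, so the
downstream operators need to tolerate or deduplicate those repeats (or the
application needs idempotent/exactly-once handling at its outputs).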

Thanks!
Isha
