
Posted to dev@apex.apache.org by Pramod Immaneni <pr...@datatorrent.com> on 2015/09/28 19:17:03 UTC

dynamic application properties proposal

Apex supports modification of operator properties at runtime, but the
current implementation has the following shortcomings.

1. A property is not set across all partitions at the same window.
Individual partitions can be on different windows when the property change
is initiated from the client, resulting in inconsistent data for those
windows. I am being generous in using the word inconsistent.
2. Sometimes properties need to be set on more than one logical operator at
the same time to achieve the change the user is seeking. Today these are
separate changes that happen on different windows, again resulting in
inconsistent data for some windows. They need to happen as a single
transaction.
3. If an operator fails after a property is changed dynamically but before
a checkpoint is committed, the operator restarts with the old property
value and the change is not re-applied.

Tim and I did some brainstorming and have a proposal to overcome these
shortcomings. The main problem in all of the above cases is that property
changes happen out-of-band of the dataflow and hence independently of
windowing. The proposal is to bring property change requests in-band with
the dataflow so that they are handled consistently with windowing and in a
distributed manner.

The idea is to inject a special property change tuple, containing the
property changes and the identification information of the operators they
affect, into the dataflow at the input operator. The tuple will be injected
at a window boundary, after end window and before begin window, and as it
flows through the DAG the properties of the intended operators will be
modified. They will all be modified consistently at the same window. The
tuple can contain more than one property change for more than one logical
operator, and the changes will be applied consistently to the different
logical operators at the same window. In case of failure, the replay of
tuples will ensure that the property change gets reapplied at the correct
window.
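
To make the idea concrete, here is a minimal, self-contained sketch of an
in-band property change. It is a toy simulation, not Apex code: the names
PropertyChangeTuple, SimOperator, onPropertyChange and the "threshold"
property are all hypothetical and exist only for illustration. It assumes
the change tuple arrives on the boundary just before a given window and
that replay delivers the same tuple at the same boundary.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PropertyChangeSketch {

    /** Hypothetical control tuple: logical operator name -> (property -> new value). */
    static final class PropertyChangeTuple {
        final Map<String, Map<String, String>> changes;

        PropertyChangeTuple(Map<String, Map<String, String>> changes) {
            this.changes = changes;
        }
    }

    /** Stand-in for one physical partition of a logical operator (not an Apex class). */
    static final class SimOperator {
        final String logicalName;
        final Map<String, String> properties = new LinkedHashMap<>();
        final List<String> log = new ArrayList<>();
        long currentWindow;

        SimOperator(String logicalName, String threshold) {
            this.logicalName = logicalName;
            properties.put("threshold", threshold);
        }

        void beginWindow(long windowId) {
            currentWindow = windowId;
        }

        /** Records which property value was in effect for the current window. */
        void process() {
            log.add("w" + currentWindow + ":" + properties.get("threshold"));
        }

        /** Applies only the changes addressed to this logical operator. */
        void onPropertyChange(PropertyChangeTuple tuple) {
            Map<String, String> mine = tuple.changes.get(logicalName);
            if (mine != null) {
                properties.putAll(mine);
            }
        }
    }

    /** Plays windows first..last; the change tuple sits on the boundary before changeWindow. */
    static List<String> run(SimOperator op, long first, long last,
                            long changeWindow, PropertyChangeTuple tuple) {
        for (long w = first; w <= last; w++) {
            if (w == changeWindow) {
                op.onPropertyChange(tuple); // after end window (w - 1), before begin window (w)
            }
            op.beginWindow(w);
            op.process();
        }
        return op.log;
    }

    public static void main(String[] args) {
        PropertyChangeTuple tuple =
                new PropertyChangeTuple(Map.of("filter", Map.of("threshold", "20")));
        // A partition playing windows 1..4 with the change on the boundary before window 3.
        System.out.println(run(new SimOperator("filter", "10"), 1, 4, 3, tuple));
        // A restarted partition replaying from window 2 with the old value.
        System.out.println(run(new SimOperator("filter", "10"), 2, 4, 3, tuple));
    }
}
```

In this sketch every partition that sees the tuple switches values at
window 3 regardless of how far it has progressed, and a partition restored
with the old value picks the change up again during replay because the
tuple is part of the replayed stream.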

Please give your feedback and input on what you think about this proposal.

Thanks

Re: dynamic application properties proposal

Posted by Pramod Immaneni <pr...@datatorrent.com>.

Here is the JIRA to track this

https://malhar.atlassian.net/browse/APEX-163

Re: dynamic application properties proposal

Posted by Pramod Immaneni <pr...@datatorrent.com>.

The operators could have moved to a window beyond the one you are
targeting by the time the property change request is received and
processed. What happens when there are different logical operators whose
properties need to be changed together? How would you handle replay on
recovery? All this would mean you would start managing the window state of
operators in Stram, which is not needed.

On Mon, Sep 28, 2015 at 11:09 AM, Gaurav Gupta <ga...@datatorrent.com>
wrote:

> Pramod,
>
> Here is what I was thinking. Currently the property value change happens
> at window boundaries, and Stram sends these Operator Requests to
> individual instances. If there are multiple instances of the same operator
> and there is a property change request on this operator, send the Operator
> Request to the instance that is farthest ahead and wait for the other
> instances to reach that window id before sending the Operator Request to
> them. That way you don’t need an additional special tuple?
>
> Does it make sense?
>
> Thanks
> - Gaurav

Re: dynamic application properties proposal

Posted by Gaurav Gupta <ga...@datatorrent.com>.

I think this could solve the dynamic property change via StatsListener as well.

Thanks
- Gaurav

> On Sep 28, 2015, at 11:09 AM, Gaurav Gupta <ga...@datatorrent.com> wrote:
> 
> Pramod,
> 
> Here is what I was thinking. Currently the property value change happens at window boundaries, and Stram sends these Operator Requests to individual instances. If there are multiple instances of the same operator and there is a property change request on this operator, send the Operator Request to the instance that is farthest ahead and wait for the other instances to reach that window id before sending the Operator Request to them. That way you don’t need an additional special tuple?
> 
> Does it make sense?
> 
> Thanks
> - Gaurav

Re: dynamic application properties proposal

Posted by Gaurav Gupta <ga...@datatorrent.com>.

Pramod,

Here is what I was thinking. Currently the property value change happens at window boundaries, and Stram sends these Operator Requests to individual instances. If there are multiple instances of the same operator and there is a property change request on this operator, send the Operator Request to the instance that is farthest ahead and wait for the other instances to reach that window id before sending the Operator Request to them. That way you don’t need an additional special tuple?
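
A rough sketch of the bookkeeping this would need on the Stram side, under
stated assumptions: window ids are treated as plain longs, and the class
and method names (FarthestInstanceSketch, targetWindow, readyInstances) are
hypothetical and not part of any Apex API. It only illustrates picking the
farthest instance's window as the target and holding the request for the
instances that have not reached it yet.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FarthestInstanceSketch {

    /** Target window = the window of the instance that is farthest ahead. */
    static long targetWindow(Map<String, Long> currentWindowByInstance) {
        // Collections.max throws NoSuchElementException for an empty map.
        return Collections.max(currentWindowByInstance.values());
    }

    /** Instances that have reached the target and can be sent the request now. */
    static List<String> readyInstances(Map<String, Long> currentWindowByInstance, long target) {
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Long> e : currentWindowByInstance.entrySet()) {
            if (e.getValue() >= target) {
                ready.add(e.getKey());
            }
        }
        return ready;
    }

    public static void main(String[] args) {
        Map<String, Long> windows = new LinkedHashMap<>();
        windows.put("p0", 12L);
        windows.put("p1", 15L);
        windows.put("p2", 13L);

        long target = targetWindow(windows);
        System.out.println("target window: " + target);
        System.out.println("send now to: " + readyInstances(windows, target));

        // Later, the slower instances have progressed.
        windows.put("p0", 15L);
        windows.put("p2", 16L);
        System.out.println("send now to: " + readyInstances(windows, target));
    }
}
```

Note that in the second check the hypothetical instance p2 is already past
the target window when the request would be sent, which is the kind of race
this bookkeeping would have to deal with.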

Does it make sense?

Thanks
- Gaurav

> On Sep 28, 2015, at 10:43 AM, Pramod Immaneni <pr...@datatorrent.com> wrote:
> 
> If OperatorRequest is out-of-band to the dataflow, which I think it is, then
> it would most probably not be the mechanism to relay property changes. We
> would possibly expose the proposed property change through an API that
> StatsListener could use.

Re: dynamic application properties proposal

Posted by Pramod Immaneni <pr...@datatorrent.com>.

If OperatorRequest is out-of-band to the dataflow, which I think it is, then
it would most probably not be the mechanism to relay property changes. We
would possibly expose the proposed property change through an API that
StatsListener could use.

On Mon, Sep 28, 2015 at 10:40 AM, Gaurav Gupta <ga...@datatorrent.com>
wrote:

> Pramod,
>
> How would a dynamic property change using OperatorRequest as part of
> StatsListener work with the new approach?
>
> Thanks
> - Gaurav

Re: dynamic application properties proposal

Posted by Gaurav Gupta <ga...@datatorrent.com>.

Pramod,

How would a dynamic property change using OperatorRequest as part of StatsListener work with the new approach?

Thanks
- Gaurav

> On Sep 28, 2015, at 10:30 AM, Pramod Immaneni <pr...@datatorrent.com> wrote:
> 
> An optimization is to perform the steps below only when there is more than
> one input operator. In the more common case of a single input operator, the
> property change tuple can be inserted at the next possible window without
> having to temporarily pause the flow.

Re: dynamic application properties proposal

Posted by Pramod Immaneni <pr...@datatorrent.com>.

An optimization is to perform the steps below only when there is more than
one input operator. In the more common case of a single input operator, the
property change tuple can be inserted at the next possible window without
having to temporarily pause the flow.
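
The synchronization steps quoted below, together with this shortcut, can be
sketched roughly as follows. This is a toy model under stated assumptions:
SimInput, scheduleChange and the paused/applyChangeAt fields are
hypothetical names, and window ids are treated as plain longs so that
"max window + 1" is simple arithmetic, which (as noted in the quoted steps)
is not technically correct for real window ids.

```java
import java.util.List;

public class InputSyncSketch {

    /** Stand-in for an input operator as seen by the coordinator (not an Apex class). */
    static final class SimInput {
        final String name;
        long currentWindow;
        boolean paused;
        Long applyChangeAt; // null until a change has been scheduled

        SimInput(String name, long currentWindow) {
            this.name = name;
            this.currentWindow = currentWindow;
        }
    }

    /** Schedules the change on every input operator and returns the chosen window. */
    static long scheduleChange(List<SimInput> inputs) {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("at least one input operator is required");
        }
        if (inputs.size() == 1) {
            // Shortcut: a single input operator needs no pause; use its next window.
            SimInput only = inputs.get(0);
            only.applyChangeAt = only.currentWindow + 1;
            return only.applyChangeAt;
        }
        // Step 1: ask all input operators to stop and report their current window.
        inputs.forEach(i -> i.paused = true);
        // Step 2: take the max window + 1.
        long target = inputs.stream().mapToLong(i -> i.currentWindow).max().getAsLong() + 1;
        // Step 3: tell every input operator to apply the change at that window.
        inputs.forEach(i -> i.applyChangeAt = target);
        // Step 4: resume the input operators.
        inputs.forEach(i -> i.paused = false);
        return target;
    }

    public static void main(String[] args) {
        List<SimInput> inputs = List.of(
                new SimInput("in0", 5), new SimInput("in1", 9), new SimInput("in2", 7));
        System.out.println("apply at window " + scheduleChange(inputs));
        System.out.println("apply at window " + scheduleChange(List.of(new SimInput("solo", 4))));
    }
}
```

Because the chosen window is strictly greater than every reported window,
no input operator in this model has already played it, which is the
property the quoted steps rely on.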

On Mon, Sep 28, 2015 at 10:27 AM, Timothy Farkas <ti...@datatorrent.com>
wrote:

> Furthermore, this approach is not limited to DAGs with a single input
> operator. In the case where a DAG has multiple input operators, property
> changes can be set within the same window across all input operators by
> enforcing some synchronization at the input operator level when setting the
> property. This synchronization would look like the following:
>
>    1. When receiving a property change request, ask all input operators to
> stop and send their current window.
>    2. Take the max window + 1 (not technically correct but you get the
> idea).
>    3. Send the property change request to all the input operators and tell
> them to apply the change at the maximum window ID + 1.
>    4. Resume the input operators.
>
> This ensures that the change is applied at the same window ID and also
> ensures that the change is applied at a window ID that the input operator
> has never played before. Therefore property changes will not interfere with
> the idempotence of operators.

Re: dynamic application properties proposal

Posted by Timothy Farkas <ti...@datatorrent.com>.

Furthermore this approach is not limited to DAGs with a single input
operator. In the case where a DAG has multiple input operators property
changes can be set within the same window across all input operators by
enforcing some synchronization at the input operator level when setting the
property. This synchronization would look like the following:

   1. When receiving a property change request, ask all input operators to
stop and send their current window.
   2. Take the max window + 1 (not technically correct but you get the idea)
   3. Send the property change request to all the input operators and tell
them to apply the change at the maximum window id + 1.
   4. Resume the input operators.

This ensures that the change is applied at the same window ID on every
input operator, and at a window ID that no input operator has played
before. Therefore property changes will not interfere with the idempotence
of operators.
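
The four steps above can be sketched as follows. This is an illustration
under stated assumptions, not the Apex implementation: InputOperatorHandle,
scheduleAtCommonWindow and FakeInput are hypothetical names, and window IDs
are treated as plain longs even though, as noted in step 2, "max + 1" is
not technically correct for real window IDs.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical coordinator-side helper; not an Apex API. */
final class InputWindowSync {
    /** Stand-in for whatever handle the master holds on an input operator. */
    interface InputOperatorHandle {
        long pauseAndReportWindow();                                        // step 1
        void scheduleChange(long windowId, String property, String value); // step 3
        void resume();                                                      // step 4
    }

    /** Returns the window ID at which every input operator will apply the change. */
    static long scheduleAtCommonWindow(List<? extends InputOperatorHandle> inputs,
                                       String property, String value) {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("no input operators");
        }
        try {
            long maxWindow = Long.MIN_VALUE;
            for (InputOperatorHandle in : inputs) {
                maxWindow = Math.max(maxWindow, in.pauseAndReportWindow());
            }
            long target = maxWindow + 1; // step 2: a window nobody has played yet
            for (InputOperatorHandle in : inputs) {
                in.scheduleChange(target, property, value);
            }
            return target;
        } finally {
            // Assumes resume() is harmless on an operator that was never paused.
            for (InputOperatorHandle in : inputs) {
                in.resume();
            }
        }
    }

    /** In-memory stand-in used only to exercise the sketch. */
    static final class FakeInput implements InputOperatorHandle {
        final long currentWindow;
        long scheduledWindow = -1;
        boolean paused;

        FakeInput(long currentWindow) { this.currentWindow = currentWindow; }
        public long pauseAndReportWindow() { paused = true; return currentWindow; }
        public void scheduleChange(long windowId, String property, String value) {
            scheduledWindow = windowId;
        }
        public void resume() { paused = false; }
    }

    public static void main(String[] args) {
        List<FakeInput> inputs =
            Arrays.asList(new FakeInput(5), new FakeInput(9), new FakeInput(7));
        System.out.println(scheduleAtCommonWindow(inputs, "regex", "[0-9]+"));
    }
}
```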

Re: dynamic application properties proposal

Posted by Amol Kekre <am...@datatorrent.com>.

This works well. How about not waiting for a go-ahead from the master, and
instead just giving the go-ahead to the input operators? That way the
success rate goes up a lot. On the con side, the control tuple has to
traverse the entire DAG.

Thks,
Amol


On Wed, Oct 7, 2015 at 2:48 PM, Timothy Farkas <ti...@datatorrent.com> wrote:

> I think we could achieve a 100% guarantee without (unnecessarily) pausing
> operators. This could be achieved by making a small addition to the above
> approach.
>
> 1.) pick a window N windows ahead of the current max window of the
> operators. Let's call this window W
> 2.) Send a property change request to the operators to change the property
> on window W
> 3.) As part of the property change request the operator will do one of two
> things:
>      a Reply with a failure if it has passed window W.
>      b Reply with a success if it has not already passed window W.
> 4.) Operators which replied with a success will asynchronously wait for a
> confirmation message to apply the property. If the operator reaches window
> W before it receives the confirmation, the operator will block until a
> confirmation is received.
> 5.) Meanwhile the app master collects the responses to the property change
> requests. If all the property change requests responded with a success,
> then a confirmation message is sent to all the operators to apply the
> property. If one or more of the operators replied with failure, then a
> property change cancellation is sent to all the operators, and then the
> whole process is retried.
>
> This way 99.99% of the time a property change would be applied without
> pausing operators. Operators will only be paused on rare occasions, and
> only for the sake of preventing application errors that could be triggered
> by an incorrect application of a property.
>
> Thanks,
> Tim
>
>
>
> On Sun, Oct 4, 2015 at 9:40 PM, Amol Kekre <am...@datatorrent.com> wrote:
>
> > Pause is hard to pull off. It has a lot of other side effects and
> > consequences on scale and on external systems that now have to back up.
> > As the number of operators grows, the algorithm halts more.
> > Data-in-motion means that a backlog will build up during the pause,
> > especially within external systems. The problem occurs even if we have
> > one logical input operator with N partitions.
> >
> > A much quicker way, though without a technical guarantee, would be to let
> > users pick a window ID increment in the future. The command may then be
> > "let me set properties on these operators N windows in the future off the
> > current max window ID among them". A user can then use a high enough N to
> > get 99.99% certainty that the window is aligned.
> >
> > Thks,
> > Amol
> >
> >
> > On Sat, Oct 3, 2015 at 9:55 AM, Timothy Farkas <ti...@datatorrent.com>
> > wrote:
> >
> > > The case where there is no common ancestor also has to be handled. For
> > > example, you may need to change a property on two different input
> > > operators. In this case the property needs to be set on both operators
> > > before the same window. This also needs to be done the first time a
> > > window is computed by an input operator, otherwise there would be issues
> > > with idempotence. This could be achieved by doing the following:
> > >
> > > 1. Input operators would have to be paused when setting a property.
> > > 2. They would report their window ID.
> > > 3. Then the max window ID needs to be picked.
> > > 4. Then the property needs to be scheduled to be set at the appropriate
> > > window.
> > > 5. Then the input operators are resumed.
> > >
> > > Thanks,
> > > Tim
> > > On Oct 1, 2015 5:39 PM, "Amol Kekre" <am...@gmail.com> wrote:
> > >
> > > >
> > > > The issue comes up when a property has to be changed in multiple
> > > > operators, logical or physical. Since it does not matter whether this
> > > > is triggered by an input adapter or any parent of these operators,
> > > > stram can pick a common ancestor. Property change commands (operator
> > > > id, prop name, prop val) can be inserted by the stramchild of the
> > > > common ancestor.
> > > >
> > > > Thks
> > > > Amol
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On Oct 1, 2015, at 2:13 PM, Gaurav Gupta <ga...@datatorrent.com>
> > > > > wrote:
> > > > >
> > > > > Pramod,
> > > > >
> > > > > The new special property change tuple will be sent to all the
> > > > > operators, and every operator will have to check whether the
> > > > > property change applies to it. Although such requests may be very
> > > > > few, is there a way to optimize this?
> > > > >
> > > > > Thanks
> > > > > - Gaurav
> > > > >
> > > > >> On Sep 28, 2015, at 3:44 PM, Pramod Immaneni <
> > pramod@datatorrent.com>
> > > > wrote:
> > > > >>
> > > > >> At the platform level that cannot be guaranteed as your operator
> > > > controls
> > > > >> and manages reading of the data. However it is not difficult to
> > > envision
> > > > >> writing an operator that would pick up a new dataset when property
> > is
> > > > >> changed.
> > > > >>
> > > > >> On Mon, Sep 28, 2015 at 3:33 PM, Ashwin Chandra Putta <
> > > > >> ashwinchandrap@gmail.com> wrote:
> > > > >>
> > > > >>> Great, looking forward to these changes. Does it also provide a
> > > > guarantee
> > > > >>> on which properties are used for which input data sets?
> > > > >>>
> > > > >>> Few use case examples:
> > > > >>> - set property between reads of different batches of files. Say,
> > > > applying
> > > > >>> batch name property before processing the next batch of files.
> > > > >>> - load new configuration file for csv parser before processing
> next
> > > > set of
> > > > >>> data.
> > > > >>> - apply new regex before parsing next stream of tuples.
> > > > >>> etc.
> > > > >>>
> > > > >>> One approach to allow this is to emit subsequent tuples only
> > starting
> > > > next
> > > > >>> window after the window in which property change is made. That
> way,
> > > the
> > > > >>> boundaries between data sets is fixed and property change is done
> > in
> > > > >>> between. The user will now have a guarantee on which property
> value
> > > is
> > > > used
> > > > >>> on any given tuple.
> > > > >>>
> > > > >>> Thoughts?
> > > > >>>
> > > > >>> Regards,
> > > > >>> Ashwin.

Re: dynamic application properties proposal

Posted by Pramod Immaneni <pr...@datatorrent.com>.

The user would not be picking the window ID, as they cannot correlate it
with anything. From the user's perspective, setting a property will look
very much the way it does today, except for the additional ability to set
properties on more than one operator. Stram will not be managing window
state. The property tuple will be injected at the next possible window at
the input operator, except when there are multiple input operators, in
which case the protocol that Tim mentioned in the earlier email will be
used.
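
One reading of the multi-input protocol referred to above is the
confirm-or-cancel handshake from Tim's Oct 7 message in this thread. A
rough coordinator-side sketch of that handshake follows. All names here
(TwoPhasePropertyChange, OperatorHandle, FakeOperator) are hypothetical,
window IDs are treated as plain longs, and the operator-side blocking at
window W while waiting for a confirmation is assumed and not shown.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the handshake; not an Apex API. */
final class TwoPhasePropertyChange {
    interface OperatorHandle {
        long currentWindow();
        boolean prepare(long windowId); // false if the operator already passed windowId
        void confirm(long windowId);
        void cancel(long windowId);
    }

    /** Tries up to maxAttempts times; returns the window W that was confirmed. */
    static long apply(List<? extends OperatorHandle> ops, long lookahead, int maxAttempts) {
        if (ops.isEmpty()) {
            throw new IllegalArgumentException("no operators");
        }
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            long maxWindow = Long.MIN_VALUE;
            for (OperatorHandle op : ops) {
                maxWindow = Math.max(maxWindow, op.currentWindow());
            }
            long target = maxWindow + lookahead; // window W, N windows ahead
            boolean allAccepted = true;
            for (OperatorHandle op : ops) {
                allAccepted &= op.prepare(target); // &= always evaluates prepare()
            }
            if (allAccepted) {
                for (OperatorHandle op : ops) {
                    op.confirm(target);
                }
                return target;
            }
            for (OperatorHandle op : ops) {
                op.cancel(target); // at least one failure: cancel everywhere, retry
            }
        }
        throw new IllegalStateException("not applied after " + maxAttempts + " attempts");
    }

    /** In-memory stand-in that can race ahead once, to exercise the retry path. */
    static final class FakeOperator implements OperatorHandle {
        long window;
        long jumpOnFirstPrepare;
        long confirmedWindow = -1;
        int cancels;

        FakeOperator(long window, long jumpOnFirstPrepare) {
            this.window = window;
            this.jumpOnFirstPrepare = jumpOnFirstPrepare;
        }
        public long currentWindow() { return window; }
        public boolean prepare(long windowId) {
            window += jumpOnFirstPrepare; // simulate progress between report and prepare
            jumpOnFirstPrepare = 0;
            return window < windowId;
        }
        public void confirm(long windowId) { confirmedWindow = windowId; }
        public void cancel(long windowId) { cancels++; }
    }

    public static void main(String[] args) {
        List<FakeOperator> ops = Arrays.asList(new FakeOperator(10, 0), new FakeOperator(12, 100));
        System.out.println(apply(ops, 5, 3));
    }
}
```

In this sketch the master only reads each operator's current window when
choosing W; it does not otherwise track per-operator window state.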

On Thu, Oct 8, 2015 at 9:36 AM, Gaurav Gupta <ga...@datatorrent.com> wrote:

> If the user is setting the property manually, then the user can pick the
> windowId at which to apply the property change, but how would a dynamic
> property change using OperatorRequest as part of StatsListener work?
>
> Does this not mean that Stram will have to start managing the window
> state of operators?
>
>
> Thanks
> -Gaurav

Re: dynamic application properties proposal

Posted by Gaurav Gupta <ga...@datatorrent.com>.

If the user is setting the property manually, then the user can pick the
windowId at which to apply the property change, but how would a dynamic
property change using OperatorRequest as part of StatsListener work?

Does this not mean that Stram will have to start managing the window state
of operators?


Thanks
-Gaurav

On Wed, Oct 7, 2015 at 2:48 PM, Timothy Farkas <ti...@datatorrent.com> wrote:

> I think we could achieve a 100% guarantee without (unnecessarily) pausing
> operators. This could be achieved by making a small addition to the above
> approach.
>
> 1.) pick a window N windows ahead of the current max window of the
> operators. Let's call this window W
> 2.) Send a property change request to the operators to change the property
> on window W
> 3.) As part of the property change request the operator will do one of two
> things:
>      a Reply with a failure if it has passed window W.
>      b Reply with a success if it has not already passed window W.
> 4.) Operators which replied with a success will asynchronously wait for a
> confirmation message to apply the property. If the operator reaches window
> W before it receives the confirmation, the operator will block until a
> confirmation is received.
> 5.) Meanwhile the app master collects the responses to the property change
> requests. If all the property change requests responded with a success,
> then a confirmation message is sent to all the operators to apply the
> property. If one or more of the operators replied with failure, then a
> property change cancellation is sent to all the operators, and then the
> whole process is retried.
>
> This way 99.99% of the time a property change would be applied without
> pausing operators. Operators will only be paused on rare occasions, and
> only for the sake of preventing application errors that could be triggered
> by an incorrect application of a property.
>
> Thanks,
> Tim
>
>
>
> On Sun, Oct 4, 2015 at 9:40 PM, Amol Kekre <am...@datatorrent.com> wrote:
>
> > Pause is hard to pull off. It has a lot of other side effects and
> > consequences on scale and on external systems that now have to back up.
> > As the number of operators grows, the algorithm halts more.
> > Data-in-motion means that a backlog will build up during the pause,
> > especially within external systems. The problem occurs even if we have
> > one logical input operator with N partitions.
> >
> > A much quicker way, though without a technical guarantee, would be to let
> > users pick a window ID increment in the future. The command may then be
> > "let me set properties on these operators N windows in the future off the
> > current max window ID among them". A user can then use a high enough N to
> > get 99.99% certainty that the window is aligned.
> >
> > Thks,
> > Amol
> >
> >
> > On Sat, Oct 3, 2015 at 9:55 AM, Timothy Farkas <ti...@datatorrent.com>
> > wrote:
> >
> > > The case where there is no common ancestor also has to be handled. For
> > > example, you may need to change a property on two different input
> > > operators. In this case the property needs to be set on both operators
> > > before the same window. This also needs to be done the first time a
> > > window is computed by an input operator, otherwise there would be issues
> > > with idempotence. This could be achieved by doing the following:
> > >
> > > 1. Input operators would have to be paused when setting a property.
> > > 2. They would report their window ID.
> > > 3. Then the max window ID needs to be picked.
> > > 4. Then the property needs to be scheduled to be set at the appropriate
> > > window.
> > > 5. Then the input operators are resumed.
> > >
> > > Thanks,
> > > Tim
> > > On Oct 1, 2015 5:39 PM, "Amol Kekre" <am...@gmail.com> wrote:
> > >
> > > >
> > > > The issue comes up when a property has to be changed in multiple
> > > > operators, logical or physical. Since it does not matter whether this
> > > > is triggered by an input adapter or any parent of these operators,
> > > > stram can pick a common ancestor. Property change commands (operator
> > > > id, prop name, prop val) can be inserted by the stramchild of the
> > > > common ancestor.
> > > >
> > > > Thks
> > > > Amol
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On Oct 1, 2015, at 2:13 PM, Gaurav Gupta <ga...@datatorrent.com>
> > > > > wrote:
> > > > >
> > > > > Pramod,
> > > > >
> > > > > The new special property change tuple will be sent to all the
> > > > > operators, and every operator will have to check whether the
> > > > > property change applies to it. Although such requests may be very
> > > > > few, is there a way to optimize this?
> > > > >
> > > > > Thanks
> > > > > - Gaurav
> > > > >
> > > > >> On Sep 28, 2015, at 3:44 PM, Pramod Immaneni <
> > pramod@datatorrent.com>
> > > > wrote:
> > > > >>
> > > > >> At the platform level that cannot be guaranteed as your operator
> > > > controls
> > > > >> and manages reading of the data. However it is not difficult to
> > > envision
> > > > >> writing an operator that would pick up a new dataset when property
> > is
> > > > >> changed.
> > > > >>
> > > > >> On Mon, Sep 28, 2015 at 3:33 PM, Ashwin Chandra Putta <
> > > > >> ashwinchandrap@gmail.com> wrote:
> > > > >>
> > > > >>> Great, looking forward to these changes. Does it also provide a
> > > > guarantee
> > > > >>> on which properties are used for which input data sets?
> > > > >>>
> > > > >>> Few use case examples:
> > > > >>> - set property between reads of different batches of files. Say,
> > > > applying
> > > > >>> batch name property before processing the next batch of files.
> > > > >>> - load new configuration file for csv parser before processing
> next
> > > > set of
> > > > >>> data.
> > > > >>> - apply new regex before parsing next stream of tuples.
> > > > >>> etc.
> > > > >>>
> > > > >>> One approach to allow this is to emit subsequent tuples only
> > starting
> > > > next
> > > > >>> window after the window in which property change is made. That
> way,
> > > the
> > > > >>> boundaries between data sets is fixed and property change is done
> > in
> > > > >>> between. The user will now have a guarantee on which property
> value
> > > is
> > > > used
> > > > >>> on any given tuple.
> > > > >>>
> > > > >>> Thoughts?
> > > > >>>
> > > > >>> Regards,
> > > > >>> Ashwin.
> > > > >>>
> > > > >>> On Mon, Sep 28, 2015 at 10:17 AM, Pramod Immaneni <
> > > > pramod@datatorrent.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>>> Apex support modification of operator properties at runtime but
> > the
> > > > >>> current
> > > > >>>> implemenations has the following shortcomings.
> > > > >>>>
> > > > >>>> 1. Property is not set across all partitions on the same window
> as
> > > > >>>> individual partitions can be on different windows when property
> > > > change is
> > > > >>>> initiated from client resulting in inconsistency of data for
> those
> > > > >>> windows.
> > > > >>>> I am being generous using the word inconsistent.
> > > > >>>> 2. Sometimes properties need to be set on more than one logical
> > > > operators
> > > > >>>> at the same time to achieve the change the user is seeking.
> Today
> > > they
> > > > >>> will
> > > > >>>> be two separate changes happening on two different windows again
> > > > >>> resulting
> > > > >>>> in inconsistent data for some windows. These would need to
> happen
> > > as a
> > > > >>>> single transaction.
> > > > >>>> 3. If there is an operator failure before a committed checkpoint
> > > > after an
> > > > >>>> operator property is dynamically changed the operator will
> restart
> > > > with
> > > > >>> the
> > > > >>>> old property and the change will not be re-applied.
> > > > >>>>
> > > > >>>> Tim and myself did some brainstorming and we have a proposal to
> > > > overcome
> > > > >>>> these shortcomings. The main problem in all the above cases is
> > that
> > > > the
> > > > >>>> property changes are happening out-of-band of data flow and
> hence
> > > > >>>> independent of windowing. The proposal is to bring the property
> > > change
> > > > >>>> request into the in-band dataflow so that they are handled
> > > > consistently
> > > > >>>> with windowing and handled distributively.
> > > > >>>>
> > > > >>>> The idea is to inject a special property change tuple containing
> > the
> > > > >>>> property changes and the identification information of the
> > > operator's
> > > > >>> they
> > > > >>>> affect into the dataflow at the input operator. The tuple will
> be
> > > > >>> injected
> > > > >>>> at window boundary after end window and before begin window and
> as
> > > > this
> > > > >>>> tuple flows through the DAG the intended operators properties
> will
> > > be
> > > > >>>> modifed. They will all be modified consistently at the same
> > window.
> > > > The
> > > > >>>> tuple can contain more than one property changes for more than
> one
> > > > >>> logical
> > > > >>>> operators and the change will be applied consistently to the
> > > different
> > > > >>>> logical operators at the same window. In case of failure the
> > replay
> > > of
> > > > >>>> tuples will ensure that the property change gets reapplied at
> the
> > > > correct
> > > > >>>> window.
> > > > >>>>
> > > > >>>> Please give your feedback and input on what you think about this
> > > > >>> proposal.
> > > > >>>>
> > > > >>>> Thanks
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> --
> > > > >>>
> > > > >>> Regards,
> > > > >>> Ashwin.
> > > > >
> > > >
> > >
> >
>

Re: dynamic application properties proposal

Posted by Timothy Farkas <ti...@datatorrent.com>.

I think we could achieve a 100% guarantee without (unnecessarily) pausing
operators. This could be achieved by making a small addition to the above
approach.

1.) Pick a window N windows ahead of the current max window of the
operators. Let's call this window W.
2.) Send a property change request to the operators to change the property
on window W.
3.) As part of the property change request, each operator will do one of
two things:
     a. Reply with a failure if it has already passed window W.
     b. Reply with a success if it has not yet passed window W.
4.) Operators that replied with a success will asynchronously wait for a
confirmation message before applying the property. If an operator reaches
window W before it receives the confirmation, it will block until the
confirmation is received.
5.) Meanwhile the app master collects the responses to the property change
requests. If all the operators responded with a success, a confirmation
message is sent to all the operators to apply the property. If one or more
of the operators replied with a failure, a property change cancellation is
sent to all the operators, and the whole process is retried.

This way 99.99% of the time a property change would be applied without
pausing operators. Operators would only be paused on rare occasions, and
only for the sake of preventing application errors that could be triggered
by an incorrect application of a property.
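
The five steps can be modelled as a small prepare/confirm exchange. The
following is only a toy sketch in Python, not Apex code: the Operator class,
change_property, and every method name are hypothetical, calls are
synchronous, and the blocking in step 4 is represented by an exception
rather than a real wait.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    """Toy stand-in for a physical operator (hypothetical, not an Apex class)."""
    name: str
    current_window: int
    pending: dict = field(default_factory=dict)    # window -> (prop, value), unconfirmed
    confirmed: dict = field(default_factory=dict)  # window -> (prop, value), confirmed
    props: dict = field(default_factory=dict)

    def prepare(self, window, prop, value):
        # Step 3: reply with a failure if window W has already been reached.
        if self.current_window >= window:
            return False
        self.pending[window] = (prop, value)
        return True

    def confirm(self, window):
        # Step 5, success path: the staged change may now be applied at W.
        self.confirmed[window] = self.pending.pop(window)

    def cancel(self, window):
        # Step 5, failure path: drop the staged change.
        self.pending.pop(window, None)

    def begin_window(self, window):
        # Step 4: a real operator would block here until confirmation arrives.
        if window in self.pending:
            raise RuntimeError("would block: no confirmation yet")
        self.current_window = window
        if window in self.confirmed:
            prop, value = self.confirmed.pop(window)
            self.props[prop] = value


def change_property(operators, prop, value, n_ahead, max_retries=3):
    """App master side. Returns the window W that was used, or None."""
    for _ in range(max_retries):
        # Step 1: W is N windows ahead of the current max window.
        target = max(op.current_window for op in operators) + n_ahead
        # Steps 2 and 3: send the request and collect the replies.
        replies = [op.prepare(target, prop, value) for op in operators]
        if all(replies):
            for op in operators:
                op.confirm(target)
            return target
        for op in operators:
            op.cancel(target)
    return None
```

In this model the property stays unchanged until each operator begins
window W, which is the point of the scheme: every partition switches on the
same window, and nothing is paused unless a confirmation is late.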

Thanks,
Tim




Re: dynamic application properties proposal

Posted by Amol Kekre <am...@datatorrent.com>.

Pause is hard to pull off. It has a lot of other side effects and
consequences on scale and on external systems that now have to back up. As
the number of operators grows, the algorithm halts more. Data in motion
means that a backlog will build up during the pause, especially within
external systems. The problem occurs even if we have one logical input
operator with N partitions.

A much quicker way, though without a technical guarantee, would be to let
users choose a window id increment in the future. The command may then be
"let me set properties on these operators N windows in the future off the
current max window id among them". A user can then use a high enough N to
get 99.99% certainty that the window is aligned.
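
A rough sketch of that command, in Python and purely illustrative: the
function names and the dictionaries of window ids are assumptions, not
anything in Apex. It shows both the target computation and why there is no
hard guarantee: an operator that has already reached the target by the time
the command arrives misses the change.

```python
def schedule_window(current_windows, n_ahead):
    """Target window = current max window id among the operators, plus N."""
    return max(current_windows.values()) + n_ahead


def missed(windows_on_arrival, target):
    """Operators that had already reached the target when the command arrived."""
    return sorted(name for name, w in windows_on_arrival.items() if w >= target)


# Window ids seen by the client when the command is issued (made-up values).
seen = {"partition-0": 100, "partition-1": 103}
# Window ids when the command actually reaches each partition (made-up values).
arrival = {"partition-0": 104, "partition-1": 109}

small_n = schedule_window(seen, 5)    # 108: partition-1 is already at 109
large_n = schedule_window(seen, 50)   # 153: both partitions are still behind it
```

With the small N, missed(arrival, small_n) reports partition-1; with the
larger N it reports nothing, which is the trade-off between latency of the
change and certainty of alignment.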

Thks,
Amol



Re: dynamic application properties proposal

Posted by Timothy Farkas <ti...@datatorrent.com>.

The case where there is no common ancestor also has to be handled. For
example, you may need to change a property on two different input operators.
In this case the property needs to be set on both operators before the same
window. This also needs to be done the first time a window is computed by
an input operator; otherwise there would be issues with idempotence. This
could be achieved by doing the following:

1. Input operators would have to be paused when setting a property.
2. They would report their window id.
3. Then the max window id needs to be picked.
4. Then the property needs to be scheduled to be set at the appropriate
window.
5. Then the input operators are resumed.
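
As a toy model of these five steps (Python, hypothetical names throughout,
not Apex code): "the appropriate window" is assumed here to be the max
reported window id plus one, i.e. the first window that no input operator
has computed yet.

```python
class InputOperator:
    """Toy input operator (hypothetical); tracks only what the steps need."""

    def __init__(self, name, window_id):
        self.name = name
        self.window_id = window_id
        self.paused = False
        self.scheduled = {}  # window id -> (prop, value)


def set_property_with_pause(inputs, prop, value):
    """Returns the window at which the property change was scheduled."""
    for op in inputs:                      # 1. pause the input operators
        op.paused = True
    try:
        reported = [op.window_id for op in inputs]   # 2. report window ids
        target = max(reported) + 1                   # 3. pick the max (assumed +1)
        for op in inputs:                            # 4. schedule the change
            op.scheduled[target] = (prop, value)
    finally:
        for op in inputs:                  # 5. resume, even if scheduling failed
            op.paused = False
    return target
```

Because the operators are paused while the target is chosen, none of them
can move past it in the meantime, so the change lands before the same
window on every input operator.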

Thanks,
Tim

Re: dynamic application properties proposal

Posted by Amol Kekre <am...@gmail.com>.

The issue comes up when a property has to be changed in multiple operators, logical or physical. Since it does not matter whether this is triggered by an input adapter or any parent of these operators, stram can pick a common ancestor. Property change commands (operator id, prop name, prop val) can be inserted by the stramchild of the common ancestor.
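
One way to picture the common-ancestor choice is as a search over the DAG.
This is a sketch under assumptions: the DAG is given as a plain dict from
operator name to downstream operator names, an operator counts as its own
ancestor, and the function names are made up for illustration. It is not
how stram is implemented.

```python
def upstream(dag, node):
    """All operators from which `node` is reachable, including `node` itself."""
    parents = {}
    for src, dsts in dag.items():
        for dst in dsts:
            parents.setdefault(dst, set()).add(src)
    seen, stack = {node}, [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def lowest_common_ancestors(dag, targets):
    """Common ancestors of all targets that have no other common ancestor downstream."""
    common = set.intersection(*(upstream(dag, t) for t in targets))
    return sorted(
        c for c in common
        if not any(c != other and c in upstream(dag, other) for other in common)
    )


dag = {
    "input": ["parse"],
    "parse": ["filter", "enrich"],
    "filter": ["output"],
    "enrich": ["output"],
}
```

For the targets "filter" and "enrich" this yields ["parse"], so the command
would be inserted there rather than at "input". An empty result means the
targets share no ancestor, for example two independent input operators.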

Thks 
Amol

Sent from my iPhone


Re: dynamic application properties proposal

Posted by Gaurav Gupta <ga...@datatorrent.com>.

Pramod,

The new special property change tuple will be sent to all the operators, and every operator will have to check whether the property change applies to it. Such requests may be very few, but is there a way to optimize this?

Thanks
- Gaurav


Re: dynamic application properties proposal

Posted by Pramod Immaneni <pr...@datatorrent.com>.

At the platform level that cannot be guaranteed, because your operator
controls and manages the reading of the data. However, it is not difficult
to envision writing an operator that picks up a new data set when a
property is changed.
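
As a rough illustration of such an operator, here is a minimal sketch. It
is not Apex API code: the class BatchReaderSketch, its batchName property
and the beginWindow/emitTuples methods are hypothetical stand-ins for an
input operator's callbacks. It assumes the property can be set at any time
but is only looked at when a window begins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical input operator (not part of the Apex API). It reads from the
// batch named by its batchName property and re-checks that property only at
// the start of a window.
class BatchReaderSketch {
    private String batchName;     // property; may be changed at any time
    private String activeBatch;   // batch currently being read
    private int offset;           // position within the active batch
    final List<String> emitted = new ArrayList<>();

    BatchReaderSketch(String initialBatch) {
        this.batchName = initialBatch;
    }

    void setBatchName(String batchName) {
        this.batchName = batchName;
    }

    // Switch to the new data set only on a window boundary.
    void beginWindow(long windowId) {
        if (!batchName.equals(activeBatch)) {
            activeBatch = batchName;
            offset = 0;
        }
    }

    // Stand-in for reading and emitting records from the active batch.
    void emitTuples(int count) {
        for (int i = 0; i < count; i++) {
            emitted.add(activeBatch + "#" + offset++);
        }
    }
}
```

With this shape, a change made in the middle of a window does not affect
the tuples emitted in that window; the new data set starts with the next
window.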

On Mon, Sep 28, 2015 at 3:33 PM, Ashwin Chandra Putta <
ashwinchandrap@gmail.com> wrote:

> Great, looking forward to these changes. Does it also provide a guarantee
> on which properties are used for which input data sets?
>
> A few example use cases:
> - set a property between reads of different batches of files, say, applying
> a batch name property before processing the next batch of files.
> - load a new configuration file for a csv parser before processing the next
> set of data.
> - apply a new regex before parsing the next stream of tuples.
> etc.
>
> One way to allow this is to emit subsequent tuples only from the window
> after the one in which the property change is made. That way, the
> boundaries between data sets are fixed and the property change happens in
> between. The user then has a guarantee about which property value is used
> for any given tuple.
>
> Thoughts?
>
> Regards,
> Ashwin.
>

Re: dynamic application properties proposal

Posted by Ashwin Chandra Putta <as...@gmail.com>.

Great, looking forward to these changes. Does it also provide a guarantee
on which properties are used for which input data sets?

A few example use cases:
- set a property between reads of different batches of files, say, applying
a batch name property before processing the next batch of files.
- load a new configuration file for a csv parser before processing the next
set of data.
- apply a new regex before parsing the next stream of tuples.
etc.

One way to allow this is to emit subsequent tuples only from the window
after the one in which the property change is made. That way, the
boundaries between data sets are fixed and the property change happens in
between. The user then has a guarantee about which property value is used
for any given tuple.
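
To make the idea concrete, here is a minimal sketch of the "hold until the
next window" behaviour, using the regex use case. Everything in it is
hypothetical and not Apex API code: the class HoldUntilNextWindowSketch and
its setRegex, beginWindow and process methods only model the callbacks an
operator would have. Tuples that arrive after a property change are held
and emitted in the following window.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical parser operator (not part of the Apex API). After a property
// change it holds incoming tuples and emits them starting with the next window.
class HoldUntilNextWindowSketch {
    private Pattern pattern;
    private boolean changedThisWindow;
    private long windowId;
    private final List<String> held = new ArrayList<>();
    final List<String> emitted = new ArrayList<>();

    HoldUntilNextWindowSketch(String regex) {
        pattern = Pattern.compile(regex);
    }

    // Property change: remember that it happened in the current window.
    void setRegex(String regex) {
        pattern = Pattern.compile(regex);
        changedThisWindow = true;
    }

    // At the window boundary, release the tuples held since the change.
    void beginWindow(long id) {
        windowId = id;
        changedThisWindow = false;
        for (String tuple : held) {
            emit(tuple);
        }
        held.clear();
    }

    void process(String tuple) {
        if (changedThisWindow) {
            held.add(tuple);   // defer to the next window
        } else {
            emit(tuple);
        }
    }

    private void emit(String tuple) {
        emitted.add("w" + windowId + " " + tuple + " " + pattern.matcher(tuple).matches());
    }
}
```

In this sketch no tuple parsed with the new regex is emitted in the window
where the change was made, so the window boundary separates the two data
sets.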

Thoughts?

Regards,
Ashwin.

On Mon, Sep 28, 2015 at 10:17 AM, Pramod Immaneni <pr...@datatorrent.com>
wrote:

> Apex supports modification of operator properties at runtime, but the current
> implementation has the following shortcomings.
>
> 1. Property is not set across all partitions on the same window as
> individual partitions can be on different windows when property change is
> initiated from client resulting in inconsistency of data for those windows.
> I am being generous using the word inconsistent.
> 2. Sometimes properties need to be set on more than one logical operator
> at the same time to achieve the change the user is seeking. Today they will
> be two separate changes happening on two different windows again resulting
> in inconsistent data for some windows. These would need to happen as a
> single transaction.
> 3. If there is an operator failure before a committed checkpoint after an
> operator property is dynamically changed the operator will restart with the
> old property and the change will not be re-applied.
>
> Tim and I did some brainstorming and we have a proposal to overcome
> these shortcomings. The main problem in all the above cases is that the
> property changes are happening out-of-band of data flow and hence
> independent of windowing. The proposal is to bring the property change
> request into the in-band dataflow so that they are handled consistently
> with windowing and handled distributively.
>
> The idea is to inject a special property change tuple containing the
> property changes and the identification information of the operators they
> affect into the dataflow at the input operator. The tuple will be injected
> at the window boundary, after end window and before begin window, and as this
> tuple flows through the DAG the intended operators' properties will be
> modified. They will all be modified consistently at the same window. The
> tuple can contain more than one property change for more than one logical
> operator, and the changes will be applied consistently to the different
> logical operators at the same window. In case of failure the replay of
> tuples will ensure that the property change gets reapplied at the correct
> window.
>
> Please give your feedback and input on what you think about this proposal.
>
> Thanks
>



-- 

Regards,
Ashwin.