Posted to dev@apex.apache.org by Bhupesh Chawda <bh...@datatorrent.com> on 2016/02/04 06:29:47 UTC

Regarding Iterations and Delay Operator

Hi,

I am working on a DAG which has a loop. As per my understanding, tuples
which are flowing on the loop-back stream will arrive at the upstream
operator in the next window at the earliest.

Here is an example:

Source -> A -> B -> Delay -> A

In the example above, tuples in window id X which arrive at B will be sent
to A again in window id (X + n), where n >= 1.
I understand this requirement exists so that the tuples can be recovered in
case of a failure of operator B. However, is there a way I can allow the
tuples to loop back in the same window, by relaxing the fault tolerance
feature? In other words, I need tuples to loop back immediately rather than
wait for the next window to arrive at operator A. I am okay if these tuples
are not recovered in case of a failure.
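
For reference, the loop above is wired in populateDAG() roughly as below.
This is only a sketch: OperatorA, OperatorB, RandomSource and the Stats
tuple type are placeholders for the actual classes, and I am assuming the
DefaultDelayOperator that ships with apex-core (the port names may differ).

import org.apache.hadoop.conf.Configuration;
import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;
import com.datatorrent.common.util.DefaultDelayOperator;

public class LoopApp implements StreamingApplication
{
  @Override
  public void populateDAG(DAG dag, Configuration conf)
  {
    RandomSource source = dag.addOperator("Source", new RandomSource());
    OperatorA a = dag.addOperator("A", new OperatorA());
    OperatorB b = dag.addOperator("B", new OperatorB());
    // the delay operator increments the window id of looped-back tuples by 1
    DefaultDelayOperator<Stats> delay =
        dag.addOperator("Delay", new DefaultDelayOperator<Stats>());

    dag.addStream("sourceToA", source.output, a.input);
    dag.addStream("aToB", a.output, b.input);
    dag.addStream("bToDelay", b.output, delay.input);
    // this stream closes the loop back to A
    dag.addStream("delayToA", delay.output, a.feedback);
  }
}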

Thanks.
-Bhupesh

Re: Regarding Iterations and Delay Operator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Hi Vlad,

I encountered this problem in the context of the Apex integration with
SAMOA. This means we are running an operator which was not written by us,
"as is", on Apex. Hence, we don't have much control over the operator logic.

Additionally, SAMOA algorithms are implemented in a framework which does not
acknowledge any windowing concept. For this reason, we need to make the
implementation on Apex agnostic of windows. The way iteration works is
closely bound to the windowing concept, so I tried to make the window
shorter in order to decrease the wait for the next window when looping back.
An ideal way to remove the dependence of iterations on windows would be to
send one tuple per window, but that might be overkill; decreasing the window
size seems to be a reasonable option in this case.

To address your question, the number of tuples that need to be emitted per
window might differ for each algorithm, and hence it is difficult to have a
generic operator wrapper which can encapsulate all such operators while at
the same time carrying operator-specific settings (like the maximum number
of tuples to emit per window).

Thanks.
-Bhupesh


Re: Regarding Iterations and Delay Operator

Posted by Vlad Rozov <v....@datatorrent.com>.
I don't understand why A must process 1000 tuples before the window is
closed. The window is defined as a time interval, not as a number of
tuples, and an operator is free to emit as many tuples as it needs. In the
described case, A can emit 100 tuples to B and hold the remaining 900
tuples in its internal queue till it gets the necessary data from B. In the
next window, it can process the next 100 tuples (or any other number, based
on the algorithm) and hold the remaining 800 plus all tuples it receives on
the input port.

It would be more efficient to let A specify that it has emitted all tuples
for the current window and let the platform emit END_WINDOW.
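
A rough sketch of what I mean (illustrative only, not tested; the String
tuple type and the batch size of 100 are arbitrary):

import java.util.ArrayDeque;
import java.util.Queue;

import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.common.util.BaseOperator;

public class ThrottlingOperatorA extends BaseOperator
{
  private int batchSize = 100;                     // tuples forwarded to B per window
  private transient int emittedThisWindow;
  private final Queue<String> pending = new ArrayDeque<>();  // tuples held back

  public final transient DefaultOutputPort<String> toB = new DefaultOutputPort<>();

  public final transient DefaultInputPort<String> fromSource = new DefaultInputPort<String>()
  {
    @Override
    public void process(String tuple)
    {
      pending.add(tuple);
      drain();
    }
  };

  public final transient DefaultInputPort<String> feedback = new DefaultInputPort<String>()
  {
    @Override
    public void process(String statsFromB)
    {
      // apply the statistics computed by B before forwarding more tuples
    }
  };

  @Override
  public void beginWindow(long windowId)
  {
    emittedThisWindow = 0;
    drain();                                       // resume with the held-back tuples
  }

  private void drain()
  {
    while (emittedThisWindow < batchSize && !pending.isEmpty()) {
      toB.emit(pending.poll());
      emittedThisWindow++;
    }
  }
}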



Re: Regarding Iterations and Delay Operator

Posted by David Yan <da...@datatorrent.com>.
I think what you did is fine and I wouldn't consider it a hack. The reason
for the +1 increment is exactly what Vlad said: the window needs to end, and
the window cannot end until all input ports have sent the END_WINDOW tuple
for that particular window. That's why all streaming platforms use the DAG
(A as in acyclic) as the programming paradigm.

Assuming in your use case B is the delay operator and your streaming window
is small, if it helps with your use case, you can also try setting the
application window count for A to a multiple of the streaming window while B
remains at 1, so that from the perspective of A, the feedback tuple from B
to A arrives in the same application window (except for the last streaming
window of the application window).
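
Something along these lines in populateDAG() (untested sketch; "a" is the
Java reference to operator A and 10 is an arbitrary multiple):

import com.datatorrent.api.Context;

// A aggregates 10 streaming windows into one application window;
// B and the delay operator keep the default APPLICATION_WINDOW_COUNT of 1.
dag.setAttribute(a, Context.OperatorContext.APPLICATION_WINDOW_COUNT, 10);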

David

Re: Regarding Iterations and Delay Operator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Hi David,

Taking an example scenario, let's say we get 1000 tuples in the current
window, which are processed by operator A. Operator A identifies features
in these tuples and passes them on to B. Now, if after, say, 100 tuples A
decides it wants the statistics computed over the features of those 100
tuples, it will ask operator B to compute them and send them back. This
feedback is needed so that A can apply the statistic to the following
tuples, i.e., tuples 101 onwards. Now B may compute this and send it back,
but the results are stuck at the delay operator and do not arrive at A
until all the 1000 tuples have been processed. Meanwhile, operator A times
out on the response and proceeds with some default stats. Due to this
behavior, the feedback which was expected after 100 tuples arrives after
1000 tuples. This means the statistic that was to be applied to tuples 101
onwards is applied to tuples 1001 onwards. This makes the algorithm learn /
converge at a very slow rate, when it otherwise would have converged
quickly.

Just an update: I tried reducing the size of the streaming window to 100
ms. This hack works for the current use case, but I am not sure it will
work for all scenarios.
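
In case anyone wants to try the same thing, the window size is just a
DAG-level attribute; what I set looks roughly like this (the default
streaming window is 500 ms):

import com.datatorrent.api.Context;

// in populateDAG(): shrink every streaming window of the application to 100 ms
dag.setAttribute(Context.DAGContext.STREAMING_WINDOW_SIZE_MILLIS, 100);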

Thanks.
-Bhupesh


Re: Regarding Iterations and Delay Operator

Posted by David Yan <da...@datatorrent.com>.
Hi Bhupesh,

In the use case you described, can you explain the reason why the feedback
tuples from B to A need to be in the same window?

David

Re: Regarding Iterations and Delay Operator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Hi Vlad / Tim,

This use case is part of the integration with Apache SAMOA. I am trying to
run an algorithm which is written in SAMOA on the Apex platform. This
algorithm already requires two operators in the formation I described, which
is why I cannot control how the operators are arranged in the DAG.

To answer Vlad's question, here are some more details on the use case.
Operator A receives tuples coming in from the source, processes them, and
passes them on to B. After a few tuples, it may request B to compute some
stats on the tuples and send them back (through the loop-back stream), so
that these stats can be used while processing the next few tuples coming
from the source.
In the case of Apex, I suspect that the stats which B needs to send to A
through the loop-back stream are arriving too late (because they are
waiting for the end of the current window), and hence A times out on the
response from B and proceeds using some default values. This results in
incorrect computation at the end of the DAG.

Also, please note that relaxing the fault tolerance feature is not a
requirement. I mentioned it because I assumed (perhaps incorrectly) that it
was the reason for incrementing windows in the delay operator. However, as
Vlad pointed out, there has to be some point at which an operator can close
the window.

Any suggestions on how I can achieve the above-mentioned use case?

Thanks.
-Bhupesh

>

Re: Regarding Iterations and Delay Operator

Posted by Timothy Farkas <ti...@datatorrent.com>.
Yes, partitioning A or B would be a good use case for iteration. Does
iteration currently support partitioning, though? I thought it didn't, but
I may be behind the times since I've been exiled from the office :).


Re: Regarding Iterations and Delay Operator

Posted by Vlad Rozov <v....@datatorrent.com>.
What if B should be partitioned?



Re: Regarding Iterations and Delay Operator

Posted by Timothy Farkas <ti...@datatorrent.com>.
My question is: why use iteration at all in such a case? You could just
encapsulate A and B in a single operator (call it OP) as components, and
feed the tuples output by B back to A. OP would also contain the logic to
decide when to stop looping each tuple emitted by B back to A.
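
A minimal sketch of what I mean (the String tuple type and the A/B logic
are just placeholders):

import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.common.util.BaseOperator;

public class CombinedOp extends BaseOperator
{
  public final transient DefaultOutputPort<String> output = new DefaultOutputPort<>();

  public final transient DefaultInputPort<String> input = new DefaultInputPort<String>()
  {
    @Override
    public void process(String tuple)
    {
      String current = logicA(tuple);
      // keep looping the tuple through B's logic until the stop condition is met
      while (needsAnotherPass(current)) {
        current = logicB(current);
      }
      output.emit(current);
    }
  };

  private String logicA(String t) { return t; }                // placeholder for A's processing
  private String logicB(String t) { return t; }                // placeholder for B's processing
  private boolean needsAnotherPass(String t) { return false; } // placeholder stop condition
}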

Thanks,
Tim


Re: Regarding Iterations and Delay Operator

Posted by Vlad Rozov <v....@datatorrent.com>.
IMO, it would be good to provide a little more detail regarding the use
case, namely what drives the requirement and why it is OK to relax the
fault tolerance feature. Another question is when it will be OK to close
the current window for the operator A. A can't close it, as there may be
more tuples coming from the input stream connected to the Delay operator,
and the Delay operator can't close it, because A will not send END_WINDOW
while it is waiting for END_WINDOW on the input port connected to the Delay
operator.

Vlad


Re: Regarding Iterations and Delay Operator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Exactly. That is the requirement. Then the feedback can be utilized for
tuples in the same window rather than for tuples in the next window.

-Bhupesh


Re: Regarding Iterations and Delay Operator

Posted by Sandeep Deshmukh <sa...@datatorrent.com>.
Bhupesh: Do you mean to say that you would like to use the Delay Operator
with NO delay? Essentially you need feedback in real time, not delayed by a
window.

Regards,
Sandeep
