Posted to dev@apex.apache.org by Pramod Immaneni <pr...@datatorrent.com> on 2016/02/02 18:13:49 UTC

proposal to change names of processing modes

Today we support three different processing modes for operators, "at least
once", "at most once" and "exactly once", which determine tuple processing
and recovery behavior when an operator recovers from a failure. The default
is at least once, where tuples are replayed from the recovered checkpoint.
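To make the replay behavior concrete, here is a simplified, hypothetical Java model (not Apex code) of which windows an operator processes a second time after it is restored. `RecoverySketch`, `Mode` and `reprocessedWindows` are illustrative names. The model assumes state is restored from the checkpoint at window `lastCheckpoint`, that the failure happened after window `lastProcessed`, and that "exactly once" checkpoints every window, as described in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified, hypothetical model of recovery under the three processing modes.
 * Not Apex code: Mode only mirrors the mode names discussed in this thread.
 */
public class RecoverySketch {

    enum Mode { AT_LEAST_ONCE, AT_MOST_ONCE, EXACTLY_ONCE }

    /**
     * Returns the windows whose tuples are processed a second time after recovery.
     *
     * @param lastCheckpoint window id of the checkpoint the state is restored from
     * @param lastProcessed  last window the operator completed before failing
     */
    static List<Long> reprocessedWindows(Mode mode, long lastCheckpoint, long lastProcessed) {
        List<Long> replayed = new ArrayList<>();
        switch (mode) {
            case AT_LEAST_ONCE:
                // State rolls back to the checkpoint and every later window is
                // replayed, so its effects may be applied twice.
                for (long w = lastCheckpoint + 1; w <= lastProcessed; w++) {
                    replayed.add(w);
                }
                break;
            case AT_MOST_ONCE:
                // Nothing is replayed: tuples between the checkpoint and the
                // failure may be dropped instead.
                break;
            case EXACTLY_ONCE:
                // A checkpoint is taken every window, so in this model the
                // restored state already covers lastProcessed.
                break;
        }
        return replayed;
    }

    public static void main(String[] args) {
        // Checkpoint at window 10; the operator fails after finishing window 13.
        for (Mode m : Mode.values()) {
            System.out.println(m + " -> " + reprocessedWindows(m, 10, 13));
        }
    }
}
```

With a checkpoint at window 10 and a failure after window 13, the model prints `AT_LEAST_ONCE -> [11, 12, 13]` and empty lists for the other two modes. The at-least-once replay of windows 11 to 13 is what an output operator has to tolerate if the application wants exactly-once output.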

At least once works well for most applications. Typically, applications
persist the final output of processing through the DAG into various outputs
like key-value stores, databases or even HDFS files. In many of these cases,
various strategies can be employed to save the data "exactly once" in the
output, such as transactions, rewinding, metadata storage, idempotent
operations, etc. Furthermore, the exactly once processing mode, which
performs a checkpoint every window, is rarely used. All this leads to
confusion, especially for somebody new, and also makes it difficult to
explain these names to a less technical audience at meetups and in public
forums.
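One of the strategies listed above, metadata storage, can be sketched as follows. This is a hypothetical, in-memory illustration: `IdempotentSinkSketch` and its methods are invented names. It assumes that a replayed window carries the same window id and tuples as the original, and that the stored `lastWrittenWindow` lives in the external system (so it survives the failure) rather than in checkpointed operator state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of exactly-once output on top of at-least-once replay,
 * using stored metadata: the sink remembers the last window it fully wrote
 * and skips any replayed window at or below that id.
 */
public class IdempotentSinkSketch {

    // Stand-ins for the external store: the written rows plus the metadata
    // (last written window id), assumed to be persisted together.
    private final List<String> rows = new ArrayList<>();
    private long lastWrittenWindow = -1;

    /** Writes one window's output unless that window is already in the store. */
    public void writeWindow(long windowId, List<String> tuples) {
        if (windowId <= lastWrittenWindow) {
            return; // replayed after recovery: already written, skip it
        }
        rows.addAll(tuples);
        lastWrittenWindow = windowId;
    }

    public List<String> rows() {
        return rows;
    }

    public static void main(String[] args) {
        IdempotentSinkSketch sink = new IdempotentSinkSketch();
        sink.writeWindow(11, Arrays.asList("a", "b"));
        sink.writeWindow(12, Arrays.asList("c"));
        // Upstream recovered from a checkpoint at window 10: 11 and 12 are replayed.
        sink.writeWindow(11, Arrays.asList("a", "b"));
        sink.writeWindow(12, Arrays.asList("c"));
        sink.writeWindow(13, Arrays.asList("d"));
        System.out.println(sink.rows()); // [a, b, c, d]
    }
}
```

The key design point is that the window id and the data are persisted together. If they could diverge, for example after a crash between the two writes, a transaction or an atomic rename would be needed, which this sketch does not model.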

What I am proposing is only a name change which will make this more
intuitive to understand. Something simple like "repeat" for "at least
once", "latest" for "at most once" and "repeat latest" for "exactly once"
can do the trick.

Thanks

Re: proposal to change names of processing modes

Posted by Gaurav Gupta <ga...@gmail.com>.
+1 for Vlad's suggestion. Use the same notation that is used by other
streaming platforms instead of inventing new ones.

On Tue, Feb 2, 2016 at 3:29 PM, Vlad Rozov <v....@datatorrent.com> wrote:

> I vote to keep the original names and educate/explain their meaning to a
> non-technical audience, as delivery guarantees are not specific to Apex but
> have a common meaning across all streaming platforms.
>
> Vlad
>
>
> On 2/2/16 15:17, Timothy Farkas wrote:
>
>> Could we provide Processing and Output Centric Aliases for the
>> ProcessingModes?
>>
>> ProcessingMode.AT_MOST_ONCE_OUTPUT = ProcessingMode.AT_MOST_ONCE
>> ProcessingMode.EXACTLY_ONCE_OUTPUT = ProcessingMode.AT_LEAST_ONCE
>>
>> ProcessingMode.AT_MOST_ONCE_PROCESSING = ProcessingMode.AT_MOST_ONCE
>> ProcessingMode.AT_LEAST_ONCE_PROCESSING = ProcessingMode.AT_LEAST_ONCE
>> ProcessingMode.EXACTLY_ONCE_PROCESSING = ProcessingMode.EXACTLY_ONCE
>>
>> Tim
>>
>> On Tue, Feb 2, 2016 at 3:00 PM, Pramod Immaneni <pr...@datatorrent.com>
>> wrote:
>>
>>> Well, output guarantees are managed by the operators themselves, so the
>>> user will typically not see them as part of the engine features; they only
>>> see processing guarantees. While those names are technically correct as far
>>> as individual operators are concerned, they give a different idea.
>>>
>>> Thanks
>>>
>>> On Tue, Feb 2, 2016 at 2:53 PM, Timothy Farkas <ti...@datatorrent.com>
>>> wrote:
>>>
>>>> I think I understand the ambiguity you are trying to clear up, Pramod.
>>>> Perhaps it can be disambiguated by distinguishing between Processing
>>>> Guarantees and Output Guarantees when explaining to people. Processing
>>>> Guarantees apply to the way tuples are transmitted between operators.
>>>> Output Guarantees apply to the way output operators write tuples to a
>>>> Data Sink.
>>>>
>>>> This way we can describe each term intuitively in each context:
>>>>
>>>> At Most Once: A tuple is transmitted (written) once or not at all.
>>>> At Least Once: A tuple is transmitted (written) one or more times.
>>>> Exactly Once: A tuple is transmitted (written) exactly once.
>>>>
>>>> Then we could provide a table with the strongest Output Guarantee that
>>>> is possible for each Processing Guarantee.
>>>>
>>>> Processing Guarantee | Strongest Output Guarantee
>>>> ---------------------+---------------------------
>>>> At Most Once         | At Most Once
>>>> At Least Once        | Exactly Once
>>>> Exactly Once         | Exactly Once
>>>>
>>>> Thoughts?
>>>>
>>>> Thanks,
>>>> Tim
>>>>
>>>> On Tue, Feb 2, 2016 at 2:25 PM, Sandesh Hegde <sa...@datatorrent.com>
>>>> wrote:
>>>>
>>>>> I agree with Tim. Instead of new terminology, better explanations of
>>>>> the existing terms are more useful.
>>>>>
>>>>> On Tue, Feb 2, 2016 at 2:23 PM Pramod Immaneni <pramod@datatorrent.com>
>>>>> wrote:
>>>>>
>>>>>> The idea is to disambiguate without using "at least once", since
>>>>>> exactly-once output can still be achieved with it. Any other names are
>>>>>> fine; those were just suggestions.
>>>>>>
>>>>>> On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <ti...@datatorrent.com>
>>>>>> wrote:
>>>>>>
>>>>>>> The new names don't make as much sense to me as the original names.
>>>>>>> The concepts require some thought to understand, and a name change
>>>>>>> won't necessarily make that easier. I think a better way to attack
>>>>>>> misunderstandings is to clearly explain what a window, operator, input
>>>>>>> operator, output operator, tuple, checkpoint, and DAG are, with really
>>>>>>> clean and simple illustrations of the concepts. Then we can explain
>>>>>>> more involved concepts like At Least Once, At Most Once, and Exactly
>>>>>>> Once with well thought-out illustrations. Without a clear explanation
>>>>>>> of the basic vocabulary, and without pictures, it is difficult to get
>>>>>>> even technical people to understand these concepts.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Tim
>>>>>>>
>
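Timothy Farkas's alias idea and his "strongest output guarantee" table, both quoted above, could be expressed in Java roughly as below. This is a hypothetical sketch: the nested `ProcessingMode` enum only mirrors the three existing mode names, and the alias fields and the `strongestOutputGuarantee()` method are illustrative, not part of the Apex API.

```java
/**
 * Hypothetical sketch of processing- and output-centric aliases for the
 * processing modes, plus the "strongest output guarantee" table from the thread.
 * Not the Apex API.
 */
public class ModeAliasSketch {

    enum ProcessingMode {
        AT_LEAST_ONCE, AT_MOST_ONCE, EXACTLY_ONCE;

        // Output-centric aliases: plain references to existing constants.
        static final ProcessingMode AT_MOST_ONCE_OUTPUT = AT_MOST_ONCE;
        static final ProcessingMode EXACTLY_ONCE_OUTPUT = AT_LEAST_ONCE;

        // Processing-centric aliases.
        static final ProcessingMode AT_MOST_ONCE_PROCESSING = AT_MOST_ONCE;
        static final ProcessingMode AT_LEAST_ONCE_PROCESSING = AT_LEAST_ONCE;
        static final ProcessingMode EXACTLY_ONCE_PROCESSING = EXACTLY_ONCE;

        /** Strongest output guarantee an application can reach in this mode. */
        String strongestOutputGuarantee() {
            switch (this) {
                case AT_MOST_ONCE:
                    return "At Most Once";
                case AT_LEAST_ONCE: // requires idempotent or transactional output operators
                case EXACTLY_ONCE:
                    return "Exactly Once";
                default:
                    throw new AssertionError(this);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(ProcessingMode.EXACTLY_ONCE_OUTPUT); // AT_LEAST_ONCE
        System.out.println(ProcessingMode.values().length);     // 3
        for (ProcessingMode m : ProcessingMode.values()) {
            System.out.println(m + " -> " + m.strongestOutputGuarantee());
        }
    }
}
```

Because each alias is just another reference to an existing constant, `values()` still returns three modes and `EXACTLY_ONCE_OUTPUT` prints as `AT_LEAST_ONCE`. Logs and configuration would therefore keep showing the original names, which may limit how much such aliases reduce confusion.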

Re: proposal to change names of processing modes

Posted by Pramod Immaneni <pr...@datatorrent.com>.
I am in the process of writing a technical blog on this topic.

Thanks

> On Feb 3, 2016, at 8:14 AM, Amol Kekre <am...@datatorrent.com> wrote:
>
> Agreed on sticking to standard terminology and explaining details. A deep
> technical blog plus a section on this topic in Apex doc would work.
>
> Thks,
> Amol
>
>
> On Tue, Feb 2, 2016 at 10:51 PM, Thomas Weise <th...@datatorrent.com>
> wrote:
>
>> We should stick with standard terminology but make sure the differences are
>> well explained. That's necessary because other platforms use the same words
>> with different meanings; compare Storm, Spark Streaming and Flink.
>>
>> Take "exactly once" as an example. Elsewhere you will find it claimed when
>> it really is "at least once": events are replayed and computation is
>> repeated. When all operations in the overall system are idempotent, it is
>> possible to avoid effects such as double counting, duplicate web service
>> calls or duplicate rows in the database. Hence, the engine cannot claim to
>> support "exactly once"; it is only valid when the operators used in the
>> application collectively support it.
>>
>> In Apex, the engine provides the hooks (endWindow, committed) to achieve
>> idempotency in operators that have an effect on external systems. There are
>> several operator implementations that can be used with the at-least-once
>> processing mode and will deliver "exactly-once" for the application when
>> all operations in the DAG are idempotent.
>>
>> On Tue, Feb 2, 2016 at 10:26 PM, Shubham Pathak <sh...@datatorrent.com>
>> wrote:
>>
>>> +1 for adding detailed explanation about the concepts in tutorials.
>>>
>>>
>>> On Wed, Feb 3, 2016 at 11:30 AM, Chinmay Kolhatkar <chinmay@datatorrent.com>
>>> wrote:
>>>
>>>> +1 for Vlad's suggestion. Searching for keywords like "at least once",
>>>> "at most once" and "exactly once" shows that these terms are widely used
>>>> wherever semantics are defined for tuple processing.
>>>> Adding example applications for each of them would help in explaining
>>>> the terminology in the Apex context.
>>>>
>>>> On Wed, Feb 3, 2016 at 8:52 AM, Chanchal Singh <chanchal.apexrtx@gmail.com>
>>>> wrote:
>>>>
>>>>> I do agree with Vlad. It would be good to have a clear explanation with
>>>>> examples for the existing names, as that will not create confusion for
>>>>> those who already know them or for beginners.
>>>>>
>>>>>
>>>>> On Wed, Feb 3, 2016 at 8:38 AM, Amol Kekre <am...@datatorrent.com> wrote:
>>>>>
>>>>>> I agree with Vlad too.
>>>>>>
>>>>>> Thks
>>>>>> Amol
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 2, 2016 at 3:33 PM, Munagala Ramanath <ram@datatorrent.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I agree with Vlad: these names are so deeply embedded in the community
>>>>>>> that changing them is likely to create more problems than it solves.
>>>>>>>
>>>>>>> Ram
>>>>>>>

Re: proposal to change names of processing modes

Posted by Amol Kekre <am...@datatorrent.com>.
Agreed on sticking to standard terminology and explaining details. A deep
technical blog plus a section on this topic in Apex doc would work.

Thks,
Amol


Re: proposal to change names of processing modes

Posted by Thomas Weise <th...@datatorrent.com>.
We should stick with standard terminology but make sure the differences are
well explained. That's necessary because other platforms use the same words
with different meanings; compare Storm, Spark Streaming and Flink.

Take "exactly once" as an example. Elsewhere you will find it claimed when it
really is "at least once": events are replayed and computation is repeated.
When all operations in the overall system are idempotent, it is possible to
avoid effects such as double counting, duplicate web service calls or
duplicate rows in the database. Hence, the engine cannot claim to support
"exactly once"; it is only valid when the operators used in the application
collectively support it.

In Apex, the engine provides the hooks (endWindow, committed) to achieve
idempotency in operators that have an effect on external systems. There are
several operator implementations that can be used with the at-least-once
processing mode and will deliver "exactly-once" for the application when
all operations in the DAG are idempotent.
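The endWindow/committed approach described above can be sketched as an operator that parks each finished window's output and writes it to the external system only after that window is committed, assuming, as the committed callback is meant to signal, that a committed window will not be replayed. The class, the `ExternalStore` stand-in and the method signatures are hypothetical simplifications, not the Apex operator interfaces. The store also records the last flushed window, because a failure after a flush but before the next checkpoint could otherwise restore already-flushed windows into `pending`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: defer writes to an external system until a window is
 * committed. endWindow/committed are modelled on the hooks named in the thread
 * but are plain methods here, not the Apex interfaces.
 */
public class CommittedWriterSketch {

    /** Stand-in for the external system; it also records the last flushed window. */
    static class ExternalStore {
        final List<String> rows = new ArrayList<>();
        long lastFlushedWindow = -1;
    }

    private final ExternalStore store;
    // Operator state (checkpointed in a real engine): finished windows not yet written.
    private final TreeMap<Long, List<String>> pending = new TreeMap<>();
    private final List<String> current = new ArrayList<>();
    private long currentWindow;

    CommittedWriterSketch(ExternalStore store) {
        this.store = store;
    }

    void beginWindow(long windowId) {
        currentWindow = windowId;
        current.clear();
    }

    void process(String tuple) {
        current.add(tuple);
    }

    /** End of window: park the output instead of writing it immediately. */
    void endWindow() {
        pending.put(currentWindow, new ArrayList<>(current));
    }

    /** Windows up to windowId will not be replayed, so they are safe to write. */
    void committed(long windowId) {
        Map<Long, List<String>> ready = pending.headMap(windowId, true);
        for (Map.Entry<Long, List<String>> e : ready.entrySet()) {
            // Guard against re-flushing a window restored from an older checkpoint.
            if (e.getKey() > store.lastFlushedWindow) {
                store.rows.addAll(e.getValue());
                store.lastFlushedWindow = e.getKey();
            }
        }
        ready.clear(); // headMap is a view, so this removes those entries from pending
    }

    public static void main(String[] args) {
        ExternalStore store = new ExternalStore();
        CommittedWriterSketch op = new CommittedWriterSketch(store);
        op.beginWindow(1); op.process("a"); op.endWindow();
        op.beginWindow(2); op.process("b"); op.endWindow();
        System.out.println(store.rows); // []
        op.committed(1);
        System.out.println(store.rows); // [a]
        op.committed(2);
        System.out.println(store.rows); // [a, b]

        // Simulated recovery: window 2 is processed again and committed again.
        CommittedWriterSketch restored = new CommittedWriterSketch(store);
        restored.beginWindow(2); restored.process("b"); restored.endWindow();
        restored.committed(2);
        System.out.println(store.rows); // still [a, b]
    }
}
```

The cost of this design is latency: data reaches the external system only after the commit notification, which can trail processing by one or more checkpoint intervals.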





On Tue, Feb 2, 2016 at 10:26 PM, Shubham Pathak <sh...@datatorrent.com>
wrote:

> +1 for adding detailed explanation about the concepts in tutorials.

Re: proposal to change names of processing modes

Posted by Shubham Pathak <sh...@datatorrent.com>.
+1 for adding detailed explanation about the concepts in tutorials.


On Wed, Feb 3, 2016 at 11:30 AM, Chinmay Kolhatkar <ch...@datatorrent.com>
wrote:

> +1 for Vlad's suggestion. Searching for keywords like "at least once", "at
> most once" and "exactly once" shows that these terms are widely used
> wherever tuple-processing semantics are defined.
> Adding example applications for each of them would help explain the
> terminology in the Apex context.

Re: proposal to change names of processing modes

Posted by Chinmay Kolhatkar <ch...@datatorrent.com>.
+1 for Vlad's suggestion. Searching for keywords like "at least once", "at
most once" and "exactly once" shows that these terms are widely used
wherever tuple-processing semantics are defined.
Adding example applications for each of them would help explain the
terminology in the Apex context.

On Wed, Feb 3, 2016 at 8:52 AM, Chanchal Singh <ch...@gmail.com>
wrote:

> I do agree with Vlad. It would be good to have a clear explanation, with
> examples, of the existing names, as that will not create confusion for those
> who already know them and will also help those who are beginners.

Re: proposal to change names of processing modes

Posted by Chanchal Singh <ch...@gmail.com>.
I do agree with Vlad. It would be good to have a clear explanation, with
examples, of the existing names, as that will not create confusion for those
who already know them and will also help those who are beginners.

On Wed, Feb 3, 2016 at 8:38 AM, Amol Kekre <am...@datatorrent.com> wrote:

> I agree with Vlad too.
>
> Thks
> Amol
>
>
> On Tue, Feb 2, 2016 at 3:33 PM, Munagala Ramanath <ra...@datatorrent.com>
> wrote:
>
> > I agree with Vlad: these names are so deeply embedded in the community
> > that changing them is likely to create more problems than it solves.
> >
> > Ram
> >
> > On Tue, Feb 2, 2016 at 3:29 PM, Vlad Rozov <v....@datatorrent.com>
> > wrote:
> >
> > > I vote to keep original names and educate/explain their meaning to non
> > > technical audience as delivery guarantee is not specific to Apex, but
> > > has common meaning for all streaming platforms.
> > >
> > > Vlad
> > >
> > >
> > > On 2/2/16 15:17, Timothy Farkas wrote:
> > >
> > >> Could we provide Processing and Output Centric Aliases for the
> > >> ProcessingModes?
> > >>
> > >> ProcessingMode.AT_MOST_ONCE_OUTPUT = ProcessingMode.AT_MOST_ONCE
> > >> ProcessingMode.EXACTLY_ONCE_OUTPUT = ProcessingMode.AT_LEAST_ONCE
> > >>
> > >> ProcessingMode.AT_MOST_ONCE_PROCESSING = ProcessingMode.AT_MOST_ONCE
> > >> ProcessingMode.AT_LEAST_ONCE_PROCESSING = ProcessingMode.AT_LEAST_ONCE
> > >> ProcessingMode.EXACTLY_ONCE_PROCESSING = ProcessingMode.EXACTLY_ONCE
> > >>
> > >> Tim
> > >>
> > >> On Tue, Feb 2, 2016 at 3:00 PM, Pramod Immaneni <pramod@datatorrent.com>
> > >> wrote:
> > >>
> > >>> Well output guarantees are managed by the operators themselves so the
> > >>> user will typically not see that as part of the engine features, they
> > >>> only see processing guarantees and while they are technically correct
> > >>> as far as individual operators are concerned the names give a
> > >>> different idea.
> > >>>
> > >>> Thanks
> > >>>
> > >>> On Tue, Feb 2, 2016 at 2:53 PM, Timothy Farkas <ti...@datatorrent.com>
> > >>> wrote:
> > >>>
> > >>>> I think I understand the ambiguity you are trying to clear up Pramod.
> > >>>> Perhaps it can be disambiguated by distinguishing between Processing
> > >>>> Guarantees and Output Guarantees, when explaining to people.
> > >>>> Processing Guarantees apply to the way tuples are transmitted between
> > >>>> operators. Output Guarantees apply to the way output operators write
> > >>>> tuples to a Data Sink.
> > >>>>
> > >>>> This way we can describe each term intuitively in each context:
> > >>>>
> > >>>> At Most Once: A tuple can be dropped or transmitted (written) only once.
> > >>>> At Least Once: A tuple can be transmitted (written) one or more times.
> > >>>> Exactly Once: A tuple is transmitted (written) only once.
> > >>>>
> > >>>> Then we could provide a table with the strongest Output Guarantee that
> > >>>> is possible for each Processing Guarantee.
> > >>>>
> > >>>> Processing      | Strongest Output Guarantee
> > >>>> ----------------+---------------------------
> > >>>> At Most Once    | At Most Once
> > >>>> At Least Once   | Exactly Once
> > >>>> Exactly Once    | Exactly Once
> > >>>>
> > >>>> Thoughts?
> > >>>>
> > >>>> Thanks,
> > >>>> Tim
> > >>>>
> > >>>> On Tue, Feb 2, 2016 at 2:25 PM, Sandesh Hegde <sandesh@datatorrent.com>
> > >>>> wrote:
> > >>>>
> > >>>>> I agree with Tim. Instead of new terminologies, better explanation for
> > >>>>> the existing once are more useful.
> > >>>>>
> > >>>>> On Tue, Feb 2, 2016 at 2:23 PM Pramod Immaneni <pramod@datatorrent.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> The idea is to disambiguate without using at least once since exactly
> > >>>>>> once output can still be achieved with those. Any other names are
> > >>>>>> fine, those were just suggestions.
> > >>>>>>
> > >>>>>> On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <tim@datatorrent.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> The new names don't make as much sense to me as the original names.
> > >>>>>>> The concepts require some thought to understand, and it won't
> > >>>>>>> necessarily be made easier with a name change. I think a better way
> > >>>>>>> to attack misunderstandings is to clearly explain what a window,
> > >>>>>>> operator, input operator, output operator, tuple, checkpoint, and
> > >>>>>>> DAG is with really clean and simple illustrations of the concepts.
> > >>>>>>> Then we can explain more involved concepts like At Least Once, At
> > >>>>>>> Most Once, and Exactly Once with well thought illustrations. Without
> > >>>>>>> a clear explanation of the basic vocabulary, and without pictures,
> > >>>>>>> it is difficult to get even technical people to understand these
> > >>>>>>> concepts.
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Tim
> > >>>>>>>
> > >>>>>>> On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <pramod@datatorrent.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Today we support three different processing modes for operators, "at
> > >>>>>>>> least once", "at most once" and "exactly once" which determine tuple
> > >>>>>>>> processing and recovery behavior when there is operator recovery
> > >>>>>>>> from failure. The default being at least once where the tuples are
> > >>>>>>>> replayed from the recovered checkpoint.
> > >>>>>>>>
> > >>>>>>>> At least once works well for most applications. Typically
> > >>>>>>>> applications persist the final output of processing through the DAG
> > >>>>>>>> into
> > >>>>>>>>
> > >>>>>>> various
> > >>>
> > >>>> outputs
> > >>>>>>>
> > >>>>>>>> like key value stores, databases or even HDFS files. In many of
> > >>>>>>>>
> > >>>>>>> these
> > >>>>
> > >>>>> cases
> > >>>>>>>
> > >>>>>>>> various strategies can be employed to save the data "exactly
> > >>>>>>>>
> > >>>>>>> once"
> > >>>
> > >>>> in
> > >>>>
> > >>>>> the
> > >>>>>>
> > >>>>>>> output, such as transactions, rewinding, meta data storage,
> > >>>>>>>>
> > >>>>>>> idempotent
> > >>>>>
> > >>>>>> operations etc. Furthermore the exactly once processing mode,
> > >>>>>>>>
> > >>>>>>> which
> > >>>
> > >>>> is
> > >>>>>
> > >>>>>> a
> > >>>>>>
> > >>>>>>> checkpoint performed every window is rarely used. All this leads
> > >>>>>>>>
> > >>>>>>> to
> > >>>
> > >>>> confusion especially to somebody new and also makes it difficult
> > >>>>>>>>
> > >>>>>>> to
> > >>>
> > >>>> explain
> > >>>>>>>
> > >>>>>>>> these names to less technical audience in meetups and public
> > >>>>>>>>
> > >>>>>>> forums.
> > >>>>
> > >>>>> What I am proposing is only a name change which will make this
> > >>>>>>>>
> > >>>>>>> more
> > >>>
> > >>>> intuitive to understand. Something simple like "repeat" for "at
> > >>>>>>>>
> > >>>>>>> least
> > >>>>
> > >>>>> once", "latest" for "at most once" and "repeat latest" for
> > >>>>>>>>
> > >>>>>>> "exactly
> > >>>
> > >>>> once"
> > >>>>>>
> > >>>>>>> can do the trick.
> > >>>>>>>>
> > >>>>>>>> Thanks
> > >>>>>>>>
> > >>>>>>>>
> > >
> >
>

Re: proposal to change names of processing modes

Posted by Amol Kekre <am...@datatorrent.com>.
I agree with Vlad too.

Thks
Amol


On Tue, Feb 2, 2016 at 3:33 PM, Munagala Ramanath <ra...@datatorrent.com>
wrote:

> I agree with Vlad: these names are so deeply embedded in the community that
> changing them is likely to create more problems than it solves.
>
> Ram

Re: proposal to change names of processing modes

Posted by Munagala Ramanath <ra...@datatorrent.com>.
I agree with Vlad: these names are so deeply embedded in the community that
changing them is likely to create more problems than it solves.

Ram

On Tue, Feb 2, 2016 at 3:29 PM, Vlad Rozov <v....@datatorrent.com> wrote:

> I vote to keep the original names and educate/explain their meaning to
> non-technical audiences, as delivery guarantees are not specific to Apex
> but have a common meaning across all streaming platforms.
>
> Vlad

Re: proposal to change names of processing modes

Posted by Vlad Rozov <v....@datatorrent.com>.
I vote to keep the original names and educate/explain their meaning to
non-technical audiences, as delivery guarantees are not specific to Apex
but have a common meaning across all streaming platforms.

Vlad

On 2/2/16 15:17, Timothy Farkas wrote:
> Could we provide processing-centric and output-centric aliases for the
> ProcessingModes?
>
> ProcessingMode.AT_MOST_ONCE_OUTPUT = ProcessingMode.AT_MOST_ONCE
> ProcessingMode.EXACTLY_ONCE_OUTPUT = ProcessingMode.AT_LEAST_ONCE
>
> ProcessingMode.AT_MOST_ONCE_PROCESSING = ProcessingMode.AT_MOST_ONCE
> ProcessingMode.AT_LEAST_ONCE_PROCESSING = ProcessingMode.AT_LEAST_ONCE
> ProcessingMode.EXACTLY_ONCE_PROCESSING = ProcessingMode.EXACTLY_ONCE
>
> Tim

Re: proposal to change names of processing modes

Posted by Timothy Farkas <ti...@datatorrent.com>.
Could we provide processing-centric and output-centric aliases for the
ProcessingModes?

ProcessingMode.AT_MOST_ONCE_OUTPUT = ProcessingMode.AT_MOST_ONCE
ProcessingMode.EXACTLY_ONCE_OUTPUT = ProcessingMode.AT_LEAST_ONCE

ProcessingMode.AT_MOST_ONCE_PROCESSING = ProcessingMode.AT_MOST_ONCE
ProcessingMode.AT_LEAST_ONCE_PROCESSING = ProcessingMode.AT_LEAST_ONCE
ProcessingMode.EXACTLY_ONCE_PROCESSING = ProcessingMode.EXACTLY_ONCE

Tim
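
A minimal Java sketch of how such aliases could look, assuming a stand-in enum that mirrors the three mode names discussed in this thread. It is not the Apex source, and the class name ProcessingModeAliases is hypothetical. The aliases are plain static final fields pointing at existing constants, so they add no new modes:

```java
// Hypothetical stand-in for the processing-mode enum discussed in this thread;
// it is not copied from the Apex source.
public class ProcessingModeAliases {

  enum ProcessingMode {
    AT_LEAST_ONCE,
    AT_MOST_ONCE,
    EXACTLY_ONCE;

    // Proposed output-centric aliases: name the strongest output guarantee the
    // mode allows, assuming the output operator handles deduplication itself.
    static final ProcessingMode AT_MOST_ONCE_OUTPUT = AT_MOST_ONCE;
    static final ProcessingMode EXACTLY_ONCE_OUTPUT = AT_LEAST_ONCE;

    // Proposed processing-centric aliases: same modes with an explicit suffix.
    static final ProcessingMode AT_MOST_ONCE_PROCESSING = AT_MOST_ONCE;
    static final ProcessingMode AT_LEAST_ONCE_PROCESSING = AT_LEAST_ONCE;
    static final ProcessingMode EXACTLY_ONCE_PROCESSING = EXACTLY_ONCE;
  }

  public static void main(String[] args) {
    // Aliases are references to existing constants, not new constants.
    System.out.println(ProcessingMode.EXACTLY_ONCE_OUTPUT == ProcessingMode.AT_LEAST_ONCE); // true
    System.out.println(ProcessingMode.values().length); // 3
  }
}
```

Because the aliases are fields rather than enum constants, values() still reports three modes, ProcessingMode.valueOf("EXACTLY_ONCE_OUTPUT") throws IllegalArgumentException, and the aliases cannot be used as case labels in a switch.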

On Tue, Feb 2, 2016 at 3:00 PM, Pramod Immaneni <pr...@datatorrent.com>
wrote:

> Well, output guarantees are managed by the operators themselves, so users
> will typically not see them as part of the engine features; they only see
> the processing guarantees. While those names are technically correct as far
> as individual operators are concerned, they suggest something different
> about the final output.
>
> Thanks

Re: proposal to change names of processing modes

Posted by Pramod Immaneni <pr...@datatorrent.com>.
Well, output guarantees are managed by the operators themselves, so users
will typically not see them as part of the engine features; they only see
the processing guarantees. While those names are technically correct as far
as individual operators are concerned, they suggest something different
about the final output.

Thanks
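
To illustrate how an output operator can turn at-least-once processing into exactly-once output, here is a minimal Java sketch of one common technique: store the id of the last committed window in the sink and skip any window that is replayed after recovery. All class and method names are hypothetical (they only loosely echo the beginWindow/endWindow callbacks of an Apex operator), and the sketch assumes the rows and the committed window id are persisted atomically, for example in a single database transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the Apex operator API.
public class IdempotentOutputSketch {

  /** Toy data sink holding rows plus the id of the last committed window. */
  static class Sink {
    final List<String> rows = new ArrayList<>();
    long committedWindowId = -1; // assumed to be persisted atomically with rows
  }

  /** Output operator that buffers a window and skips windows already committed. */
  static class WindowIdDedupOutput {
    private final Sink sink;
    private final List<String> buffer = new ArrayList<>();
    private long currentWindowId;

    WindowIdDedupOutput(Sink sink) {
      this.sink = sink;
    }

    void beginWindow(long windowId) {
      currentWindowId = windowId;
      buffer.clear();
    }

    void process(String tuple) {
      buffer.add(tuple);
    }

    void endWindow() {
      // A replayed window (id <= committed id) was already written: skip it.
      if (currentWindowId <= sink.committedWindowId) {
        return;
      }
      sink.rows.addAll(buffer);
      sink.committedWindowId = currentWindowId;
    }
  }

  /** Feeds one tuple per window for windows from..to (inclusive). */
  static void runWindows(WindowIdDedupOutput op, long from, long to) {
    for (long w = from; w <= to; w++) {
      op.beginWindow(w);
      op.process("t" + w);
      op.endWindow();
    }
  }

  static List<String> runScenario() {
    Sink sink = new Sink();
    // Windows 1..3 are processed and committed normally.
    runWindows(new WindowIdDedupOutput(sink), 1, 3);
    // Simulated recovery from a checkpoint at window 1: at-least-once
    // processing replays windows 2..4 into a fresh operator instance.
    runWindows(new WindowIdDedupOutput(sink), 2, 4);
    return sink.rows;
  }

  public static void main(String[] args) {
    System.out.println(runScenario()); // [t1, t2, t3, t4]
  }
}
```

Windows 2 and 3 are delivered twice to the operator but written once, which is the distinction between processing and output guarantees in a few lines of code.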

On Tue, Feb 2, 2016 at 2:53 PM, Timothy Farkas <ti...@datatorrent.com> wrote:

> I think I understand the ambiguity you are trying to clear up, Pramod.
> Perhaps it can be disambiguated by distinguishing between Processing
> Guarantees and Output Guarantees when explaining them to people. Processing
> Guarantees apply to the way tuples are transmitted between operators.
> Output Guarantees apply to the way output operators write tuples to a Data
> Sink.
>
> This way we can describe each term intuitively in each context:
>
> At Most Once: A tuple is transmitted (written) zero times or once; it
>   may be dropped but is never duplicated.
> At Least Once: A tuple is transmitted (written) one or more times; it
>   is never dropped but may be duplicated.
> Exactly Once: A tuple is transmitted (written) exactly once; it is
>   never dropped and never duplicated.
>
> Then we could provide a table with the strongest Output Guarantee that is
> possible for each Processing Guarantee.
>
> Processing Guarantee | Strongest Output Guarantee
> ---------------------+---------------------------
> At Most Once         | At Most Once
> At Least Once        | Exactly Once
> Exactly Once         | Exactly Once
>
> Thoughts?
>
> Thanks,
> Tim

Re: proposal to change names of processing modes

Posted by Timothy Farkas <ti...@datatorrent.com>.
I think I understand the ambiguity you are trying to clear up, Pramod.
Perhaps it can be disambiguated by distinguishing between Processing
Guarantees and Output Guarantees when explaining these modes to people.
Processing Guarantees apply to the way tuples are transmitted between
operators. Output Guarantees apply to the way output operators write tuples
to a Data Sink.

This way we can describe each term intuitively in each context:

At Most Once: A tuple is either dropped or transmitted (written) once; it is never repeated.
At Least Once: A tuple is transmitted (written) one or more times; it is never dropped.
Exactly Once: A tuple is transmitted (written) exactly once.
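One way to make the three definitions above concrete is a small Python sketch that checks a single observed run against them by counting how many times each sent tuple was delivered. The function `classify_delivery` and the tuple-id model are assumptions for illustration, not part of Apex. A guarantee is a property of the system across all possible failures, so one run can only show which guarantees it did not violate.

```python
from collections import Counter
from typing import Hashable, Iterable


def classify_delivery(sent: Iterable[Hashable],
                      delivered: Iterable[Hashable]) -> str:
    """Return the strongest of the three guarantees that one observed run
    did not violate, by counting deliveries of each sent tuple id."""
    counts = Counter(delivered)
    sent_ids = set(sent)
    dropped = any(counts[t] == 0 for t in sent_ids)
    repeated = any(counts[t] > 1 for t in sent_ids)
    if not dropped and not repeated:
        return "exactly once"
    if not repeated:
        return "at most once"     # some tuples lost, none repeated
    if not dropped:
        return "at least once"    # some tuples repeated, none lost
    return "none"                 # tuples were both lost and repeated


print(classify_delivery([1, 2, 3], [1, 3]))        # at most once
print(classify_delivery([1, 2, 3], [1, 2, 2, 3]))  # at least once
```

Note that an exactly-once run also satisfies the other two definitions; the function simply reports the strongest one.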

Then we could provide a table with the strongest Output Guarantee that is
possible for each Processing Guarantee.

Processing Guarantee | Strongest Output Guarantee
---------------------+---------------------------
At Most Once         | At Most Once
At Least Once        | Exactly Once
Exactly Once         | Exactly Once
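The second row of the table (at least once processing, exactly once output) can be illustrated with a minimal Python sketch. It assumes a simplified model in which recovery replays records from a checkpoint position and the replay is deterministic. All names here (`at_least_once_stream`, `append_sink`, `upsert_sink`) are hypothetical. The point is only that an idempotent, keyed write absorbs the replayed records while a plain append does not.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for an Apex tuple: a (key, value) pair.
Record = Tuple[str, int]


def at_least_once_stream(records: List[Record], fail_after: int,
                         checkpoint: int) -> List[Record]:
    """Simulate at least once processing: the first `fail_after` records are
    emitted, the operator fails, and recovery replays every record from
    position `checkpoint` (the recovered checkpoint) onward."""
    if not 0 <= checkpoint <= fail_after <= len(records):
        raise ValueError("need 0 <= checkpoint <= fail_after <= len(records)")
    before_failure = records[:fail_after]
    replay = records[checkpoint:]
    return before_failure + replay


def append_sink(stream: List[Record]) -> List[Record]:
    """Non-idempotent sink: every write adds a row, so replays duplicate rows."""
    rows: List[Record] = []
    for record in stream:
        rows.append(record)
    return rows


def upsert_sink(stream: List[Record]) -> Dict[str, int]:
    """Idempotent sink: writing the same (key, value) again leaves the table
    unchanged, so deterministic replays do not change the final output."""
    table: Dict[str, int] = {}
    for key, value in stream:
        table[key] = value
    return table


records = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
stream = at_least_once_stream(records, fail_after=3, checkpoint=1)
print(len(append_sink(stream)))  # 6: "b" and "c" were written twice
print(upsert_sink(stream))       # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
```

The upsert hides the replay only because the replayed records carry the same keys and values. If replay were non-deterministic, a keyed write alone would not be enough.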

Thoughts?

Thanks,
Tim

On Tue, Feb 2, 2016 at 2:25 PM, Sandesh Hegde <sa...@datatorrent.com>
wrote:

> I agree with Tim. Instead of new terminology, better explanations of the
> existing ones would be more useful.

Re: proposal to change names of processing modes

Posted by Sandesh Hegde <sa...@datatorrent.com>.
I agree with Tim. Instead of new terminology, better explanations of the
existing ones would be more useful.

On Tue, Feb 2, 2016 at 2:23 PM Pramod Immaneni <pr...@datatorrent.com>
wrote:

> The idea is to disambiguate by not using the name "at least once", since
> exactly once output can still be achieved with that mode. Any other names
> are fine; those were just suggestions.

Re: proposal to change names of processing modes

Posted by Pramod Immaneni <pr...@datatorrent.com>.
The idea is to disambiguate by not using the name "at least once", since
exactly once output can still be achieved with that mode. Any other names
are fine; those were just suggestions.

On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <ti...@datatorrent.com> wrote:

> The new names don't make as much sense to me as the original names. The
> concepts require some thought to understand, and a name change won't
> necessarily make them easier. I think a better way to attack
> misunderstandings is to clearly explain what a window, operator, input
> operator, output operator, tuple, checkpoint, and DAG are, with really clean
> and simple illustrations of the concepts. Then we can explain more involved
> concepts like At Least Once, At Most Once, and Exactly Once with
> well-thought-out illustrations. Without a clear explanation of the basic
> vocabulary, and without pictures, it is difficult to get even technical
> people to understand these concepts.
>
> Thanks,
> Tim

Re: proposal to change names of processing modes

Posted by Timothy Farkas <ti...@datatorrent.com>.
The new names don't make as much sense to me as the original names. The
concepts require some thought to understand, and a name change won't
necessarily make them easier. I think a better way to attack
misunderstandings is to clearly explain what a window, operator, input
operator, output operator, tuple, checkpoint, and DAG are, with really clean
and simple illustrations of the concepts. Then we can explain more involved
concepts like At Least Once, At Most Once, and Exactly Once with
well-thought-out illustrations. Without a clear explanation of the basic
vocabulary, and without pictures, it is difficult to get even technical
people to understand these concepts.

Thanks,
Tim
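As one possible illustration of the window and checkpoint vocabulary mentioned here, a minimal Python sketch can show how the checkpoint interval bounds the windows that an at least once recovery replays. It assumes window ids start at 1 and that a checkpoint is taken at the end of every `interval`-th window. The function names and numbers are hypothetical, and this is a simplified model rather than Apex's recovery code.

```python
from typing import List


def last_checkpoint(failed_window: int, interval: int) -> int:
    """Id of the most recent checkpointed window before `failed_window`.

    Assumes window ids start at 1 and a checkpoint is taken at the end of
    every `interval`-th window; 0 means no checkpoint has been taken yet.
    """
    if failed_window < 1 or interval < 1:
        raise ValueError("failed_window and interval must be >= 1")
    return ((failed_window - 1) // interval) * interval


def replayed_windows(failed_window: int, interval: int) -> List[int]:
    """Windows reprocessed by at least once recovery: everything after the
    last checkpoint, up to and including the window in flight at failure."""
    start = last_checkpoint(failed_window, interval) + 1
    return list(range(start, failed_window + 1))


print(replayed_windows(9, interval=3))  # [7, 8, 9]
print(replayed_windows(9, interval=1))  # [9]
```

With `interval=1`, which corresponds to the checkpoint-every-window behavior of the exactly once mode described in the proposal, the replay shrinks to the window in flight. The trade-off is that operator state is saved far more often.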

On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <pr...@datatorrent.com>
wrote:

> Today we support three different processing modes for operators, "at least
> once", "at most once" and "exactly once", which determine tuple processing
> and recovery behavior when an operator recovers from a failure. The default
> is at least once, where the tuples are replayed from the recovered
> checkpoint.
>
> At least once works well for most applications. Typically, applications
> persist the final output of processing through the DAG into various outputs
> like key value stores, databases or even HDFS files. In many of these cases
> various strategies can be employed to save the data "exactly once" in the
> output, such as transactions, rewinding, meta data storage, idempotent
> operations etc. Furthermore, the exactly once processing mode, which
> performs a checkpoint every window, is rarely used. All this leads to
> confusion, especially for somebody new, and also makes it difficult to
> explain these names to less technical audiences in meetups and public
> forums.
>
> What I am proposing is only a name change, which will make this more
> intuitive to understand. Something simple like "repeat" for "at least
> once", "latest" for "at most once" and "repeat latest" for "exactly once"
> can do the trick.
>
> Thanks
>
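Of the output-side strategies listed in the proposal quoted above, "meta data storage" can be sketched in Python as a sink that stores the id of the last committed window next to its data and skips replayed windows that were already committed. `WindowIdSink` and `run_windows` are hypothetical. The `begin_window`/`end_window` callbacks only loosely mirror the per-window lifecycle of an Apex operator. The in-memory object stands in for external store state that survives the failure.

```python
from typing import Iterable, List, Tuple


class WindowIdSink:
    """Hypothetical output operator (not an Apex class) that stores the id of
    the last committed window next to its data and ignores replayed windows
    that were already committed before a failure."""

    def __init__(self) -> None:
        self.rows: List[str] = []        # stand-in for the external store
        self.committed_window = 0        # stored alongside the rows
        self._window = 0
        self._pending: List[str] = []

    def begin_window(self, window_id: int) -> None:
        self._window = window_id
        self._pending = []

    def process(self, row: str) -> None:
        self._pending.append(row)        # buffer until the window completes

    def end_window(self) -> None:
        if self._window <= self.committed_window:
            return                       # replayed window: already written
        # A real sink must write the rows and the window id atomically,
        # e.g. in one database transaction; this sketch does not.
        self.rows.extend(self._pending)
        self.committed_window = self._window


def run_windows(sink: WindowIdSink,
                windows: Iterable[Tuple[int, List[str]]]) -> None:
    """Feed (window_id, rows) pairs through the sink's window callbacks."""
    for window_id, rows in windows:
        sink.begin_window(window_id)
        for row in rows:
            sink.process(row)
        sink.end_window()


sink = WindowIdSink()
run_windows(sink, [(1, ["a", "b"]), (2, ["c"])])
# Failure, then recovery from a checkpoint taken after window 1:
# window 2 is replayed before new window 3 arrives.
run_windows(sink, [(2, ["c"]), (3, ["d"])])
print(sink.rows)  # ['a', 'b', 'c', 'd']
```

Because the replayed window 2 has an id no newer than the stored one, its rows are dropped, and the output holds each row once even though processing was at least once.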