Posted to dev@beam.apache.org by Jan Lukavský <je...@seznam.cz> on 2019/11/05 13:10:16 UTC

[VOTE] @RequiresTimeSortedInput stateful DoFn annotation

Hi,

I'd like to open a vote on accepting the design document [1] as a basis 
for implementing the @RequiresTimeSortedInput annotation for stateful 
DoFns. The associated JIRA [2] and PR [3] contain only a subset of the 
whole functionality (allowed lateness is ignored, and there is no way to 
specify a UDF that extracts the time - or a sequence number - from the 
data). The PR will go through an independent review process once the 
vote succeeds (please feel free to self-request a review if you are 
interested). Features from the design document that are missing from 
the PR will be added later in subsequent JIRA issues, so that they do 
not block the availability of this feature.

Please vote on adding support for @RequiresTimeSortedInput.

The vote is open for the next 72 hours and passes if at least three +1 
and no -1 PMC (binding) votes are cast.

[ ] +1 Add support for @RequiresTimeSortedInput

[ ] 0 I don't have a strong opinion about this, but I assume it's ok

[ ] -1 Do not support @RequiresTimeSortedInput - please provide explanation.

Thanks,

  Jan

[1] 
https://docs.google.com/document/d/1ObLVUFsf1NcG8ZuIZE4aVy2RYKx2FfyMhkZYWPnI9-c/edit?usp=sharing

[2] https://issues.apache.org/jira/browse/BEAM-8550

[3] https://github.com/apache/beam/pull/8774


Re: [CANCELLED] [VOTE] @RequiresTimeSortedInput stateful DoFn annotation

Posted by Kenneth Knowles <ke...@apache.org>.
Hi Jan,

I want to acknowledge your careful consideration of the community here.

I myself have simply not had the time to dedicate to considering this
proposal. So, like Max, I would have a bit of an "outside" perspective so
would hesitate to cast any sort of vote.

I think you have chosen a good course of action - continue to grow
awareness and understanding of the problem / awareness and understanding of
the solution.

Kenn

On Tue, Nov 12, 2019 at 12:45 AM Jan Lukavský <je...@seznam.cz> wrote:

> I'm cancelling this due to lack of activity. I will start a follow-up
> thread to find a solution.

[CANCELLED] [VOTE] @RequiresTimeSortedInput stateful DoFn annotation

Posted by Jan Lukavský <je...@seznam.cz>.
I'm cancelling this due to lack of activity. I will start a follow-up 
thread to find a solution.

On 11/9/19 11:45 AM, Jan Lukavský wrote:
> Hi,
>
> I'll try to summarize the mailing list threads to clarify why I think 
> this addition is needed (and actually necessary):
>
>  a) there are situations where the order of input events matters 
> (obviously any finite state machine)
>
>  b) in the streaming case, this can be handled by the current machinery 
> (e.g. holding elements in state, sorting all elements with timestamps 
> less than the input watermark, dropping latecomers)
>
>  c) in the batch case, this can be handled the same way, but
>
>   i) due to the nature of batch processing, that places extreme 
> requirements on the size of the state needed to hold the elements 
> (in the extreme, that might be the whole input, which might not be 
> feasible)
>
>   ii) although it is true that the watermark might (and will) fall 
> behind in streaming processing as well, so that similar issues might 
> arise there too, it is hard to imagine it falling behind by as much as 
> several years (which is absolutely natural in the batch case) - I'm 
> talking about regular streaming processing, not kappa-like 
> architectures, where this happens as well, but causes trouble ([1])
>
>   iii) given that some runners already use sort-merge groupings, it is 
> virtually free to also sort elements inside groups by timestamp; the 
> runner just has to know that it should do so
>
> I don't want to go too far into details, to keep this focused, but the 
> fact that the runner would know that it should sort by timestamp before 
> a stateful ParDo brings additional features that are currently 
> unavailable - e.g. actually shifting event time smoothly as elements 
> flow through, not from -inf to +inf in one shot. That might have a 
> positive effect on timers being fired smoothly, and thus for instance 
> make it possible to free some state that would otherwise have to be 
> held until the end of the computation.
>
> Therefore, I think it is essential for users to be able to tell the 
> runner that a particular stateful ParDo depends on the order of input 
> events, so that the runner can use optimizations available in the batch 
> case. The streaming case is mostly unaffected by that, because all the 
> sorting can be handled the usual way.
>
> I hope this helps to clarify why it would be good to introduce (some) 
> way to mark stateful ParDos as "time sorted".
>
> Cheers,
>
>  Jan
>
> [1] 
> https://www.ververica.com/resources/flink-forward-san-francisco-2019/moving-from-lambda-and-kappa-architectures-to-kappa-at-uber
>
> Hope these thoughts help

Re: [VOTE] @RequiresTimeSortedInput stateful DoFn annotation

Posted by Jan Lukavský <je...@seznam.cz>.
Hi,

I'll try to summarize the mailing list threads to clarify why I think 
this addition is needed (and actually necessary):

  a) there are situations where the order of input events matters 
(obviously any finite state machine)

  b) in the streaming case, this can be handled by the current machinery 
(e.g. holding elements in state, sorting all elements with timestamps 
less than the input watermark, dropping latecomers)

  c) in the batch case, this can be handled the same way, but

   i) due to the nature of batch processing, that places extreme 
requirements on the size of the state needed to hold the elements (in 
the extreme, that might be the whole input, which might not be feasible)

   ii) although it is true that the watermark might (and will) fall 
behind in streaming processing as well, so that similar issues might 
arise there too, it is hard to imagine it falling behind by as much as 
several years (which is absolutely natural in the batch case) - I'm 
talking about regular streaming processing, not kappa-like 
architectures, where this happens as well, but causes trouble ([1])

   iii) given that some runners already use sort-merge groupings, it is 
virtually free to also sort elements inside groups by timestamp; the 
runner just has to know that it should do so
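
The approach in b) can be sketched as a small stand-alone model. This is 
plain Java, not Beam API: the class SortingBuffer, its Event type and the 
method names are hypothetical and only illustrate buffering elements, 
releasing those below the watermark in timestamp order, and dropping 
latecomers (allowed lateness is ignored in this sketch).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical model of "buffer in state, sort, drop latecomers"; not Beam API. */
final class SortingBuffer {

  /** An element carrying an event-time timestamp (units are illustrative). */
  static final class Event {
    final long timestamp;
    final String value;

    Event(long timestamp, String value) {
      this.timestamp = timestamp;
      this.value = value;
    }
  }

  // Stands in for the per-key state that holds not-yet-released elements.
  // Note: elements with equal timestamps come out in unspecified order.
  private final PriorityQueue<Event> buffer =
      new PriorityQueue<>(Comparator.comparingLong((Event e) -> e.timestamp));
  private long watermark = Long.MIN_VALUE;
  private int dropped = 0;

  /** Buffers an element, or drops it if it is already behind the watermark. */
  void add(Event e) {
    if (e.timestamp < watermark) {
      dropped++;
      return;
    }
    buffer.add(e);
  }

  /** Advances the watermark and returns, in timestamp order, all buffered
   *  elements whose timestamp is less than the new watermark. */
  List<Event> advanceWatermark(long newWatermark) {
    watermark = Math.max(watermark, newWatermark);
    List<Event> ready = new ArrayList<>();
    while (!buffer.isEmpty() && buffer.peek().timestamp < watermark) {
      ready.add(buffer.poll());
    }
    return ready;
  }

  int buffered() {
    return buffer.size();
  }

  int dropped() {
    return dropped;
  }

  public static void main(String[] args) {
    SortingBuffer b = new SortingBuffer();
    b.add(new Event(30, "c"));
    b.add(new Event(10, "a"));
    b.add(new Event(20, "b"));
    for (Event e : b.advanceWatermark(25)) {
      System.out.println(e.timestamp + " " + e.value);
    }
    b.add(new Event(5, "late")); // behind the watermark, so it is dropped
    System.out.println("dropped=" + b.dropped() + " buffered=" + b.buffered());
  }
}
```

In batch, the watermark typically moves from -inf to +inf only at the end 
of the input, so a buffer like this would have to hold every element of a 
key before anything is released, which is the state-size problem in c) i).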

I don't want to go too far into details, to keep this focused, but the 
fact that the runner would know that it should sort by timestamp before 
a stateful ParDo brings additional features that are currently 
unavailable - e.g. actually shifting event time smoothly as elements 
flow through, not from -inf to +inf in one shot. That might have a 
positive effect on timers being fired smoothly, and thus for instance 
make it possible to free some state that would otherwise have to be held 
until the end of the computation.
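
To illustrate the effect on state, here is a simplified model (plain 
Java, not Beam API; SmoothWatermarkDemo, peakState and the gap parameter 
are hypothetical names). Each key holds state until an expiry timer set 
to its last timestamp plus gap fires. When the input is time-sorted and 
event time follows the elements, timers fire along the way; when event 
time only jumps at the end of the input, all state is held until then.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model comparing peak held state; not Beam API. */
final class SmoothWatermarkDemo {

  /**
   * Returns the peak number of keys whose state is held at the same time.
   *
   * @param events rows of {timestamp, keyId}, assumed sorted by timestamp
   * @param gap how long after its last element a key's state may be freed
   * @param advanceWithInput if true, event time follows the sorted input and
   *     due timers fire before each element; if false, no timer fires until
   *     the input is exhausted
   */
  static int peakState(long[][] events, long gap, boolean advanceWithInput) {
    Map<Long, Long> expiry = new HashMap<>(); // key -> time its timer fires
    int peak = 0;
    for (long[] e : events) {
      long ts = e[0];
      long key = e[1];
      if (advanceWithInput) {
        // Event time has reached ts: fire all due timers and free their state.
        expiry.values().removeIf(t -> t <= ts);
      }
      expiry.put(key, ts + gap); // (re)set the expiry timer for this key
      peak = Math.max(peak, expiry.size());
    }
    return peak;
  }

  public static void main(String[] args) {
    long[][] events = {{0, 1}, {1, 2}, {100, 3}, {101, 4}, {200, 5}};
    System.out.println("time follows input: " + peakState(events, 10, true));
    System.out.println("time jumps at end:  " + peakState(events, 10, false));
  }
}
```

For this sample input the peak is 2 keys when event time follows the 
elements and 5 keys (the whole input) when it does not. A real runner's 
watermark handling is more involved, so this is only a rough picture of 
the argument.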

Therefore, I think it is essential for users to be able to tell the 
runner that a particular stateful ParDo depends on the order of input 
events, so that the runner can use optimizations available in the batch 
case. The streaming case is mostly unaffected by that, because all the 
sorting can be handled the usual way.

I hope this helps to clarify why it would be good to introduce (some) 
way to mark stateful ParDos as "time sorted".

Cheers,

  Jan

[1] 
https://www.ververica.com/resources/flink-forward-san-francisco-2019/moving-from-lambda-and-kappa-architectures-to-kappa-at-uber

Hope these thoughts help

On 11/8/19 11:35 AM, Jan Lukavský wrote:
> Hi Max,
>
> Thanks for the comment. I probably should have put links to the 
> discussion threads here in the vote thread. The relevant ones are
>
>  - a (pretty lengthy) discussion about whether sorting by timestamp 
> should be part of the model - [1]
>
>  - the part of the discussion related to the annotation - [2]
>
> Regarding the open questions in the design document - they are not 
> meant to be open questions about the design of the annotation, and 
> I'll remove them for now, as they are not (directly) related.
>
> Now - the main reason for this vote is that there is actually no clear 
> consensus in the ML thread. There are plenty of words like "should", 
> "could", "would" and "maybe", so I wanted to be sure there is 
> consensus to include this. I have been running this in production for 
> several months, so it is definitely useful for me. :-) But that might 
> not be sufficient.
>
> I'd be very happy to answer any more questions.
>
> Thanks,
>
>  Jan
>
> [1] 
> https://lists.apache.org/thread.html/4609a1bb1662690d67950e76d2f1108b51327b8feaf9580de659552e@%3Cdev.beam.apache.org%3E
>
> [2] 
> https://lists.apache.org/thread.html/dd9bec903102d9fcb4f390dc01513c0921eac1fedd8bcfdac630aaee@%3Cdev.beam.apache.org%3E

Re: [VOTE] @RequiresTimeSortedInput stateful DoFn annotation

Posted by Jan Lukavský <je...@seznam.cz>.
Hi Max,

Thanks for the comment. I probably should have put links to the 
discussion threads here in the vote thread. The relevant ones are

  - a (pretty lengthy) discussion about whether sorting by timestamp 
should be part of the model - [1]

  - the part of the discussion related to the annotation - [2]

Regarding the open questions in the design document - they are not meant 
to be open questions about the design of the annotation, and I'll 
remove them for now, as they are not (directly) related.

Now - the main reason for this vote is that there is actually no clear 
consensus in the ML thread. There are plenty of words like "should", 
"could", "would" and "maybe", so I wanted to be sure there is consensus 
to include this. I have been running this in production for several 
months, so it is definitely useful for me. :-) But that might not be 
sufficient.

I'd be very happy to answer any more questions.

Thanks,

  Jan

[1] 
https://lists.apache.org/thread.html/4609a1bb1662690d67950e76d2f1108b51327b8feaf9580de659552e@%3Cdev.beam.apache.org%3E

[2] 
https://lists.apache.org/thread.html/dd9bec903102d9fcb4f390dc01513c0921eac1fedd8bcfdac630aaee@%3Cdev.beam.apache.org%3E

On 11/8/19 11:08 AM, Maximilian Michels wrote:
> Hi Jan,
>
> Disclaimer: I haven't followed the discussion closely, so I do not 
> want to comment on the technical details of the feature here.
>
> From the outside, it looks like there may be open questions. Also, we 
> may need more motivation for what we can build with this feature or 
> how it will become useful to users.
>
> There are many threads in Beam and I believe we need to carefully 
> prioritize the Beam feature set in order to focus on the things that 
> provide the most value to our users.
>
> Cheers,
> Max

Re: [VOTE] @RequiresTimeSortedInput stateful DoFn annotation

Posted by Maximilian Michels <mx...@apache.org>.
Hi Jan,

Disclaimer: I haven't followed the discussion closely, so I do not want 
to comment on the technical details of the feature here.

 From the outside, it looks like there may be open questions. Also, we 
may need more motivation for what we can build with this feature or how 
it will become useful to users.

There are many threads in Beam and I believe we need to carefully 
prioritize the Beam feature set in order to focus on the things that 
provide the most value to our users.

Cheers,
Max

On 07.11.19 15:55, Jan Lukavský wrote:
> Hi,
> is there anything I can do to make this more attractive? :-) Any 
> feedback would be much appreciated.
> Many thanks,
>   Jan

Re: [VOTE] @RequiresTimeSortedInput stateful DoFn annotation

Posted by Jan Lukavský <je...@seznam.cz>.
Hi,

is there anything I can do to make this more attractive? :-) Any feedback
would be much appreciated.

Many thanks,

 Jan

  

On 5 Nov 2019 at 14:10, Jan Lukavský <je...@seznam.cz> wrote:

> Hi,  
>  
>  I'd like to open a vote on accepting the design document [1] as a basis
>  for implementing the @RequiresTimeSortedInput annotation for stateful
>  DoFns. The associated JIRA [2] and PR [3] contain only a subset of the
>  whole functionality (allowed lateness is ignored, and there is no way to
>  specify a UDF that extracts the time - or a sequence number - from the
>  data). The PR will go through an independent review process once the
>  vote succeeds (please feel free to self-request a review if you are
>  interested). Features from the design document that are missing from
>  the PR will be added later in subsequent JIRA issues, so that they do
>  not block the availability of this feature.
>  
>  Please vote on adding support for @RequiresTimeSortedInput.  
>  
>  The vote is open for the next 72 hours and passes if at least three +1  
>  and no -1 PMC (binding) votes are cast.  
>  
>  [ ] +1 Add support for @RequiresTimeSortedInput  
>  
>  [ ] 0 I don't have a strong opinion about this, but I assume it's ok  
>  
>  [ ] -1 Do not support @RequiresTimeSortedInput - please provide explanation.
>  
>  Thanks,  
>  
>  Jan  
>  
>  [1] https://docs.google.com/document/d/1ObLVUFsf1NcG8ZuIZE4aVy2RYKx2FfyMhkZYWPnI9-c/edit?usp=sharing
>  
>  [2] https://issues.apache.org/jira/browse/BEAM-8550  
>  
>  [3] https://github.com/apache/beam/pull/8774  
>  
>